A GPU is so parallel that you often end up doing *more* calculations in total than you would on a CPU, but it's still faster because the work is spread across thousands of parallel units.


For example, the best way to draw a (filled) circle on a GPU appears to be testing x^2 + y^2 <= r^2 for every pixel. On a CPU you would economise, e.g. by working out only where each row of the circle starts and ends.
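To make the trade-off concrete, here is a rough sketch in plain Python (not real GPU code). `disc_per_pixel` and `disc_by_spans` are names made up for this illustration. The first runs the same independent test on every pixel of the bounding box, which is the kind of work a GPU could hand out one pixel per thread. The second is one possible CPU-style economy, assumed here for illustration: a single integer square root per row, then fill the span.

```python
from math import isqrt


def disc_per_pixel(r):
    """GPU-style: the same independent test at every pixel in the bounding box."""
    pixels = set()
    tests = 0
    for y in range(-r, r + 1):
        for x in range(-r, r + 1):
            tests += 1  # each test depends only on (x, y), so all could run in parallel
            if x * x + y * y <= r * r:
                pixels.add((x, y))
    return pixels, tests


def disc_by_spans(r):
    """CPU-style: one integer square root per row, then fill between the edges."""
    pixels = set()
    roots = 0
    for y in range(-r, r + 1):
        half = isqrt(r * r - y * y)  # largest |x| with x^2 + y^2 <= r^2
        roots += 1
        for x in range(-half, half + 1):
            pixels.add((x, y))
    return pixels, roots


if __name__ == "__main__":
    a, tests = disc_per_pixel(10)
    b, roots = disc_by_spans(10)
    print(a == b, tests, roots)
```

Both functions produce the same set of pixels. The per-pixel version performs (2r+1)^2 tests against 2r+1 square roots for the span version, but none of its tests depends on any other, which is what lets thousands of units chew through them at once.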

