Some old RT algorithms switched the two nested loops: the outer loop iterates over each triangle in the scene, and the inner one tests for intersections with screen pixels.
The advantage is that you only need to consider the pixels within the projection of the triangle.
You can also cull triangles that are completely occluded by other triangles in front of them.
I don't know whether this improved algorithm is actually a win for scenes with millions of triangles.
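
For illustration, here is a minimal sketch of the swapped loop order. It is not taken from any particular renderer. It assumes the triangles are already projected to screen space, with vertices given as (x, y, z) in pixel coordinates and smaller z meaning closer. The names `edge` and `render`, and the `tests` counter, are made up for this example. Visibility is resolved with a plain per-pixel depth buffer, which is my substitution: it rejects hidden pixels one at a time and does not cull whole occluded triangles up front.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def render(triangles, width, height):
    """Outer loop over triangles, inner loop over pixels in each bounding box.

    Each triangle is (v0, v1, v2, tri_id) with screen-space vertices (x, y, z).
    Returns (ids, tests): ids[y][x] is the nearest triangle id or None, and
    tests is the number of pixel/triangle tests performed.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    ids = [[None] * width for _ in range(height)]
    tests = 0
    for v0, v1, v2, tri_id in triangles:
        area = edge(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1])
        if area == 0:
            continue  # degenerate triangle, covers no pixels
        xs = (v0[0], v1[0], v2[0])
        ys = (v0[1], v1[1], v2[1])
        # Only pixels whose centers fall inside the bounding box are visited.
        xmin = max(0, math.ceil(min(xs) - 0.5))
        xmax = min(width - 1, math.floor(max(xs) - 0.5))
        ymin = max(0, math.ceil(min(ys) - 0.5))
        ymax = min(height - 1, math.floor(max(ys) - 0.5))
        for y in range(ymin, ymax + 1):
            for x in range(xmin, xmax + 1):
                tests += 1
                px, py = x + 0.5, y + 0.5
                # Barycentric weights; dividing by area handles both windings.
                w0 = edge(v1[0], v1[1], v2[0], v2[1], px, py) / area
                w1 = edge(v2[0], v2[1], v0[0], v0[1], px, py) / area
                w2 = edge(v0[0], v0[1], v1[0], v1[1], px, py) / area
                if w0 < 0 or w1 < 0 or w2 < 0:
                    continue  # pixel center lies outside the triangle
                z = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]
                if z < depth[y][x]:  # keep only the closest surface
                    depth[y][x] = z
                    ids[y][x] = tri_id
    return ids, tests


# A small triangle in front of a large one on an 8x8 screen.
near = ((0, 0, 1), (4, 0, 1), (0, 4, 1), "near")
far = ((0, 0, 5), (8, 0, 5), (0, 8, 5), "far")
ids, tests = render([near, far], 8, 8)
```

In this toy scene the small triangle only costs the pixels inside its own bounding box instead of the whole screen, which is the saving described above. A triangle-per-pixel loop would test every pixel against both triangles.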