Why is this raytracer slow?

I bumped into this repo.
https://github.com/edin/raytracer
It compares implementations in different languages.
I’m surprised that Julia is more than 5 times slower than C.
The code seems idiomatic. Does anyone have thoughts on this Julia code?

They seem to count compilation time, which is a pretty big chunk of the computation:

  0.172780 seconds (1.68 M allocations: 103.636 MiB, 4.74% gc time, 39.35% compilation time)

  0.099108 seconds (1.44 M allocations: 89.093 MiB, 9.12% gc time)

Although that’s in VSCode… Run as a plain script, it looks way less significant:

PS C:\Users\sdani\MakieDev> julia .\ray.jl
  0.102447 seconds (1.44 M allocations: 89.091 MiB, 6.47% gc time)
  0.102314 seconds (1.44 M allocations: 89.091 MiB, 3.63% gc time)

It’s interesting that in VSCode it stays reproducibly at 0.924, while running it as a script it stays at 0.102…
Will need some more investigation, but pretty sure we should be able to match C!

I think you are right. It uses

@time begin
    scene = Scene()
    image = Render(scene)
end

to measure the time, which should include the compilation time.

Edit: I replaced @time with @btime and that reduced the reported time by about 50%.
It is still about 3 times slower than C.
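For reference, a minimal sketch of that measurement with BenchmarkTools (using the repo’s Scene and Render names; the $ interpolation avoids timing global-variable lookup):

using BenchmarkTools

scene = Scene()
# @btime evaluates the expression many times and reports the minimum,
# so one-off compilation is excluded from the result.
@btime Render($scene)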

1 Like

The main work should be done in GetClosestIntersection, which is called 500² times.
(Probably explaining why the compilation is not that significant.)

On my computer that takes 6 allocations per execution, which seems fair given the complexity.
I’m wondering if making GetClosestIntersection allocation-free is the trick?

With their vector/linear algebra definitions, they should have something close to StaticArrays, so that’s probably not the main problem?

Many of the type definitions have abstract fields, like:

struct Ray
    start::Vector
    dir::Vector
end

That’s probably an issue.

Edit: Oops, I see now that they have overloaded the name Vector for their own purposes, and it might be concrete after all. But stealing the name Vector is pretty bold, I wouldn’t advise it.
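One quick way to check (a generic sketch, assuming the repo’s Ray type is loaded):

# true only if every field of Ray has a concrete type,
# i.e. the struct has a fixed, inline memory layout:
all(isconcretetype, fieldtypes(Ray))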

3 Likes

It’s not the real Vector; they have overwritten it: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L6-L10. Yet I am not sure what that really means from the compiler’s point of view.

Still, the main issue is abstract fields: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L132-L153

This one should behave very badly. It should either be handled with union splitting or be concretely typed.

Oh, and also most of the critical functions are type-unstable: they return Union{Nothing, <some type>}. So yes, it should be refactored to gain more performance.

I’m fairly certain none of these is abstractly typed…?

Ah, you mean the fields of those Planes.

Thing is an abstract type: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L62, so the things field in Scene is also abstractly typed.

1 Like

Here the C and Fortran codes run in ~80-110ms and the Julia code (with @btime) runs in 142ms. It is not as bad as they report.

Solving the instability pointed out by @Skoffer (by defining const Thing = Union{Plane,Sphere}) does not improve the speed, but allocations go from 1.4k to 8.
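For concreteness, a minimal sketch of that change (the Sphere/Plane stubs here are made up, not the repo’s definitions):

struct Sphere
    center::NTuple{3,Float64}
    radius::Float64
end

struct Plane
    normal::NTuple{3,Float64}
    offset::Float64
end

# A small concrete Union: Vector{Thing} can store its elements inline,
# and calls on them can be union-split by the compiler.
const Thing = Union{Plane,Sphere}

things = Thing[Sphere((0.0, 1.0, -0.25), 1.0), Plane((0.0, 1.0, 0.0), 0.0)]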

The code can be much simpler if one defines Vector <: FieldVector{3,Float64} from StaticArrays (then there is no need to define all the methods for operations on this type; one can use the operations from LinearAlgebra). But that does not improve speed much (raw Julia code is good :slight_smile: )
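Something like this, for illustration (named Vec3 here to avoid the Vector name clash discussed above):

using StaticArrays, LinearAlgebra

# FieldVector provides the whole AbstractVector/LinearAlgebra interface
# (+, -, scalar *, dot, cross, norm, ...) without allocations:
struct Vec3 <: FieldVector{3,Float64}
    x::Float64
    y::Float64
    z::Float64
end

a = Vec3(1, 0, 0)
b = Vec3(0, 1, 0)
a + 2b             # Vec3(1.0, 2.0, 0.0), still a Vec3
norm(cross(a, b))  # 1.0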

I think they opted to always return either nothing or a valid object in every routine, and that is worse than what is implemented in Fortran, for example, where the objects have an additional valid field. That way, the functions would always return the same type of object, and the specialization could probably be better. Does this make sense?
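What I mean, as a sketch (type and field names are hypothetical):

# Every hit test returns this same concrete type; callers branch on
# hit.valid instead of checking for nothing.
struct Intersection
    valid::Bool
    dist::Float64
end

const NO_HIT = Intersection(false, Inf)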

4 Likes

Yes, the C version on my laptop also gives about 80 ms, so the Julia version is actually good enough.
By the way, the Nim version gives about 65 ms, which is quite amazing to me, though I think that’s a highly specialized version.
We could possibly do the same thing in Julia.

The code of the Nim version is quite clear (and I had never seen Nim code before). Quite impressive, I must say.

But the trick to make it faster is explicit:

func traceRay(scene: var Scene, start, dir: Vector, depth: int): Color{.noinit.} =
  ## ok, the ptr thing might look sketchy, and makes all of these procs
  ## need to take a var Scene when they don't mutate it, but it's
  ## a highly performance sensitive proc and other solutions are
  ## slower or messier or both
  var
    closest = INF
    closestThing: ptr Thing
  for thing in scene.things.mitems:
    intersections(thing, start, dir, dist < closest):
      closest = dist
      closestThing = thing.addr
  result = if closestThing != nil:
    scene.shade(closestThing[], start, dir, closest, depth)
  else:
    black
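A rough Julia analog of the same structure would track the index of the closest hit instead of a raw pointer (hit_distance here is a hypothetical stand-in for the repo’s intersection test, not its actual API):

function trace_closest(things, start, dir)
    closest = Inf
    idx = 0   # 0 means "no hit", avoiding a Union{Nothing,...} return
    for (i, thing) in pairs(things)
        d = hit_distance(thing, start, dir)  # hypothetical; returns Inf on miss
        if d < closest
            closest, idx = d, i
        end
    end
    return idx, closest   # caller shades things[idx] when idx != 0
end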

So is it some low-level trick that makes the Nim version so quick?
As far as I know, ptr is not idiomatic in Nim and using it is not encouraged.

I just noticed that Nim uses a custom pow and quite a few compiler options (e.g. d:danger :smiley:).
Adding the same pow + some @inlines brings the time from 0.1 seconds to 0.067 seconds for me.
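The custom pow is presumably just repeated multiplication for integer exponents; a minimal sketch of the idea (a guess at its shape, assuming a non-negative exponent, not the exact code):

# Left-to-right repeated multiplication for small n >= 0,
# avoiding the generic llvm.pow call:
@inline function ipow(x::Float64, n::Integer)
    r = 1.0
    for _ in 1:n
        r *= x
    end
    return r
end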

1 Like

Interesting, could you please share your code?

Just looking over the code. Super nice project for language comparisons, especially because the implementations are so similar (instead of attempting more “idiomatic” approaches)!

The only thing that is instantly suspect from the source (without running/profiling) is the following type instability:

edit: The other thing is the main issue that @lmiq commented on, i.e. that Scene is built from an array with open-ended, abstractly typed elements. More idiomatic in Julia would be to either use a Union or, better, use multiple homogeneous arrays (see the sketch below). The latter might, however, defy the spirit of this exercise, which is homogeneity across language implementations.
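A minimal sketch of the homogeneous-arrays variant, reusing the hypothetical Sphere/Plane stubs from the earlier sketch:

# One concretely typed array per shape: loops over each array are
# statically dispatched, with no abstract or Union elements involved.
struct SceneSoA
    spheres::Vector{Sphere}
    planes::Vector{Plane}
end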

2 Likes

That’s really bad because it means everything is running with Float64 when it appears to be using Float32.

This is a good point. I changed the Float32 to Float64 and got an improvement of about 10%.
I don’t know why they use Float32 there; all other floats are Float64.

Oh. I actually meant going the other direction and making everything a Float32. That will likely be faster than Float64 if you are creating a lot of Vectors since it will lower cache usage.

Yeah, if performance were the point of this, Float32 would be the way to go. However, the C implementation uses double, and Julia always wants to turn everything into Float64 anyway (even though Float32 is more appropriate for many applications, one needs to be hellishly cautious about type instabilities).
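One classic example of the caution needed (a generic sketch, nothing from the repo): bare float literals are Float64 and silently promote Float32 work back to Float64.

scale(v::Float32) = 0.5 * v         # promotes: result is Float64
scale32(v::Float32) = 0.5f0 * v     # Float32 literal keeps it Float32
scaleT(v::T) where {T<:AbstractFloat} = T(0.5) * v  # generic over precision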

2 Likes

Julia supposedly already specializes to a faster pow if the exponent is an integer literal:

  If y is an Int literal (e.g. 2 in x^2 or -3 in x^-3), the Julia code x^y is
  transformed by the compiler to Base.literal_pow(^, x, Val(y))

although, if one actually follows the code, that does not appear to happen for integer exponents larger than 3:

@inline function ^(x::Float64, y::Integer)
    y == -1 && return inv(x)
    y == 0 && return one(x)
    y == 1 && return x
    y == 2 && return x*x
    y == 3 && return x*x*x
    ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, Float64(y))
end

Anyway, if I overload Base.^ with:

import Base.^
^(x::Real,y::Integer) = Base.literal_pow(^,x,Val(y))

I don’t see any clear difference in performance there.