Why is this raytracer slow?

I bumped into this repo.
https://github.com/edin/raytracer
It compares implementations in different languages.
I’m surprised that Julia is more than 5 times slower than C.
The code seems idiomatic. Does anyone have thoughts on this Julia code?

They seem to count compilation time, which is a pretty big chunk of the computation:

  0.172780 seconds (1.68 M allocations: 103.636 MiB, 4.74% gc time, 39.35% compilation time)

  0.099108 seconds (1.44 M allocations: 89.093 MiB, 9.12% gc time)

Although that’s in VSCode… Run as a plain script, it looks way less significant:

PS C:\Users\sdani\MakieDev> julia .\ray.jl
  0.102447 seconds (1.44 M allocations: 89.091 MiB, 6.47% gc time)
  0.102314 seconds (1.44 M allocations: 89.091 MiB, 3.63% gc time)

It’s interesting that in VSCode it stays reproducibly at 0.924, while running it as a script it stays at 0.102…
Will need some more investigation, but pretty sure we should be able to match C!

I think you are right. It uses

@time begin
    scene = Scene()
    image = Render(scene)
end

to measure the time, which should include the compilation time.

Edit: I replaced @time with @btime and that reduced the reported time by about 50%.
It is still about 3 times slower than C.
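For reference, a minimal sketch of that measurement with BenchmarkTools (using the repo’s Scene and Render names; the $ interpolation avoids timing global-variable lookup):

using BenchmarkTools

scene = Scene()
# @btime evaluates the expression many times and reports the minimum,
# so one-off compilation is excluded from the result.
@btime Render($scene)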

1 Like

The main work should be done in GetClosestIntersection, which is called 500² times.
(Probably explaining why the compilation is not that significant.)

On my computer that takes 6 allocations per execution, which seems fair given the complexity.
I’m wondering if making GetClosestIntersection allocation-free is the trick?

With their vector/linear algebra definitions, they should have something close to StaticArrays, so that’s probably not the main problem?

Many of the type definitions have abstract fields, like:

struct Ray
    start::Vector
    dir::Vector
end

That’s probably an issue.

Edit: Oops, I see now that they have overloaded the name Vector for their own purposes, and it might be concrete after all. But stealing the name Vector is pretty bold, I wouldn’t advise it.
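One quick way to check (a generic sketch, assuming the repo’s Ray type is loaded):

# true only if every field of Ray has a concrete type,
# i.e. the struct has a fixed, inline memory layout:
all(isconcretetype, fieldtypes(Ray))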

3 Likes

It’s not the real Vector; they have overwritten it: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L6-L10. Yet I am not sure what that really means from the compiler’s point of view.

Still, the main issue is abstract fields: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L132-L153

This one should behave very badly. It should either be handled with union splitting or be concretely typed.

Oh, and also most of the critical functions are type-unstable: they return Union{Nothing, <some type>}. So yes, it should be refactored to gain more performance.

I’m fairly certain none of these is abstractly typed…?

Ah, you mean the fields of those Planes.

Thing is an abstract type: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L62, so the things field in Scene is also abstractly typed.

1 Like

Here the C and Fortran codes run in ~80-110ms and the Julia code (with @btime) runs in 142ms. It is not as bad as they report.

Solving the instability pointed out by @Skoffer (by defining const Thing = Union{Plane,Sphere}) does not improve the speed, but allocations go from 1.4k to 8.
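For concreteness, a minimal sketch of that change (the Sphere/Plane stubs here are made up, not the repo’s definitions):

struct Sphere
    center::NTuple{3,Float64}
    radius::Float64
end

struct Plane
    normal::NTuple{3,Float64}
    offset::Float64
end

# A small concrete Union: Vector{Thing} can store its elements inline,
# and calls on them can be union-split by the compiler.
const Thing = Union{Plane,Sphere}

things = Thing[Sphere((0.0, 1.0, -0.25), 1.0), Plane((0.0, 1.0, 0.0), 0.0)]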

The code can be much simpler if one defines Vector <: FieldVector{3,Float64} from StaticArrays (then there is no need to define all the methods for operations on this type; one can use the operations from LinearAlgebra). But that does not improve speed much (raw Julia code is good :slight_smile: )
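Something like this, for illustration (named Vec3 here to avoid the Vector name clash discussed above):

using StaticArrays, LinearAlgebra

# FieldVector provides the whole AbstractVector/LinearAlgebra interface
# (+, -, scalar *, dot, cross, norm, ...) without allocations:
struct Vec3 <: FieldVector{3,Float64}
    x::Float64
    y::Float64
    z::Float64
end

a = Vec3(1, 0, 0)
b = Vec3(0, 1, 0)
a + 2b             # Vec3(1.0, 2.0, 0.0), still a Vec3
norm(cross(a, b))  # 1.0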

I think they opted to always return either nothing or a valid object in every routine, and that is worse than what is implemented in Fortran, for example, where the objects have an additional valid field. That way, the functions would always return the same type of object, and the specialization could probably be better. Does this make sense?
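What I mean, as a sketch (type and field names are hypothetical):

# Every hit test returns this same concrete type; callers branch on
# hit.valid instead of checking for nothing.
struct Intersection
    valid::Bool
    dist::Float64
end

const NO_HIT = Intersection(false, Inf)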

4 Likes

Yes, the C version on my laptop also gives about 80 ms, so the Julia version is actually good enough.
By the way, the Nim version gives about 65 ms, which is quite amazing to me, though I think that’s a highly specialized version.
We could possibly do the same thing in Julia.

The code of the Nim version is quite clear (and I had never seen Nim code before). Quite impressive, I must say.

But the trick to make it faster is explicit:

func traceRay(scene: var Scene, start, dir: Vector, depth: int): Color{.noinit.} =
  ## ok, the ptr thing might look sketchy, and makes all of these procs
  ## need to take a var Scene when they don't mutate it, but it's
  ## a highly performance sensitive proc and other solutions are
  ## slower or messier or both
  var
    closest = INF
    closestThing: ptr Thing
  for thing in scene.things.mitems:
    intersections(thing, start, dir, dist < closest):
      closest = dist
      closestThing = thing.addr
  result = if closestThing != nil:
    scene.shade(closestThing[], start, dir, closest, depth)
  else:
    black
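A rough Julia analog of the same structure would track the index of the closest hit instead of a raw pointer (hit_distance here is a hypothetical stand-in for the repo’s intersection test, not its actual API):

function trace_closest(things, start, dir)
    closest = Inf
    idx = 0   # 0 means "no hit", avoiding a Union{Nothing,...} return
    for (i, thing) in pairs(things)
        d = hit_distance(thing, start, dir)  # hypothetical; returns Inf on miss
        if d < closest
            closest, idx = d, i
        end
    end
    return idx, closest   # caller shades things[idx] when idx != 0
end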

So is it some low-level trick that makes the Nim version so quick?
As far as I know, ptr is not idiomatic in Nim and using it is not encouraged.

I just noticed that Nim uses a custom pow and quite a few compiler options (e.g. d:danger :smiley:).
Adding the same pow + some @inlines brings the time from 0.1 seconds to 0.067 seconds for me.
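The custom pow is presumably just repeated multiplication for integer exponents; a minimal sketch of the idea (a guess at its shape, assuming a non-negative exponent, not the exact code):

# Left-to-right repeated multiplication for small n >= 0,
# avoiding the generic llvm.pow call:
@inline function ipow(x::Float64, n::Integer)
    r = 1.0
    for _ in 1:n
        r *= x
    end
    return r
end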

1 Like

Interesting, could you please share your code?

Just looking over the code. Super nice project for language comparisons, especially because the implementations are so similar (instead of attempting more “idiomatic” approaches)!

The only thing that is instantly suspect from the source (without running/profiling) is the following type instability:

edit: The other thing is the main issue that @lmiq commented on, i.e. that Scene is built from an array with open-ended, abstractly typed elements. More idiomatic in Julia would be to either use a Union or, better, use multiple homogeneous arrays (see the sketch below). The latter might, however, defy the spirit of this exercise, which is homogeneity across language implementations.
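A minimal sketch of the homogeneous-arrays variant, reusing the hypothetical Sphere/Plane stubs from the earlier sketch:

# One concretely typed array per shape: loops over each array are
# statically dispatched, with no abstract or Union elements involved.
struct SceneSoA
    spheres::Vector{Sphere}
    planes::Vector{Plane}
end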

2 Likes

That’s really bad because it means everything is running with Float64 when it appears to be using Float32.

This is a good point. I changed the Float32 to Float64 and got an improvement of about 10%.
I don’t know why they use Float32 there; all other floats are Float64.

Oh. I actually meant going the other direction and making everything a Float32. That will likely be faster than Float64 if you are creating a lot of Vectors since it will lower cache usage.

Yeah, if performance were the point of this, Float32 would be the way to go. However, the C implementation uses double, and Julia always wants to turn everything into Float64 anyway (even though Float32 is more appropriate for many applications, one needs to be hellishly cautious about type instabilities).
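One classic example of the caution needed (a generic sketch, nothing from the repo): bare float literals are Float64 and silently promote Float32 work back to Float64.

scale(v::Float32) = 0.5 * v         # promotes: result is Float64
scale32(v::Float32) = 0.5f0 * v     # Float32 literal keeps it Float32
scaleT(v::T) where {T<:AbstractFloat} = T(0.5) * v  # generic over precision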

2 Likes

Julia supposedly already specializes to a faster pow if the exponent is an integer literal:

  If y is an Int literal (e.g. 2 in x^2 or -3 in x^-3), the Julia code x^y is
  transformed by the compiler to Base.literal_pow(^, x, Val(y))

although, if one actually follows the code, that does not appear to happen for integer exponents larger than 3:

@inline function ^(x::Float64, y::Integer)
    y == -1 && return inv(x)
    y == 0 && return one(x)
    y == 1 && return x
    y == 2 && return x*x
    y == 3 && return x*x*x
    ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, Float64(y))
end

Anyway, if I overload Base.^ with:

import Base.^
^(x::Real,y::Integer) = Base.literal_pow(^,x,Val(y))

I don’t see any clear difference in performance there.