I bumped into this repo.
https://github.com/edin/raytracer
It compares implementations in different languages.
I’m surprised that Julia is more than 5 times slower than C. The code seems idiomatic. Does anyone have any thoughts on this Julia code?
They seem to count compilation time, which is a pretty big chunk of the computation:

```
0.172780 seconds (1.68 M allocations: 103.636 MiB, 4.74% gc time, 39.35% compilation time)
0.099108 seconds (1.44 M allocations: 89.093 MiB, 9.12% gc time)
```
Although that’s in VSCode… In the REPL it looks way less significant:

```
PS C:\Users\sdani\MakieDev> julia .\ray.jl
0.102447 seconds (1.44 M allocations: 89.091 MiB, 6.47% gc time)
0.102314 seconds (1.44 M allocations: 89.091 MiB, 3.63% gc time)
```
It’s interesting that in VSCode it stays reproducibly at 0.924, while running it as a script stays at 0.102…
This will need some more investigation, but I’m pretty sure we should be able to match C!
I think you are right. It uses

```julia
@time begin
    scene = Scene()
    image = Render(scene)
end
```

to measure the time, which includes the compilation time.

Edit: I replaced `@time` with `@btime` and that reduced the reported time by about 50%. It is still about 3 times slower than C.
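For reference, a minimal sketch of that `@btime` measurement (assuming `Scene` and `Render` as defined in the repo); `@btime` warms up first and reports the minimum over many runs, so compilation is excluded:

```julia
using BenchmarkTools

scene = Scene()
@btime Render($scene)   # interpolate `scene` to avoid global-variable overhead
```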
The main work should be done in `GetClosestIntersection`, which is called 500² times. (That probably explains why the compilation is not that significant.) On my computer it takes 6 allocations per call, which seems fair given the complexity.

I’m wondering if making `GetClosestIntersection` allocation-free is the trick? With their vector/linear algebra definitions, they should have something close to StaticArrays, so probably that’s not the main problem?
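To check per-call allocations, something like this self-contained sketch with BenchmarkTools works (toy stand-in function, not the repo’s code):

```julia
using BenchmarkTools

struct Hit
    dist::Float64
end

# Toy stand-in for the hot function; returns an immutable result.
closest_hit(ds) = Hit(minimum(ds))

ds = rand(250_000)       # roughly 500² candidate distances
@btime closest_hit($ds)  # reports time and allocations per call
```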
Many of the type definitions have abstract fields, like:

```julia
struct Ray
    start::Vector
    dir::Vector
end
```

That’s probably an issue.
Edit: Oops, I see now that they have overloaded the name `Vector` for their own purposes, and it might be concrete after all. But stealing the name `Vector` is pretty bold; I wouldn’t advise it.
It’s not the real `Vector`; they have overwritten it: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L6-L10. Yet I am not sure what that really means from a compiler point of view.
Still, the main issue is abstract fields: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L132-L153. This should perform very badly; it should either be handled with the help of union splitting or be concretely typed. (A sketch of both options follows.)
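For illustration, a hedged sketch with simplified toy types (not the repo’s actual fields) contrasting an abstract element type with a small Union the compiler can split:

```julia
abstract type Thing end

struct Sphere <: Thing
    radius::Float64
end

struct Plane <: Thing
    offset::Float64
end

# Abstract element type: every access to `things` dispatches dynamically.
struct SceneAbstract
    things::Vector{Thing}
end

# Small Union element type: loops over `things` can be union-split.
struct SceneUnion
    things::Vector{Union{Sphere,Plane}}
end
```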
Oh, and also most of the critical functions are type-unstable: they return `Union{Nothing, <some type>}`. So yes, it should be refactored to gain more performance.
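A tiny self-contained illustration of that pattern and how `@code_warntype` exposes it (not the repo’s code):

```julia
# Returns a Float64 on a hit and `nothing` on a miss, so the inferred
# return type is Union{Nothing, Float64}.
function findfirsthit(xs)
    for x in xs
        x > 0.5 && return x
    end
    return nothing
end

@code_warntype findfirsthit(rand(10))  # Body::Union{Nothing, Float64}
```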
I’m fairly certain none of these is abstractly typed…? Ah, you mean the fields of those `Plane`s.
`Thing` is an abstract type: https://github.com/edin/raytracer/blob/master/julia/RayTracer.jl#L62, so the `things` field in `Scene` is also abstractly typed.
Here the C and Fortran codes run in ~80-110 ms and the Julia code (with `@btime`) runs in 142 ms. It is not as bad as they report.
Solving the instability pointed out by @Skoffer (by defining `const Thing = Union{Plane,Sphere}`) does not improve the speed, but allocations go from 1.4M to 8.
The code can be much simpler if one defines `Vector <: FieldVector{3,Float64}` from StaticArrays (no need to define all the methods for operations on this type then; one can use the operations from LinearAlgebra). But that does not improve speed much (the raw Julia code is good).
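A minimal sketch of that suggestion (I’m using the name `Vec3` here to avoid shadowing `Base.Vector`):

```julia
using StaticArrays, LinearAlgebra

struct Vec3 <: FieldVector{3,Float64}
    x::Float64
    y::Float64
    z::Float64
end

v = Vec3(1.0, 2.0, 3.0)
w = v + 2v            # arithmetic comes for free from StaticArrays
d = dot(v, w)         # LinearAlgebra operations work out of the box
n = normalize(v)
c = cross(v, w)
```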
I think they opted to always return `nothing` or a valid object in every routine, and that is worse than what is implemented in Fortran, for example, where the objects have an additional `valid` field. By doing that, the functions would always return the same concrete type, and the specialization could probably be better. Does this make sense?
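A hedged sketch of that `valid`-field idea (field names are illustrative, not the repo’s):

```julia
struct Intersection
    valid::Bool
    dist::Float64
end

const MISS = Intersection(false, Inf)

# Always returns an `Intersection`, never `nothing`, so the return
# type stays concrete.
function closest(dists)
    best = MISS
    for d in dists
        d < best.dist && (best = Intersection(true, d))
    end
    return best
end
```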
Yes, the C version on my laptop also gives about 80 ms, so the Julia version is actually good enough.

By the way, the Nim version gives about 65 ms, which is quite amazing to me, though I think that’s a highly specialized version. We could possibly do the same thing in Julia.

The code of the Nim version is quite clear (and I had never seen Nim code before). Quite impressive, I must say.
But the trick to make it faster is explicit:
```nim
func traceRay(scene: var Scene, start, dir: Vector, depth: int): Color {.noinit.} =
  ## ok, the ptr thing might look sketchy, and makes all of these procs
  ## need to take a var Scene when they don't mutate it, but it's
  ## a highly performance sensitive proc and other solutions are
  ## slower or messier or both
  var
    closest = INF
    closestThing: ptr Thing
  for thing in scene.things.mitems:
    intersections(thing, start, dir, dist < closest):
      closest = dist
      closestThing = thing.addr
  result = if closestThing != nil:
    scene.shade(closestThing[], start, dir, closest, depth)
  else:
    black
```
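A rough Julia analogue of that trick might track the index of the closest thing instead of a pointer, so no intermediate object is allocated (a sketch under that assumption, not the repo’s code):

```julia
# `dists` stands in for the per-thing intersection distances.
function closest_index(dists)
    closest = Inf
    idx = 0                    # 0 plays the role of Nim's nil ptr
    for (i, d) in pairs(dists)
        if d < closest
            closest = d
            idx = i
        end
    end
    return idx, closest        # idx == 0 means "no hit"
end
```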
So it is some low-level trick that makes the Nim version so quick? As far as I know, `ptr` is not idiomatic in Nim and using it is not encouraged.
I just noticed that the Nim version uses a custom pow and quite a few compiler options (e.g. `d:danger`).

Adding the same pow + some `@inline`s brings the time from 0.1 seconds to 0.067 seconds for me.
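Presumably the custom pow is something along these lines (a hedged guess in the spirit of the Nim version, not the actual diff): repeated multiplication for small non-negative integer exponents instead of a `llvm.pow` call.

```julia
# Hypothetical custom pow: repeated multiplication for small
# non-negative integer exponents.
@inline function fastpow(x::Float64, n::Integer)
    r = 1.0
    for _ in 1:n
        r *= x
    end
    return r
end

fastpow(1.5, 4)  # == 1.5 * 1.5 * 1.5 * 1.5
```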
Interesting, could you please share your code?
Just looking over the code. Super nice project for language comparisons, especially because the implementations are so similar (instead of attempting more “idiomatic” approaches)!

The only thing that is instantly suspect from the source (without running / profiling) is the following type instability:

edit: The other thing is the main issue that @lmiq commented on, i.e. that `Scene` is built from an array with open-ended, abstractly typed elements. More idiomatic in Julia would be either to use a Union, or better, to use multiple homogeneous arrays (see the sketch below). The latter might, however, defy the spirit of this exercise, which is homogeneity across language implementations.
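A hedged sketch of the homogeneous-arrays variant (toy types, illustrative field names, not the repo’s code):

```julia
struct Sphere
    radius::Float64
end

struct Plane
    offset::Float64
end

# One concretely typed array per shape: each loop below specializes
# fully, with no dynamic dispatch or union splitting needed.
struct HomogeneousScene
    spheres::Vector{Sphere}
    planes::Vector{Plane}
end

function min_dist(scene::HomogeneousScene)
    best = Inf
    for s in scene.spheres
        best = min(best, s.radius)   # stand-in for a real intersection test
    end
    for p in scene.planes
        best = min(best, p.offset)
    end
    return best
end
```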
That’s really bad because it means everything is running with `Float64` when it appears to be using `Float32`.
This is a good point. I changed the `Float32` to `Float64` and got an improvement of about 10%. I don’t know why they use `Float32` there; all the other floats are `Float64`.
Oh. I actually meant going the other direction and making everything a `Float32`. That will likely be faster than `Float64` if you are creating a lot of `Vector`s, since it will lower cache usage.
Yeah, if performance was the point of this, `Float32` would be the way to go. However, the C implementation uses `double`, and Julia always wants to turn everything into `Float64` anyway (even though `Float32` is more appropriate for many applications, one needs to be hellishly cautious about type instabilities).
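As a tiny self-contained illustration of that promotion hazard (not from the repo): a single `Float64` literal silently promotes the whole computation.

```julia
scale64(x::Float32) = 2.0 * x     # 2.0 is Float64, so the result is Float64
scale32(x::Float32) = 2.0f0 * x   # Float32 literal keeps everything Float32

typeof(scale64(1.0f0))  # Float64
typeof(scale32(1.0f0))  # Float32
```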
Julia supposedly already specializes to a faster `pow` if the exponent is an integer:

> If y is an Int literal (e.g. 2 in x^2 or -3 in x^-3), the Julia code x^y is transformed by the compiler to Base.literal_pow(^, x, Val(y))
although that does not appear to be happening if one actually follows the code, for integers larger than 3:
```julia
@inline function ^(x::Float64, y::Integer)
    y == -1 && return inv(x)
    y == 0 && return one(x)
    y == 1 && return x
    y == 2 && return x*x
    y == 3 && return x*x*x
    ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, Float64(y))
end
```
Anyway, if I overload `Base.^` with:

```julia
import Base.^
^(x::Real, y::Integer) = Base.literal_pow(^, x, Val(y))
```

I don’t see any clear difference in performance there.
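One way to check what a given literal power actually lowers to is to inspect the generated code (purely diagnostic):

```julia
f(x) = x^4

# Look for plain fmul instructions versus a call to @llvm.pow.f64.
@code_llvm debuginfo=:none f(1.5)
```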