# I translated Peter Shirley's Raytracer to Julia. C++ 1min -> Julia 2m30s -> 15s (@threads)

Hi,

I like raytracing, and there's a nice set of books:

https://github.com/RayTracing/raytracing.github.io

Now that I'm a bit better at Julia, I transcoded the C++ into Julia and got some nice speedups.

The single-threaded version went from 9m in C++ to 2m30s in Julia (with no tricks; I don't know any!).

(The C++ had no compiler flags; after using -O3 and -march=native it was down to about 1 minute.)

And when I added threads (40 of them) I got it down to 15s, including compilation.

Here's my Julia code.

And here's the book's GitHub Discussions page, where I made a similar post.


Sorry for the noob question, does your Julia code produce images like the one in the “Ray-tracing in one weekend” book?


yes, that exact one (modulo the rand()s)


Wow, I’m speechless.


Yes, I'm quite pleased. I wasn't doing it to make it faster; I just wanted it to produce results.

But then I used a bit of `@code_warntype` and `@time` and reduced the allocations from tens of millions to tens of thousands.
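For anyone curious what that workflow looks like, here is a small hypothetical illustration (none of these names are from the renderer): an untyped field forces boxing on every access, which `@code_warntype` flags as `::Any` and `@time` shows as a flood of small allocations.

```julia
struct Unstable
    x            # untyped field: treated as Any, so every access boxes
end

struct Stable
    x::Float64   # concrete field type: no boxing
end

sumx(v) = sum(o.x for o in v)

us = [Unstable(rand()) for _ in 1:10^5]
ss = [Stable(rand())   for _ in 1:10^5]

sumx(us); sumx(ss)   # warm up so the timings exclude compilation
@time sumx(us)       # many small allocations from boxing
@time sumx(ss)       # essentially allocation-free
# @code_warntype sumx(us)   # would show the troublesome ::Any in red
```

The fix is usually just adding concrete field types or type parameters, exactly as above.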

FYI, searching through Discourse, there seem to be a few related threads. I’m linking one of them.

4 hours!

Cool! Some small styling things:

```julia
const Vec3 = SVector{3,Float64}
```

is the same as `SArray{Tuple{3}, Float64, 1, 3}`.

and note that you don't need to define `magnitude`; you already have:

```julia
using StaticArrays
using LinearAlgebra: norm
x = rand(Vec3)
norm(x)
```

and also you could do:

```julia
julia> randf(::Type{T}, fmin, fmax) where T = fmin .+ (fmax - fmin) .* rand(T)
randf (generic function with 1 method)

julia> randf(Vec3, -1, 1)
3-element SVector{3, Float64} with indices SOneTo(3):
 -0.3009982992786695
 -0.15959842094608057
  0.27936134566733006

julia> randf(SVector{2,Float64}, -1, 1)
2-element SVector{2, Float64} with indices SOneTo(2):
 -0.8267854719262475
 -0.9675131426873447

julia> randf(Float32, -1, 1)
0.52987814f0
```

`magnitude(x,y,z)` is quicker than `norm([x,y,z])` because the latter has an extra allocation

`magnitude(v)` and `norm(v)` are the same

`norm` is a strange name too, Shirley had `length` and I don’t like that either

I did swap out `unit_vector` for `LinearAlgebra.normalize`

that randf is cool too, but again, it creates an `[x,y,z]` vector rather than separate `x, y, z` scalars,
and this is all about reducing those allocations
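The allocation difference is easy to check. A small sketch (the three-scalar `magnitude` here is just the obvious definition, not copied from the repo):

```julia
using LinearAlgebra

# norm([x, y, z]) must heap-allocate a temporary Vector on every call,
# while a scalar magnitude touches no heap memory at all.
magnitude(x, y, z) = sqrt(x*x + y*y + z*z)

# Helpers so @allocated measures just the call, inside compiled code.
alloc_scalar(x, y, z) = @allocated magnitude(x, y, z)
alloc_vector(x, y, z) = @allocated norm([x, y, z])

alloc_scalar(1.0, 2.0, 3.0); alloc_vector(1.0, 2.0, 3.0)  # warm up
@assert alloc_scalar(1.0, 2.0, 3.0) == 0   # no allocation
@assert alloc_vector(1.0, 2.0, 3.0) > 0    # the temporary [x, y, z]
```

In a hot loop running millions of times per frame, that per-call temporary is exactly what shows up as "tens of millions of allocations" in `@time`.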

I moved the code, git was being annoying with the fork

https://github.com/lawless-m/ShirleyRenderer.jl

If you use StaticArrays, none of these allocate.
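A quick way to confirm that, assuming StaticArrays.jl is installed:

```julia
using StaticArrays, LinearAlgebra

# Helper so @allocated measures the call itself, not global-scope overhead.
alloc_norm(v) = @allocated norm(v)

sv = @SVector [1.0, 2.0, 3.0]   # stack-allocated, fixed size
hv = [1.0, 2.0, 3.0]            # ordinary heap-allocated Vector

alloc_norm(sv); alloc_norm(hv)  # warm up; first calls include compilation
@assert alloc_norm(sv) == 0     # norm of an SVector allocates nothing
```

Because the size is in the type, operations like `norm`, `+`, and `dot` on `SVector` compile down to straight-line scalar code with no heap traffic.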

it was still slightly slower

perhaps I shall rework it all the way through and see properly

There's no reason why Julia should be so much slower (it's not GC, not strings, not I/O, not byte wrangling). Well, I know it's easy for me to say... but I hope we could get it within 125% of the C++ time. This is a micro-benchmark big enough that people can point to it as a "composite" test of suitability.

It uses more samples per pixel (150) and more depth (200), the defaults are 10 & 50

Again with 40 threads. I'm not going to chase more performance for the moment (that's not the purpose of this), although I hope to experiment with some GPU offloading. I can't use any CPU specialisations: Julia complains about a missing cx-16 instruction when I try to use -C.
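For reference, the row-parallel pattern is simple because every scanline is independent. A minimal sketch (with a placeholder pixel function, not the renderer's actual code):

```julia
using Base.Threads

# Stand-in for tracing one pixel; the real work would fire rays here.
trace_pixel(x, y) = sin(x) * cos(y)

function render(width, height)
    img = Matrix{Float64}(undef, height, width)
    @threads for y in 1:height      # rows are scheduled across threads
        for x in 1:width
            img[y, x] = trace_pixel(x, y)
        end
    end
    return img
end

img = render(64, 48)   # run with `julia -t 40` to use 40 threads
```

Since no two iterations write the same row, no locking is needed; the only shared state is the output image, written at disjoint indices.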

```
matt@pox:~/GitHub/ShirleyRenderer.jl/src$ julia --project=.. -L RandomScene.jl -e "@time main(;samples_per_pixel=150, max_depth=200)"
123.419846 seconds (328.84 M allocations: 24.327 GiB, 15.28% gc time, 1.76% compilation time)
```


150% of the C++ time is the best I have managed:

```
matt@pox:~/GitHub/ShirleyRenderer.jl$ time julia -t 1 --project=. -L src/RandomScene.jl -e "main()"

real    1m13.689s
user    1m12.059s
sys     0m1.360s

matt@pox:~/GitHub/raytracing.github.io/src/$ time ./a.out > a.out.ppm
Scanlines remaining: 0
Done.

real    0m49.599s
user    0m49.570s
sys     0m0.028s
```

Your example has 24 GiB of allocations and 15% GC time. That's probably the problem. You could preallocate the arrays used by each thread and reuse them. That much allocation probably hurts your cache performance too.
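A sketch of that preallocation idea (all names here are hypothetical, not from the repo): each thread gets its own scratch buffer, created once outside the hot loop, so the inner work reuses memory instead of allocating per pixel.

```julia
using Base.Threads

# Note: indexing buffers by threadid() assumes a task stays on one
# thread within an iteration; that holds here because the loop body
# has no yield points.
function shade_rows!(out, nsamples)
    buffers = [Vector{Float64}(undef, nsamples) for _ in 1:nthreads()]
    @threads for row in eachindex(out)
        buf = buffers[threadid()]      # this thread's scratch space
        for i in eachindex(buf)
            buf[i] = sin(row + i)      # stand-in for per-sample work
        end
        out[row] = sum(buf) / nsamples # reduce into the shared output
    end
    return out
end

rows = shade_rows!(zeros(32), 16)
```

The buffers are allocated once per render rather than once per sample, which is the difference between 24 GiB and a few hundred MiB of churn.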


yep, but I think that’s quite a bit of re-work

I’ll spend some more time on it once I have the other code ported


It turned out the design needed me to do the per-thread work anyway.

I used 2 mutable structs to hold the intermediate results, which reduced allocations significantly.

e.g.

```julia
mutable struct Hit
    p::Point3
    normal::Vec3
    t::Float64
    front_face::Bool
    material::Material
    # material is left undefined here; incomplete initialization
    # is allowed for mutable structs
    Hit() = new(zero(Point3), zero(Vec3), 0, true)
end
```

However, the effect on the time was a few seconds at most:

```
$ time julia --project=. -t 1 -L src/RandomScene.jl -e "@time main()"
62.763695 seconds (6.64 M allocations: 330.171 MiB, 0.53% gc time, 3.15% compilation time)

real    1m10.922s
user    1m9.625s
sys     0m1.029s
```

and only 1 second of wall time for the 40-threaded version:

```
matt@pox:~/GitHub/ShirleyRenderer.jl$ time julia --project=. -t 40 -L src/RandomScene.jl -e "@time main()"
5.969525 seconds (6.64 M allocations: 330.225 MiB, 3.03% gc time, 35.72% compilation time)

real    0m14.056s
user    2m48.924s
sys     0m0.988s
```

I should take my own advice and try it, but: are mutable structs significantly slower than if I had a set of these and updated them?

```julia
ps = Vector{Point3}(undef, nthreads())
```

Immutable is usually better, but more importantly, type-stable is best. You need to run Cthulhu.jl and ProfileView.jl on everything and get rid of instabilities. At a glance, `Material` is an abstract type, but you have it as a field type. Instead maybe use a type parameter `M`, or if you need to mix materials in a vector, don't use a type to specify them at all.
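To make the type-parameter suggestion concrete, a small sketch (the material types here are hypothetical stand-ins, not the renderer's own):

```julia
abstract type Material end

struct Lambertian <: Material
    albedo::Float64
end

struct Metal <: Material
    fuzz::Float64
end

# Abstractly-typed field: the compiler cannot know which concrete
# material is stored, so every access to it dispatches dynamically.
mutable struct HitAbstract
    t::Float64
    material::Material
end

# Parametric version: the concrete material type is part of the
# struct's type, so field access is fully inferred.
mutable struct HitParam{M<:Material}
    t::Float64
    material::M
end

h = HitParam(0.5, Lambertian(0.8))   # a HitParam{Lambertian}
```

The trade-off is that `HitParam{Lambertian}` and `HitParam{Metal}` are different types, so a single reusable `Hit` that can hold any material pushes you back toward the abstract field or a `Union` of the known material types.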