I translated Peter Shirley's Raytracer to Julia. C++ 1min -> Julia 2m30s -> 15s (@threads)

I tried what I thought was a simpler solution, that is creating a Vector of Materials and using the Int64 index instead of the material itself.

Otherwise there is going to be some added complexity with all the Hitables and Materials that come along in the next stages of development.

https://github.com/lawless-m/ShirleyRenderer.jl/tree/OneIndexedMaterials

This helped in that Spheres did not need a Material type but instead an Int

e.g. add!(scene, Sphere(Point3(0, 1, 0), 1.0, add!(scene, Dielectric(1.5))))

but in the end, the scatter still needed to go from Abstract → Concrete

function scatter!(scene, ray::Ray, rec::Hit)::Tuple{Bool, Color}
	_scatter!(scene.materials[rec.material], ray, rec)
end

In the @code_warntype output, see %4:

MethodInstance for ShirleyRayTracer.scatter!(::Scene, ::ShirleyRayTracer.Ray, ::Hit)
  from scatter!(scene, ray::ShirleyRayTracer.Ray, rec::Hit) in ShirleyRayTracer at /home/matt/GitHub/ShirleyRenderer.jl/src/Materials.jl:4
Arguments
  #self#::Core.Const(ShirleyRayTracer.scatter!)
  scene::Scene
  ray::ShirleyRayTracer.Ray
  rec::Hit
Body::Tuple{Bool, ColorTypes.RGB{Float64}}
1 ─ %1 = Core.apply_type(ShirleyRayTracer.Tuple, ShirleyRayTracer.Bool, ShirleyRayTracer.Color)::Core.Const(Tuple{Bool, ColorTypes.RGB{Float64}})
│   %2 = Base.getproperty(scene, :materials)::Vector{ShirleyRayTracer.Material}
│   %3 = Base.getproperty(rec, :material)::Int64
│   %4 = Base.getindex(%2, %3)::ShirleyRayTracer.Material
│   %5 = ShirleyRayTracer._scatter!(%4, ray, rec)::Tuple{Bool, ColorTypes.RGB{Float64}}
│   %6 = Base.convert(%1, %5)::Tuple{Bool, ColorTypes.RGB{Float64}}
│   %7 = Core.typeassert(%6, %1)::Tuple{Bool, ColorTypes.RGB{Float64}}
└──      return %7
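
For anyone following along, here is a minimal sketch of why %4 comes out abstract. The names `Mat`, `Matte`, `Shiny` and `lookup` are made up for illustration and are not from the repo. The point is only that indexing a vector whose element type is abstract cannot infer anything narrower than that abstract type, however concrete the index is.

```julia
# Hypothetical stand-ins for the renderer's material types
abstract type Mat end

struct Matte <: Mat
    albedo::Float64
end

struct Shiny <: Mat
    fuzz::Float64
end

# A container typed on the abstract supertype, like Vector{Material}
mats = Mat[Matte(0.5), Shiny(0.1)]

# Looking a material up by Int index: the index is concrete, the result is not
lookup(v, i) = v[i]

abstract_result = only(Base.return_types(lookup, (Vector{Mat}, Int)))
println(abstract_result, " concrete? ", isconcretetype(abstract_result))

# With a small Union element type, inference keeps the Union,
# which the compiler is able to split into per-type branches
union_result = only(Base.return_types(lookup, (Vector{Union{Matte, Shiny}}, Int)))
println(union_result)
```

So storing an Int in the Sphere tidies up the Sphere type, but the instability just moves to the lookup.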

tbh I’m not expert enough in doing this - as you might tell :slight_smile:

but hey, I’m learning

One thing you can do if you can’t avoid type-unstable containers is to hardcode all possible dispatches in if/else statements, thereby avoiding dynamic dispatch. The only problem is that you have to know all the types this should work for and hardcode them. There is, however, the option to use a macro that duplicates one expression for every type returned by subtypes(abstract_type). In this simple example I get a speedup of about 11.5x and reduce allocations to 0 with this technique (5 subtypes is too many for Julia to generate fast paths on its own):

abstract type T end

struct A <: T
    x::Float64
end

struct B <: T
    x::Float32
end

struct C <: T
    x::Float16
end

struct D <: T
    x::Int64
end

struct E <: T
    x::Int32
end

function test1(vec)
    x = 0.0
    for v in vec
        x += v.x
    end
    x
end

function test2(vec)
    x = 0.0
    for v in vec
        if v isa A
            x += v.x
        elseif v isa B
            x += v.x
        elseif v isa C
            x += v.x
        elseif v isa D
            x += v.x
        elseif v isa E
            x += v.x
        end
    end
    x
end

vec = [rand([A, B, C, D, E])(rand(1:100)) for _ in 1:1000000]
using BenchmarkTools

julia> @btime test1($vec)
  102.284 ms (1600092 allocations: 24.42 MiB)
5.0509599e7

julia> @btime test2($vec)
  8.885 ms (0 allocations: 0 bytes)
5.0509599e7
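
A rough sketch of the macro idea mentioned above, in case it helps. Everything here (`@unionsplit`, `Shape`, `area`, `total_area`) is a hypothetical name chosen for the example, not an existing package API. It assumes the abstract type and all of its subtypes are already defined when the macro expands, and that the direct subtypes are all concrete.

```julia
using InteractiveUtils: subtypes  # `subtypes` is only loaded automatically in the REPL

abstract type Shape end
struct Circle <: Shape
    r::Float64
end
struct Square <: Shape
    side::Float64
end
struct Rect <: Shape
    w::Float64
    h::Float64
end

area(c::Circle) = pi * c.r^2
area(s::Square) = s.side^2
area(r::Rect) = r.w * r.h

# Hypothetical macro: copy `expr` into one `isa` branch per direct subtype of
# `abstract_type`. The subtypes are looked up when the macro expands, so types
# defined afterwards are not covered and hit the final `error` branch.
macro unionsplit(abstract_type, var, expr)
    types = subtypes(Core.eval(__module__, abstract_type))
    chain = :(error("unhandled subtype of ", $abstract_type))
    for S in reverse(types)
        # Nest each test in the else-branch of the previous one (an if/elseif chain)
        chain = Expr(:if, :($var isa $S), deepcopy(expr), chain)
    end
    esc(chain)
end

function total_area(shapes)
    acc = 0.0
    for s in shapes
        # Inside each generated branch `s` has a known concrete type
        acc += @unionsplit Shape s area(s)
    end
    acc
end

shapes = Shape[Circle(1.0), Square(2.0), Rect(2.0, 3.0)]
println(total_area(shapes))
```

The expansion is the same if/elseif chain as in test2, just generated instead of typed out.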

I did start the work for that but the problem is the various Material subtypes have different properties and it was getting all a bit much.

OTOH, I just counted and there are only 5 different Material types, even by the end of book 3, and they only have 5 different properties between them.

They wrap around the 4 Texture types.

I think it might even make sense to move away from the Object Hierarchy altogether - this is not C++ !

So I changed it to 1 Material and an enum

So it’s now type stable on the Materials
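
Roughly what the "1 Material and an enum" layout looks like, as a minimal sketch. The tag names, field names and the `attenuation` helper are assumptions made for this example, and the actual branch may be laid out differently.

```julia
# Hypothetical tag and field names
@enum MaterialKind LAMBERTIAN METAL DIELECTRIC

struct Material
    kind::MaterialKind
    albedo::NTuple{3, Float64}  # used by LAMBERTIAN and METAL
    fuzz::Float64               # used by METAL
    ior::Float64                # index of refraction, used by DIELECTRIC
end

# Convenience constructors fill the unused fields with neutral values
lambertian(albedo) = Material(LAMBERTIAN, albedo, 0.0, 1.0)
metal(albedo, fuzz) = Material(METAL, albedo, fuzz, 1.0)
dielectric(ior) = Material(DIELECTRIC, (1.0, 1.0, 1.0), 0.0, ior)

# An ordinary branch on the tag replaces dispatch on the material's type
function attenuation(m::Material)
    if m.kind == DIELECTRIC
        (1.0, 1.0, 1.0)  # the book's glass absorbs nothing
    else
        m.albedo
    end
end

# Every element has the same concrete type, so indexing is type stable
materials = [lambertian((0.5, 0.5, 0.5)), metal((0.8, 0.6, 0.2), 0.3), dielectric(1.5)]
println(eltype(materials), " concrete? ", isconcretetype(eltype(materials)))
```

The cost is that every material carries fields it does not use, which is cheap when there are only a handful of properties in total.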

It has taken about 5s off the single threaded time - that’s TTFX

If we render a hot frame, it’s just ~10s slower than the C++ version

With multiple threads, 3.7s per frame

That’s on this branch

https://github.com/lawless-m/ShirleyRenderer.jl/tree/OneWeekend


I made a Distributed version

I haven’t used addprocs yet though, so this is just on my local machine

Each process generates a scanline (tbh it’s scancol)

One interesting aspect is I made it so the workers can stay active. You send them a Render struct, which contains the Scene to render, the image size etc, so it is the beginning of a render farm.
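
A cut-down sketch of the farm idea. `RenderJob`, `render_column` and `render` are hypothetical names, and the per-column "render" is a dummy gradient. It also swaps the long-lived listening workers for a plain `pmap`, which is not what the branch does, but it has the same shape: ship a description of the job, get columns back.

```julia
using Distributed

# Hypothetical job description; defined on every process so workers can deserialize it
@everywhere struct RenderJob
    width::Int
    height::Int
end

# Stand-in for tracing one column of pixels: returns a gradient instead of colours
@everywhere function render_column(job::RenderJob, x::Int)
    [(x + y) / (job.width + job.height) for y in 1:job.height]
end

# Farm the columns out to whatever workers exist (just this process if none were started)
function render(job::RenderJob)
    columns = pmap(x -> render_column(job, x), 1:job.width)
    reduce(hcat, columns)  # height × width matrix
end

image = render(RenderJob(4, 3))
println(size(image))
```

Started with `julia -p 8` the same script spreads the columns over 8 local worker processes.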

I used to be a professional Digital Animator in the 1990s and I could only dream of such render times on our Pentium Pro 180 MHz machine! It took nearly 3 weeks to render this, 24/7

matt@pox:~/GitHub/ShirleyRenderer.jl$ time julia -t 8 -p 8 examples/DRandomScene.jl 
      From worker 2:	Listening
      From worker 5:	Listening
      From worker 6:	Listening
      From worker 3:	Listening
      From worker 7:	Listening
      From worker 4:	Listening
      From worker 9:	Listening
      From worker 8:	Listening
  7.492711 seconds (6.69 M allocations: 273.346 MiB, 1.04% gc time, 22.40% compilation time)

real	0m37.941s
user	5m4.618s
sys	0m9.297s

Hi Matt, I see that you switched from StaticArrays Vec3 to a custom one. What was the rationale?

In this thread some people had found reduced TTFX by removing that as a dependency, so I was experimenting with that. I don’t think it made any difference.

But it did enable me to simplify, because I could add methods for op(::Vec3, ::Float64) which broadcast the scalar op to every member of the vec. Although that’s for looks, not speed :slight_smile:
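
For illustration, a minimal sketch of that kind of hand-rolled Vec3. The field names and the exact set of operators are assumptions, not copied from the repo.

```julia
# Hypothetical minimal Vec3
struct Vec3
    x::Float64
    y::Float64
    z::Float64
end

# Element-wise vector/vector ops
Base.:+(a::Vec3, b::Vec3) = Vec3(a.x + b.x, a.y + b.y, a.z + b.z)
Base.:-(a::Vec3, b::Vec3) = Vec3(a.x - b.x, a.y - b.y, a.z - b.z)

# A scalar on the right is applied to every component,
# so call sites need no dot-broadcast syntax
Base.:+(a::Vec3, s::Float64) = Vec3(a.x + s, a.y + s, a.z + s)
Base.:*(a::Vec3, s::Float64) = Vec3(a.x * s, a.y * s, a.z * s)
Base.:/(a::Vec3, s::Float64) = Vec3(a.x / s, a.y / s, a.z / s)

v = Vec3(1.0, 2.0, 3.0)
println(v * 2.0)
println((v + v) / 2.0)
```

With StaticArrays the scalar addition would be written `v .+ 1.0`; here `v + 1.0` works directly.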

I’ve also been playing with GPU code for it, and although I haven’t got it working yet, it does look promising - testing the hits for the whole scanline in a single GPU Kernel


Thanks. See also my repo for an example of using CUDA (through Tullio)
