and note that you don’t need to define magnitude, actually you have:
using LinearAlgebra: norm
x = rand(Vec3)
norm(x)
and also you could do:
julia> randf(::Type{T},fmin,fmax) where T = fmin .+ (fmax-fmin) .* rand(T)
randf (generic function with 1 method)
julia> randf(Vec3,-1,1)
3-element SVector{3, Float64} with indices SOneTo(3):
-0.3009982992786695
-0.15959842094608057
0.27936134566733006
julia> randf(SVector{2,Float64},-1,1)
2-element SVector{2, Float64} with indices SOneTo(2):
-0.8267854719262475
-0.9675131426873447
julia> randf(Float32,-1,1)
0.52987814f0
There’s no reason why Julia is so much slower, (not GC, not String, not I/O, not byte wrangling), well I know it’s easy for me to say it… But I hope we could make it within 125% time of C++. This is a micro benchmark big enough that people can point to as “composite” test for suitability blah blah
I added support for Images instead of PPM so now I can upload this
It uses more samples per pixel (150) and more depth (200), the defaults are 10 & 50
Again with 40 threads. I’m not going to chase more performance for the moment - that’s not the purpose of this - although I hope to experiment with some GPU offloading perhaps. I can’t use any CPU specialisations - Julia complains about a missing cx-16 instruction when I try and use -C
matt@pox:~/GitHub/ShirleyRenderer.jl/src$ julia --project=.. -L RandomScene.jl -e "@time main(;samples_per_pixel=150, max_depth=200)"
123.419846 seconds (328.84 M allocations: 24.327 GiB, 15.28% gc time, 1.76% compilation time)
Your example has 24GB allocations and 15% GC time. Thats probably the problem. You could preallocate all the arrays used for each thread and reuse them. That much allocation will probably reduce your cache performance too.
Turned out the design needed me to do the work of per thread anyway.
I used 2 muitable structs to hold the intermediate results, this reduced allocations significantly.
e.g.
mutable struct Hit
p::Point3
normal::Vec3
t::Float64
front_face::Bool
material::Material
Hit() = new(zero(Point3), zero(Vec3), 0, true)
end
However the effect on the time was a few seconds at the most
$time julia --project=. -t 1 -L src/RandomScene.jl -e "@time main()"
62.763695 seconds (6.64 M allocations: 330.171 MiB, 0.53% gc time, 3.15% compilation time)
real 1m10.922s
user 1m9.625s
sys 0m1.029s
and only 1 second wall for the 40 threaded version
matt@pox:~/GitHub/ShirleyRenderer.jl$ time julia --project=. -t 40 -L src/RandomScene.jl -e "@time main()"
5.969525 seconds (6.64 M allocations: 330.225 MiB, 3.03% gc time, 35.72% compilation time)
real 0m14.056s
user 2m48.924s
sys 0m0.988s
I should take my own advice and try it but are mutable structs significantly slower than if I had a set of these and updated them ?
Immutable is usually better, but more importantly, type-stable is best. You need to run Cthulhu.jl and ProfileView.jl on everything and get rid of instabilities. At a glance, Material is an abstract type, but you have it as a field type. Instead maybe use a type parameter M, or if you need to mix materials in a vector, dont use a type to specify them at all.
One thing I noticed is that having 1) Materials as an abstract element of Sphere and 2) a vector of abstract Hitables in Scene, kind of hits a sweet spot in the compiler which produces a well-optimized code.