As far as we can tell, the allocation and speed different (order of magnitude) purely comes from the fact that one file contains more geometry types during run time than the other file.
but I will summarize finding and steps to reproduce:
git clone https://github.com/peremato/Geom4hep/
cd Geom4hep/
open examples/XRay.jl and comment out everything below line 54 (only keep definitions)
the various distanceToOut and distanceToIn which takes different types, and often call these function with different types (because geometry can nest within each other), is what we think the problem is at
and seems to be a fundamental limitation of Julia types at this point
Dynamic dispatch is slow in Julia ā slower than C++ virtual methods, because C++ method dispatch only depends on a single object type (this) and hence can use vtable lookup.
High-performance code in Julia always relies on devirtualizing critical code. This also means that you canāt easily write performant geometry code in Julia by having an array of geometric objects of different types and relying on dynamic dispatch to execute different methods (determined at runtime) for different objects ā you need to implement a different dispatch strategy.
Itās good to have this confirmed, but while the topic was āwhen # of types increasesā, Iām curious, 1) is it for sure slower when thereās actually only one type involved. And 2) when 2+ for one or more extra type, do you think it gets rapidly slower? [Iām not too concerned about that vs C++, since multiple dispatch is more powerful than C++, and Iām not sure C++ has anything to emulate it faster (or slower or at all), though Stroustrup had a, not yet implemented, proposal.]
Iām guessing Julia is optimized for the multiple case, not single, so Julia doesnāt use a vtable even if it could sometimes? So whatās the alternative in the single case? I think basically a big old (C-like) switch (and @goto only as a macro, maybe vtable could be built?).
Except Julia doesnāt have switch (yet) either, but a big if-else should be as fast:
Julia neither has pattern matching built in (like Scala, and recently added to Python and Java), but itās available with e.g. this package:
Do you know if that is the main package, or at least if it (or any of the many alternatives) are as fast as Cās switch?
A sequence of if elses or using short circuit evaluation in place of a switch(I like this a lot) will lower to efficient structures (jump tables etcā¦) if LLVM thinks it will be faster.
it gets rapidly slower when reaching 5, see the C++ Union splitting thread linked above. other wise comparable in performance because static dispatch has no overhead at run time
I have hoped that I can get away with something like:
function distanceToIn(shape, point, dir)
if shape isa Trap
distanceToIn_trap(shape, point, dir)
elseif shape isa Trd
distanceToIn_trd(shape, point, dir)
elseif shape isa Cone
distanceToIn_cone(shape, point, dir)
elseif shape isa Box
distanceToIn_box(shape, point, dir)
elseif shape isa Tube
distanceToIn_tube(shape, point, dir)
elseif shape isa Volume
distanceToIn_volume(shape, point, dir)
elseif shape isa Polycone
distanceToIn_polycone(shape, point, dir)
elseif shape isa CutTube
distanceToIn_cuttube(shape, point, dir)
elseif shape isa Boolean
distanceToIn_boolean(shape, point, dir)
end
end
but it made no difference. The example you linked seems to suggest this has to go inside the loop directly?
maybe Iām not finding the correct function that causes the issueā¦ but again, Profiler is not helpful here so it has been really hard to pin down the problem
by annotating the individual function with types, Iām able to recover the original performance (i.e. much slower than C++) and I feel like essentially I have re-created how Julia does things but manually and with no performance improvement
Have you tried out using GitHub - YingboMa/Unityper.jl ? It has some annoying ergonomics like requiring a default keword argument field for every struct, but it is designed to solve this exact performance problem youāre having.
the function we call inside the loop is recursive and branch to different types, this is different from most manual union splitting example (and YingboMaās example)