Allocation and slow down when # of types involved increase (slower than C++ virtual methods)

jling · September 22, 2022, 3:43pm

*Happy to pay someone to spend time investigating this

The context is @peremato trying to present a Julia case for detector simulation used at e.g. CERN (the LHC) and other accelerator/medical research places: https://indico.cern.ch/event/1202075/contributions/5056320/attachments/2511955/4317921/Geom4hep-202206.pdf

As far as we can tell, the difference in allocations and speed (order of magnitude) comes purely from the fact that one file contains more geometry types at run time than the other file.

Here is a summary of the findings and the steps to reproduce:

git clone https://github.com/peremato/Geom4hep/
cd Geom4hep/

open examples/XRay.jl and comment out everything below line 54 (only keep definitions)

julia --project=.
] instantiate
julia> using Revise, Geom4hep
julia> includet("./examples/XRay.jl")
compare these two:

const full = processGDML("examples/trackML.gdml", Float64)
const volume = full[2,1]
const world = getWorld(volume)
const nav = BVHNavigator(world)
@time generateXRay(nav, world, 1e4, 1);
@time generateXRay(nav, world, 1e4, 1);
  0.013423 seconds (9 allocations: 80.453 KiB)

const full2 = processGDML("examples/cms2018.gdml", Float64)
const volume2 = full2[1,7,1]
const world2 = getWorld(volume2)
const nav2 = BVHNavigator(world2)
@time generateXRay(nav2, world2, 1e4, 1);
@time generateXRay(nav2, world2, 1e4, 1);
  2.723019 seconds (29.79 M allocations: 757.602 MiB, 2.67% gc time)

The other findings and breadcrumbs can be found at: https://github.com/peremato/Geom4hep/issues/1#issuecomment-1228836714

jling · September 22, 2022, 3:48pm

We think the problem lies in the various distanceToOut and distanceToIn methods, which take different types and are often called with different types (because geometries can nest within each other).

This seems to be a fundamental limitation of Julia types at this point.

stevengj · September 22, 2022, 4:02pm

Dynamic dispatch is slow in Julia — slower than C++ virtual methods, because C++ method dispatch only depends on a single object type (this) and hence can use vtable lookup.

High-performance code in Julia always relies on devirtualizing critical code. This also means that you can’t easily write performant geometry code in Julia by having an array of geometric objects of different types and relying on dynamic dispatch to execute different methods (determined at runtime) for different objects — you need to implement a different dispatch strategy.
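As a minimal sketch of the situation described here, consider the toy types below. `Shape`, `Sphere`, `Cube`, `volume` and `total_volume` are hypothetical names made up for illustration; none of them come from Geom4hep. The only point is the difference between a container with a concrete element type and one with an abstract element type.

```julia
# Toy types for illustration only; none of these names come from Geom4hep.
abstract type Shape end

struct Sphere <: Shape
    r::Float64
end

struct Cube <: Shape
    a::Float64
end

volume(s::Sphere) = 4 / 3 * pi * s.r^3
volume(c::Cube) = c.a^3

# The same generic function is compiled separately for each container type.
total_volume(shapes) = sum(volume, shapes)

# Concrete element type: the `volume` method is known at compile time.
spheres = [Sphere(1.0), Sphere(2.0)]

# Abstract element type: the concrete type of each element is only known
# at run time, so the method has to be selected per element.
mixed = Shape[Sphere(1.0), Cube(2.0)]

println(isconcretetype(eltype(spheres)))
println(isconcretetype(eltype(mixed)))
println(total_volume(mixed))
```

With only two methods the compiler may still handle the mixed case well; the cost discussed in this thread shows up as the number of types and methods grows.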

See also this discussion: Union splitting vs C++

jling · September 22, 2022, 4:04pm

Ah yes, and it also makes sense that C++ single dispatch is much easier.

The question is: what are some known solutions that don’t break code organization and extensibility? Unrolling is probably not an option here.

stevengj · September 22, 2022, 4:06pm

I think you basically need to implement your own vtable as discussed in the other thread.
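One possible reading of "your own vtable", sketched under assumptions: collapse all shapes into a single concrete struct carrying a tag, and branch on the tag by hand. Everything here (`ShapeKind`, `TaggedShape`, `tagged_volume`, the field layout) is hypothetical and much simpler than real detector geometry; it is not the design from the other thread.

```julia
# Hypothetical sketch: one concrete struct with a tag instead of many types.
@enum ShapeKind BoxKind TubeKind

struct TaggedShape
    kind::ShapeKind
    a::Float64   # half-length for a box, radius for a tube
    b::Float64   # unused for a box, half-height for a tube
end

box_volume(s::TaggedShape) = (2 * s.a)^3
tube_volume(s::TaggedShape) = pi * s.a^2 * (2 * s.b)

# The hand-written dispatch table: an explicit branch on the tag.
# Every call target is known statically, so no dynamic dispatch is involved.
function tagged_volume(s::TaggedShape)
    if s.kind == BoxKind
        return box_volume(s)
    elseif s.kind == TubeKind
        return tube_volume(s)
    else
        error("unknown shape kind")
    end
end

# A Vector{TaggedShape} has a concrete element type.
tagged_shapes = [TaggedShape(BoxKind, 1.0, 0.0), TaggedShape(TubeKind, 1.0, 0.5)]
tagged_total = sum(tagged_volume, tagged_shapes)
println(tagged_total)
```

The trade-off is extensibility: adding a shape means touching the enum and every branching function, which is exactly the code-organization concern raised above.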

Raf · September 22, 2022, 4:10pm

I haven’t tried it but maybe this will help?

jling · September 22, 2022, 4:18pm

github.com/thautwarm/Virtual.jl

error: `setfield!` should not be changed

opened 03:44PM - 20 Aug 22 UTC

closed 10:01PM - 25 Sep 22 UTC

Moelf

```julia
[ Info: Precompiling Geom4hep [eb5d0804-93e0-431a-a4d4-b4f95b95575a]
…ERROR: LoadError: setfield! fields of Types should not be changed
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] setproperty!(x::Type, f::Symbol, v::Symbol)
   @ Base ./Base.jl:34
 [3] parse_function_header!(ln::LineNumberNode, self::Virtual.FuncInfo, header::Expr; is_lambda::Bool, allow_lambda::Bool)
   @ Virtual C:\Users\twshe\Desktop\Git\Virtual\src\reflection.jl:230
 [4] parse_function(ln::LineNumberNode, ex::Expr; fallback::Virtual.Undefined, allow_short_func::Bool, allow_lambda::Bool)
   @ Virtual C:\Users\twshe\Desktop\Git\Virtual\src\reflection.jl:186
 [5] var"@override"(__source__::LineNumberNode, __module__::Module, func_def::Any)
   @ Virtual ~/.julia/packages/Virtual/OLENd/src/Virtual.jl:256
 [6] include(mod::Module, _path::String)
   @ Base ./Base.jl:419
 [7] include(x::String)
   @ Geom4hep ~/Documents/github/Geom4hep/src/Geom4hep.jl:1
```

Is this a problem with how I'm using the package, or does Virtual.jl not work on 1.8?

Palli · September 22, 2022, 4:47pm

It’s good to have this confirmed, but while the topic was “when # of types increases”, I’m curious: 1) is it for sure slower when there’s actually only one type involved? And 2) with two or more types, do you think it gets rapidly slower with each extra type? [I’m not too concerned about that vs C++, since multiple dispatch is more powerful than what C++ offers, and I’m not sure C++ has anything to emulate it faster (or slower, or at all), though Stroustrup had a proposal that is not yet implemented.]

I’m guessing Julia is optimized for the multiple case, not single, so Julia doesn’t use a vtable even if it could sometimes? So what’s the alternative in the single case? I think basically a big old (C-like) switch (and @goto only as a macro, maybe vtable could be built?).

Except Julia doesn’t have switch (yet) either, but a big if-else should be as fast:

github.com/JuliaLang/julia

A case/switch statement for Julia

opened 03:11AM - 30 Aug 16 UTC

bpr

speculative julep equality

# Proposal

This is a Julep suggested by @StefanKarpinski during this [julia-u…sers group discussion](https://groups.google.com/forum/#!topic/julia-users/7IqQROx_zIo). Julia lacks a C-style [switch](http://en.wikipedia.org/wiki/Switch_statement) statement. This issue has [come up](https://github.com/JuliaLang/julia/issues/5410) [before](https://groups.google.com/forum/#!topic/julia-users/pnEqF5w4GJY/discussion) on various fora. Unsurprisingly, Julia also lacks [pattern matching](https://en.wikipedia.org/wiki/Pattern_matching), a useful generalization of case and switch, which has been implemented as a [macro](https://github.com/kmsquire/Match.jl) for Julia.

This proposal is concerned with using switch statements over integral (isbits) types to exploit the ability of LLVM's [switch instruction](http://llvm.org/docs/LangRef.html#switch-instruction) to provide a [branch table](https://en.wikipedia.org/wiki/Branch_table) implementation of switch, which is more performant than a sequence of `if-else` conditionals. It should be amenable to supporting extended switch and pattern capabilities in the future; for that reason I'll suggest using adding the `case` keyword to introduce the form. If it's desired that the statement only be used as a C style switch and that pattern matching be introduced separately, perhaps `switch` is the better choice.

I'll skip the arguments in around whether to include such a feature at all; interested readers can review some of those arguments in the context of [Python](http://www.python.org/dev/peps/pep-3103/) or [Lua](http://lua-users.org/wiki/SwitchStatement), as the arguments on both sides would be similar for Julia.

# Syntax

If we'd like to subsume this in a future pattern matching Julia extension, I suggest a syntax influenced by the [Match.jl](https://github.com/kmsquire/Match.jl) macro, replacing `@match` with the keyword `case`~~, and adding a new keyword `of` to use instead of `begin`. I chose `of` because that's used in many other languages that use `case`, and because Julia constructs like `if/for/while` don't require a `begin` for their `end`~~ (Objection sustained).

``` Erlang
case case_expr
    case0 => expr0
    case1 => expr1
    .
    .
    .
    caseNMinusOne => exprNMinusOne
    else => exprN
end
```

The optional `else` at the end could be replaced by `_` or even `otherwise` or `default`, at the cost of using more keywords. Each case above could be preceded by an `of` if that syntax is preferable. It strikes me as a little wordy but it's a syntax used in other languages and perhaps people find it more readable.

``` Erlang
case case_expr
    of case0 => expr0
    of case1 => expr1
    .
    .
    .
    of caseNMinusOne => exprNMinusOne
    else => exprN
end
```

# Examples

``` Erlang
n = rand(0:32)
function print_if_lt_3(n)
    case n
        0 => println("zero")
        1 => println("one")
        2 => println("two")
        else => println("too big")
    end
end
```

Should be semantically equivalent to

``` Julia
n = rand(0:32)
function print_if_lt_3(n)
    if n == 0
        println("zero")
    elseif n == 1
        println("one")
    elseif n == 2
        println("two")
    else
        println("too big")
    end
end
```

An example with enums

``` Erlang
@enum Dir north northeast east southeast south southwest west northwest
function degree_of_dir(d)
    case d
        north => 90
        northeast => 45
        east => 0
        southeast => 315
        south => 270
        southwest => 225
        west => 180
        northwest => 135
    end
end
```

The intent is that `@enum` should work well with this `case` statement, which means that the conversion to the enum's Int value should be implicit, as above.

Julia doesn’t have pattern matching built in either (as Scala does, and as was recently added to Python and Java), but it’s available with e.g. this package:

Do you know if that is the main package, or at least if it (or any of the many alternatives) is as fast as C’s switch?

gbaraldi · September 22, 2022, 4:49pm

A sequence of if-elses, or short-circuit evaluation in place of a switch (I like this a lot), will lower to efficient structures (jump tables etc.) if LLVM thinks it will be faster.
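For concreteness, here are the two forms side by side on a hypothetical integer code (`name_ifelse` and `name_shortcircuit` are made-up names). Whether LLVM actually emits a jump table for either is up to its cost model; in the REPL this can be checked with `@code_llvm name_shortcircuit(1)`.

```julia
# Hypothetical example: two equivalent ways to branch on an integer code.
function name_ifelse(n::Int)
    if n == 0
        return "zero"
    elseif n == 1
        return "one"
    elseif n == 2
        return "two"
    else
        return "too big"
    end
end

# Same logic written with short-circuit evaluation instead of if/elseif.
function name_shortcircuit(n::Int)
    n == 0 && return "zero"
    n == 1 && return "one"
    n == 2 && return "two"
    return "too big"
end

# Both forms agree on every input in the tested range.
println(all(name_ifelse(n) == name_shortcircuit(n) for n in -1:5))
```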

See Do C++ performance optimisations apply to Julia? - #30 by gbaraldi

jling · September 22, 2022, 4:49pm

It gets rapidly slower when reaching 5; see the C++ union splitting thread linked above. Otherwise it is comparable in performance, because static dispatch has no overhead at run time.
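A small sketch of the case that stays fast, assuming toy types `Disk` and `Square` (hypothetical, not from Geom4hep): when the element type is a small `Union` of concrete types, the compiler can turn the call into explicit branches on the possible types (union splitting). The threshold of about five types is the figure reported in this thread, not something this sketch measures.

```julia
# Toy types for illustration only.
struct Disk
    r::Float64
end

struct Square
    a::Float64
end

area(d::Disk) = pi * d.r^2
area(s::Square) = s.a^2

# Element type is a small Union of concrete types, not an abstract supertype.
items = Union{Disk,Square}[Disk(1.0), Square(2.0), Disk(0.5)]

function total_area(v)
    acc = 0.0
    for x in v
        # With only two possible types, the compiler can branch on them here
        # rather than performing a full dynamic dispatch.
        acc += area(x)
    end
    return acc
end

println(total_area(items))
```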

jling · September 22, 2022, 5:33pm

I had hoped that I could get away with something like:

function distanceToIn(shape, point, dir)
    if shape isa Trap
        distanceToIn_trap(shape, point, dir)
    elseif shape isa Trd
        distanceToIn_trd(shape, point, dir)
    elseif shape isa Cone
        distanceToIn_cone(shape, point, dir)
    elseif shape isa Box
        distanceToIn_box(shape, point, dir)
    elseif shape isa Tube
        distanceToIn_tube(shape, point, dir)
    elseif shape isa Volume
        distanceToIn_volume(shape, point, dir)
    elseif shape isa Polycone
        distanceToIn_polycone(shape, point, dir)
    elseif shape isa CutTube
        distanceToIn_cuttube(shape, point, dir)
    elseif shape isa Boolean
        distanceToIn_boolean(shape, point, dir)
    end
end

but it made no difference. The example you linked seems to suggest this has to go inside the loop directly?

pdeffebach · September 22, 2022, 5:58pm

Add @nospecialize?
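For reference, this is what the annotation looks like on an argument; `describe` is a hypothetical function used only to show the syntax. `@nospecialize` is a hint that the compiler need not generate a separate specialization for each concrete argument type, which mainly affects compile time and code size.

```julia
# Hypothetical function showing the @nospecialize argument syntax.
function describe(@nospecialize(shape))
    # typeof still returns the run-time type of the argument.
    return string(typeof(shape))
end

println(describe(1.0))
println(describe("box"))
```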

jling · September 22, 2022, 6:09pm

around this steering function? no help

jling · September 22, 2022, 6:26pm

Manual splitting actually made performance much worse: Trying manual splitting (regression) by Moelf · Pull Request #2 · peremato/Geom4hep · GitHub

It is split into Distance.jl and Inside.jl.

stevengj · September 22, 2022, 8:06pm

Does it make a difference if you do distanceToIn_trap(shape::Trap, point, dir) to make sure that the compiler is using the type information here?
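A sketch of what that suggestion looks like, with toy types `Wedge` and `Slab` standing in for the real shapes (all names here are hypothetical). In call position, `shape::Wedge` is a type assertion: it throws if the value is not a `Wedge`, and otherwise tells the compiler the concrete type.

```julia
# Toy types for illustration; these are not Geom4hep's Trap or Box.
struct Wedge
    h::Float64
end

struct Slab
    t::Float64
end

size_wedge(w::Wedge) = w.h / 2
size_slab(s::Slab) = s.t

function size_of(shape)
    if shape isa Wedge
        # Type assertion at the call site, not a declaration.
        return size_wedge(shape::Wedge)
    elseif shape isa Slab
        return size_slab(shape::Slab)
    else
        error("unsupported shape type: $(typeof(shape))")
    end
end

# Deliberately untyped container, so the branches above do the narrowing.
untyped_shapes = Any[Wedge(4.0), Slab(3.0)]
sizes = [size_of(s) for s in untyped_shapes]
println(sizes)
```

Inside an `isa` branch the compiler usually already knows the narrowed type, so the assertion is often redundant, which is consistent with it making no difference here.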

jling · September 22, 2022, 8:18pm

no difference.

Maybe I’m not finding the correct function that causes the issue… but again, the profiler is not helpful here, so it has been really hard to pin down the problem.

jling · September 22, 2022, 9:06pm

By annotating the individual functions with types, I’m able to recover the original performance (i.e. much slower than C++). I feel like I have essentially re-created what Julia does, but manually and with no performance improvement.

jling · September 22, 2022, 9:41pm

We might have a thread to pull on: this change of 4 lines (Trying manual splitting by Moelf · Pull Request #2 · peremato/Geom4hep · GitHub) seems to make the time 2x worse and the allocations 10x larger. All I’m doing here is manual splitting, like I did for the other shapes.

Mason · September 22, 2022, 9:50pm

Have you tried using YingboMa/Unityper.jl (on GitHub)? It has some annoying ergonomics, like requiring a default keyword argument field for every struct, but it is designed to solve the exact performance problem you’re having.

jling · September 22, 2022, 9:56pm

The function we call inside the loop is recursive and branches to different types; this is different from most manual union-splitting examples (and YingboMa’s example).
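To make the recursive case concrete, here is a hypothetical sketch where every node of a nested geometry has the same concrete type, so the recursive call always targets one known method. `NodeKind`, `Node`, `leaf`, `group` and `total_extent` are invented for illustration, and this is not a claim that the same restructuring would fix Geom4hep.

```julia
# Hypothetical sketch: a nested geometry where every node has one concrete type.
@enum NodeKind LeafBox LeafTube GroupNode

struct Node
    kind::NodeKind
    extent::Float64
    children::Vector{Node}   # empty for leaves
end

leaf(kind::NodeKind, extent::Float64) = Node(kind, extent, Node[])
group(children::Vector{Node}) = Node(GroupNode, 0.0, children)

function total_extent(n::Node)
    if n.kind == GroupNode
        acc = 0.0
        for c in n.children
            # Recursive call on the same concrete type: no dynamic dispatch.
            acc += total_extent(c)
        end
        return acc
    else
        return n.extent
    end
end

tree = group([leaf(LeafBox, 1.0),
              group([leaf(LeafTube, 2.0), leaf(LeafBox, 3.0)])])
println(total_extent(tree))
```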
