You actually found one,
For an outsider reading this with curiosity about takeaways for similar problems, would you mind sharing what, if any, general solution strategy you found? I tried reading through the branch diffs in Hunting allocation · Issue #1 · peremato/Geom4hep · GitHub to follow, but I got a bit lost in the details of the application. I totally understand if you donât have time for such a request.
For another take on this topic, see https://j-fu.github.io/marginalia/julia/unionize/
so hereâs the problem: non of the examples mentioned above including yours (which looks very nice btw) have MUTUALLY recursiveness among types and the function f()
if our case were like this, weâd easily just write a sumup_f_manual
and re-write the whole workload as an explicit loop
It seems to me that there are a few issues with your boolean type:
-
Nested Parametric Type Explosion: There are 8 primitive types and 11 parametric types if you include the 3 boolean types(Union, Subtraction, and Intersection). Looking at the CMS2018 file it looks like there is typically at least 2 layers of nested booleans. So it seems like very quickly you could get more than 512( which is 8^3) different types. You might consider making this a less parametric type This seems very close to the pitfalls mentioned in:
The-dangers-of-abusing-multiple-dispatch-(aka,-more-on-types-with-values-as-parameters) -
Not using bounding boxes for the initial screening: It seems like if you calculated and stored a bounding box for both the target and tool(I think you call them Left & Right) in the boolean object at creation time, so that you donât have to dispatch on the sub shapes to get the bounding boxes. You could have a course outer method that works only on the bounding boxes, would be the same all the booleans regardless of the tool and target type, and likely intercept/screen 90+% calls to this function. Then have a more precise fine version of the method that actually dispatches to the primitivesâ method.
- I think that at the top level you are doing this with Bounded Volume Hierarchy but if not something similar should be done there.
this is great I guess @peremato has seen the Tips page but decided to implement according to the C++ design as a first draft; but yeah, we should try to fix both points
Sorry for the long break to comeback to this thread.
The memory allocation problem is solved by having the 3 BooleanShape
s types not parametric with left and right shapes but using Union{BaseShape{T}, BooleanShape{T}}
for the left and right shapes. This was figured out by @grasph. Thanks!
However the performance is yet recovered.
The problem with the performance comes from the fact that each method of BooleanShape
has to deal with dynamic dispatch of the sub-shapes (left and right). The only way to implement these methods is by using the methods of the sub-types. For instance, to find in a point in inside a BooleanUnion
you need to ask if the point is in the Left shape or
in the Right shape. You cannot not implement this method in any other way.
I made a test of calling the inside
method of a 4-nested boolean structure with the current implementation using as type for left and right Union{BaseShape{T}, BooleanShape{T}}
compared with the original parametric type definition like this:
struct BooleanShape{T<:AbstractFloat, SL<:AbstractShape{T}, SR<:AbstractShape{T}, OP} <: AbstractShape{T}
left::SL # the mother (or left) volume A in unplaced form
right::SR # (or right) volume B in placed form, acting on A with a boolean operation
transformation::Transformation3D{T} # placement of "right" with respect of "left"
end
and can see almost a factor 2 in performance. This means that the weight of dynamic dispatch is enormous.
@ndinsmore is right, at creation time I know exactly the types for the sub-shapes. So, if I could store this information I would not need to make use of dynamic dispatch at run-time (which is exactly the case when we use the types of the sub-shapes as parameters of the BooleanShape
type). The problem is that I do to know how to do it. I could store the actual method that I would need to call at run-time for example using left_inside = which(inside,(typeof(left), Point3{T}))
, but it seems that in Julia I cannot call directly methods. You get MethodError: objects of type Method are not callable
.
Somehow what I want to reproduce is to build the equivalent of a C++ virtual table for the left and right sub-shapes. Does anybody knows if this is possible?
so I ran into @sjkelly yesterday at Julia user meeting, and we talked about the performance of this thread.
I think we may be just barely abusing types in this case. The Performance Tips pages in the manual use âcar brands <=> typesâ as an example of type abusing. Here, the # of geometries is not a problem itself, but because we (have to) parametrize over the child types for meta geometries, the # of types explodes.
We could opt for more manual methods, like using Enum instead of types for the meta geometries (Boolean), so instead of putting child type into the parameter, we just store a Enum, and do if-else
when a Boolean shape is encountered.
This is exactly what I try to do with the latest version. There are in total 10 types (7 basic and 3 booleans) no more. The BooleanShapes encapsulate the abstraction with the left and right types but themselves are not abstract. The problem is the poor performance when calling a function using one of the booleans because of dynamic dispatch, which is a pity since I know exactly the types at construction time. If I could call the correct method stored at construction time would be solution.
With manual splitting if shape isa Box; elseif shape isa Tube; ...
you get only a marginal 5% improvement, while getting rid completely of the dynamic dispatch (tested using parametric sub-shape types, which I cannot use in the final project) in a complex recursive structure you get easily 50% improvement. So, this is what I am aiming to recover all the lost performance due to dynamic dispatch.
hmm yeah FunctionWrapper mimicking C++'s single dispatch is possible
In case itâs helpful, I put together a simple example of what is effectively a c++ style vtable using FunctionWrappers here: ConcreteInterfaces.jl/demo.ipynb at master · rdeits/ConcreteInterfaces.jl · GitHub
Thank-you. It looks very interesting. Iâll give it a try.
@rdeits unfortunately rdeits/ConcreteInterfaces.jl is not working with a recent version of Julia. In any case, I have made a test with FunctionWrapper
by hand instead of using the @interface
macro, which looks very handy.
I have defined a partial interface to Shapes
with only two methods (inside and extent) to start with.
#---Define shape interfaces (aka C++ virtual table)------------------------------------------------
struct ShapeInterface{T<:AbstractFloat} <: AbstractInterface
self::AbstractShape{T}
inside::FunctionWrapper{Int64, Tuple{Point3{T}}}
extent::FunctionWrapper{Tuple{Point3{T},Point3{T}},Tuple{}}
ShapeInterface{T}(self::AbstractShape{T}) where T = new(self, (p::Point3{T} -> inside(self, p)), (()->extent(self)))
end
import Geom4hep: inside, extent
inside(i::ShapeInterface{T}, p::Point3{T}) = i.inside(p)
extent(i::ShapeInterface{T}) = i.extent()
then I define the new BooleanShape
like this:
struct NBooleanShape{T<:AbstractFloat, OP} <: AbstractShape{T}
left::ShapeInterface{T} # the mother (or left) volume A in unplaced form
right::ShapeInterface{T} # (or right) volume B in placed form, acting on A with a boolean operation
transformation::Transformation3D{T} # placement of "right" with respect of "left"
end
The NBooleanShape has good properties (type stability, no allocations at runtime, etc.) but the performance is not better. With a nested boolean shape you get a factor 2 worst performance.
with the original code (as in the head of repository)
62.709 ns (0 allocations: 0 bytes)
with the new BooleanShape
110.988 ns (0 allocations: 0 bytes)
full example
using Revise
using Geom4hep
using FunctionWrappers
import FunctionWrappers: FunctionWrapper
abstract type AbstractInterface end
#---Define shape interfaces (aka C++ virtual table)------------------------------------------------
struct ShapeInterface{T<:AbstractFloat} <: AbstractInterface
self::AbstractShape{T}
inside::FunctionWrapper{Int64, Tuple{Point3{T}}}
extent::FunctionWrapper{Tuple{Point3{T},Point3{T}},Tuple{}}
ShapeInterface{T}(self::AbstractShape{T}) where T = new(self, (p::Point3{T} -> inside(self, p)), (()->extent(self)))
end
import Geom4hep: inside, extent
inside(i::ShapeInterface{T}, p::Point3{T}) = i.inside(p)
extent(i::ShapeInterface{T}) = i.extent()
#---New definition of Boolean Shapes--------------------------------------------------------------------
struct NBooleanShape{T<:AbstractFloat, OP} <: AbstractShape{T}
left::ShapeInterface{T} # the mother (or left) volume A in unplaced form
right::ShapeInterface{T} # (or right) volume B in placed form, acting on A with a boolean operation
transformation::Transformation3D{T} # placement of "right" with respect of "left"
end
const NBooleanUnion{T} = NBooleanShape{T, :union}
const NBooleanSubtraction{T} = NBooleanShape{T, :subtraction}
const NBooleanIntersection{T} = NBooleanShape{T, :intersection}
#---Constructor--------------------------------------------------------------------------------------
function NBooleanSubtraction(left::AbstractShape{T}, right::AbstractShape{T}, place::Transformation3D{T}=one(Transformation3D{T})) where T<:AbstractFloat
NBooleanSubtraction{T}(ShapeInterface{T}(left), ShapeInterface{T}(right), place)
end
function inside(shape::NBooleanSubtraction{T}, point::Point3{T})::Int64
(; left, right, transformation) = shape
lpoint = transformation * point
positionA = inside(left, point)
positionA == kOutside && return kOutside
positionB = inside(right, lpoint)
if positionA == kInside && positionB == kOutside
return kInside;
elseif positionA == kInside && positionB == kSurface ||
positionB == kOutside && positionA == kSurface
return kSurface
else
return kOutside
end
end
function extent(shape::NBooleanSubtraction{T})::Tuple{Point3{T},Point3{T}} where {T}
extent(shape.left)
end
const T = Float64
#---New boolean instance
nboolean = NBooleanSubtraction(NBooleanSubtraction(NBooleanSubtraction(NBooleanSubtraction(Box{T}(2.035, 100.35000000000001, 0.5750000000000001),
Box{T}(0.3175, 100.35000000000001, 0.065),
Transformation3D{T}(1.7175000000000002, 0.0, 0.51)),
Box{T}(0.3175, 100.35000000000001, 0.065),
Transformation3D{T}(1.7175000000000002, 0.0, -0.51)),
Box{Float64}(0.3175, 100.35000000000001, 0.065),
Transformation3D{T}(-1.7175000000000002, 0.0, 0.51)),
Box{Float64}(0.3175, 100.35000000000001, 0.065),
Transformation3D{T}(-1.7175000000000002, 0.0, -0.51))
#---Old boolean instance
boolean = BooleanSubtraction(BooleanSubtraction(BooleanSubtraction(BooleanSubtraction(Box{T}(2.035, 100.35000000000001, 0.5750000000000001),
Box{T}(0.3175, 100.35000000000001, 0.065),
Transformation3D{T}(1.7175000000000002, 0.0, 0.51)),
Box{T}(0.3175, 100.35000000000001, 0.065),
Transformation3D{T}(1.7175000000000002, 0.0, -0.51)),
Box{Float64}(0.3175, 100.35000000000001, 0.065),
Transformation3D{T}(-1.7175000000000002, 0.0, 0.51)),
Box{Float64}(0.3175, 100.35000000000001, 0.065),
Transformation3D{T}(-1.7175000000000002, 0.0, -0.51))
point = Point3{T}(0,0,0)
# benchmaking
using BenchmarkTools
@btime inside(boolean, point)
@btime inside(nboolean, point)
@peremato what I was saying before was to store the actual bounding box of the left and right shape so that you could filter out maybe a high percentage of the multiple dispatch calls by boolean. So you would check if the ray or point was in the BBox before ever dispatching, but again the key to this is to store BBox before hand.
I got a chance to actually try the code out and I donât think that your problem has anything to do with dispatch. Apparently when you run @profview anything that shows up in red is a dispatch problem.
At least from a complexity point of view, the time difference between your two different examples is 100% related to how many times the method distanceToIn(...)
gets called because of the way the geometry is built up.
With some poorly written instrumentation code, I counted the call to ``distanceToIn(âŠ)``` for all the different types:
For âexamples/trackML.gdmlâ
julia> Geom4hep.call_count
Dict{Any, Int64} with 2 entries:
PlacedVolume => 123264
Trd => 23104
For âexamples/cms2018.gdmlâ
julia> Geom4hep.call_count
Dict{Any, Int64} with 6 entries:
Tube => 108529
PlacedVolume => 44120474
Box => 47601314
BooleanUnion => 6553615
BooleanSubtraction => 10157038
Trap => 4397782
Which means in the benchmarking just the PlaceVolume
version of that method is getting called 357x more times and the ratios in your benchmarked times above are only 209x. You will know from benchmarking that the distanceToIn(...)
method is the majority of the problem.
I hate to say it but I think that you have been chasing the wrong gremlin.
I can send you the instrumentation code if you like.
Iâm not sure what you mean here. Yes weâre calling distanceToIn()
many more times, but we know from a C++ counterpart that the cms2018
geometry shouldnât be this slow. And it seems that as long as donât have Boolean()
shapes itâs not a problem â you can test this out by skip parsing Boolean
geometries
Yes, you are right and we are already doing it. We have an implementation of BVH accelerating structure to avoid large searches, which speedup many factors (see BVH.jl). But at the end of the day, to make a precise determination of the distance to a surface you need to have the real shape and not an approximative bounding box.
First can we agree to a few things:
- The majority of the computation time is spent in
distanceToIn()
for both models. - Your legacy C++ code runs about as fast as the julia version on the TrackML example
- When you say that the Julia version is slow for the cms2018 example you mean 2x - 5x the time of C++ code not 200x the time.
- Calling the the function that is most of the computing time 350x more times and including boolean types(ie multiple dispatch) but only increasing the run time 200x is a reasonable outcome.
Also when you say that not parsing the booleans does that mean the code runs 30-40% faster (which is about what I get on my machine 2.8s â 1.8s)? Because from the counting above if remove 16M boolean calls of distanceToIn()
each of which takes with it a call of distanceToIn()
to as primative type that accounts for the decrease in time alone.
If you run my instrumentation code on the cms2018 example with the boolean ignored you get:
Dict{Any, Int64} with 3 entries:
PlacedVolume => 45856163
Box => 38865821
Trap => 6881813
I think this is an algorithm issue not a dispatch issue.