this question has been reformulated here: How to unroll tree traversal efficiently?
I would greatly appreciate some pointers to how I could speed up my code.
I tried to write a minimal example for my question, but it ended up as a package here: https://github.com/jonalm/BoundingVolumeHierarchies.jl
A Bounding Volume Hierarchy is implemented as a nested set of
BVH structs, with
FaceIndices structs at the terminal nodes. Each
BVH holds a bounding volume (an axis aligned bounding box AABB), and a tuple of children (either a set of
BVH or a single
FaceIndices), and each
FaceIndices holds the face indices of a given mesh.
FaceIndices structs are specified here:
I would like to traverse all
FaceIdices in the tree whose parent AABB satisfies a criteria (e.g. are candidate for a collision with a line segment). I have the following traversal implementation:
My assumption is that, since the the whole tree structure is immutable and defined at compile time, the compiler should be able to generate efficient code.
I test this traversal in the context of finding intersections between a mesh and line segments.
Intersection function is specified here:
And the test code is here:
The benchmark results are:
naive method: 505.097 ms (50002 allocations: 1.54 MiB) using BVH: 83.890 ms (1557325 allocations: 81.88 MiB)
My implementation gives almost an order of magnitude improvement over the naive approach, but at the expense of a lot of memory allocations. I assume this is because of the recursive implementation of the tree traversal (?).
What is a better implementation strategy for the traversal?
It should be possible to unroll the traversal function as a nested set of if-else statements, as the tree is specified at compile time, does that make it a candidate problem for generated functions?