Edit:
this question has been reformulated here: How to unroll tree traversal efficiently?
Dear Community,
I would greatly appreciate some pointers to how I could speed up my code.
Some background:
I tried to write a minimal example for my question, but it ended up as a package here: https://github.com/jonalm/BoundingVolumeHierarchies.jl
A Bounding Volume Hierarchy is implemented as a nested set of BVH
structs, with FaceIndices
structs at the terminal nodes. Each BVH
holds a bounding volume (an axis aligned bounding box AABB), and a tuple of children (either a set of BVH
or a single FaceIndices
), and each FaceIndices
holds the face indices of a given mesh.
The BVH
and FaceIndices
structs are specified here:
https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/6f37e7911080587b48e6b65aa7b7af0251a0a349/src/BVH.jl#L8-L18
I would like to traverse all FaceIdices
in the tree whose parent AABB satisfies a criteria (e.g. are candidate for a collision with a line segment). I have the following traversal implementation:
https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/master/src/BVH.jl#L46-L56
My assumption is that, since the the whole tree structure is immutable and defined at compile time, the compiler should be able to generate efficient code.
I test this traversal in the context of finding intersections between a mesh and line segments.
Intersection function is specified here:
https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/6f37e7911080587b48e6b65aa7b7af0251a0a349/src/Intersection.jl#L37-L52
And the test code is here:
https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/master/examples/example.jl#L37-L51
The benchmark results are:
naive method: 505.097 ms (50002 allocations: 1.54 MiB)
using BVH: 83.890 ms (1557325 allocations: 81.88 MiB)
My implementation gives almost an order of magnitude improvement over the naive approach, but at the expense of a lot of memory allocations. I assume this is because of the recursive implementation of the tree traversal (?).
Question:
What is a better implementation strategy for the traversal?
It should be possible to unroll the traversal function as a nested set of if-else statements, as the tree is specified at compile time, does that make it a candidate problem for generated functions?