Efficient traversal of tree defined as nested structs

jonalm · May 2, 2019, 12:19pm

Edit:

this question has been reformulated here: How to unroll tree traversal efficiently?

Dear Community,

I would greatly appreciate some pointers to how I could speed up my code.

Some background:

I tried to write a minimal example for my question, but it ended up as a package here: https://github.com/jonalm/BoundingVolumeHierarchies.jl

A Bounding Volume Hierarchy is implemented as a nested set of BVH structs, with FaceIndices structs at the terminal nodes. Each BVH holds a bounding volume (an axis aligned bounding box AABB), and a tuple of children (either a set of BVH or a single FaceIndices), and each FaceIndices holds the face indices of a given mesh.

The BVH and FaceIndices structs are specified here:
https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/6f37e7911080587b48e6b65aa7b7af0251a0a349/src/BVH.jl#L8-L18

I would like to traverse all FaceIdices in the tree whose parent AABB satisfies a criteria (e.g. are candidate for a collision with a line segment). I have the following traversal implementation:
https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/master/src/BVH.jl#L46-L56

My assumption is that, since the the whole tree structure is immutable and defined at compile time, the compiler should be able to generate efficient code.

I test this traversal in the context of finding intersections between a mesh and line segments.

Intersection function is specified here:
https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/6f37e7911080587b48e6b65aa7b7af0251a0a349/src/Intersection.jl#L37-L52

And the test code is here:
https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/master/examples/example.jl#L37-L51

The benchmark results are:

naive method:  505.097 ms (50002 allocations: 1.54 MiB)
using BVH:  83.890 ms (1557325 allocations: 81.88 MiB)

My implementation gives almost an order of magnitude improvement over the naive approach, but at the expense of a lot of memory allocations. I assume this is because of the recursive implementation of the tree traversal (?).

Question:
What is a better implementation strategy for the traversal?
It should be possible to unroll the traversal function as a nested set of if-else statements, as the tree is specified at compile time, does that make it a candidate problem for generated functions?

baggepinnen · May 3, 2019, 6:21am

A generated function has access to the types of the input arguments, but not their values. If the entire tree structure is contained in the type information, then yes, you could potentially use a generated function to unroll the traversal. Have you profiled your code and verified that a significant amount of time is spent in the traversal code? If not, I would try this before attempting such a generated function.

jonalm · May 8, 2019, 12:42pm

Thanks for answering @baggepinnen! I’ve made some progress in implementing a generated function, and posted a reformulation of question (link in question edit).

Topic		Replies	Views
How to unroll tree traversal efficiently? General Usage metaprogramming , tree	1	1040	May 8, 2019
Is it possible to traverse a tree without allocations Performance memory-allocation , tree	10	637	August 15, 2022
Efficiently traverse binary tree represented as tuple of tuples Performance memory-allocation , benchmark	9	1921	August 7, 2019
Best approach for runtime dispatching inside a hot loop (heterogeneous tree structure) Performance	18	2765	March 3, 2018
Specialization tips to shrink 45x performance gap when comparing structs Performance	7	962	September 30, 2019

Efficient traversal of tree defined as nested structs

Related topics