Edit:

this question has been reformulated here: How to unroll tree traversal efficiently?

Dear Community,

I would greatly appreciate some pointers to how I could speed up my code.

Some background:

I tried to write a minimal example for my question, but it ended up as a package here: https://github.com/jonalm/BoundingVolumeHierarchies.jl

A Bounding Volume Hierarchy is implemented as a nested set of `BVH`

structs, with `FaceIndices`

structs at the terminal nodes. Each `BVH`

holds a bounding volume (an axis aligned bounding box AABB), and a tuple of children (either a set of `BVH`

or a single `FaceIndices`

), and each `FaceIndices`

holds the face indices of a given mesh.

The `BVH`

and `FaceIndices`

structs are specified here:

https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/6f37e7911080587b48e6b65aa7b7af0251a0a349/src/BVH.jl#L8-L18

I would like to traverse all `FaceIdices`

in the tree whose parent AABB satisfies a criteria (e.g. are candidate for a collision with a line segment). I have the following traversal implementation:

https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/master/src/BVH.jl#L46-L56

My assumption is that, since the the whole tree structure is immutable and defined at compile time, the compiler should be able to generate efficient code.

I test this traversal in the context of finding intersections between a mesh and line segments.

Intersection function is specified here:

https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/6f37e7911080587b48e6b65aa7b7af0251a0a349/src/Intersection.jl#L37-L52

And the test code is here:

https://github.com/jonalm/BoundingVolumeHierarchies.jl/blob/master/examples/example.jl#L37-L51

The benchmark results are:

```
naive method: 505.097 ms (50002 allocations: 1.54 MiB)
using BVH: 83.890 ms (1557325 allocations: 81.88 MiB)
```

My implementation gives almost an order of magnitude improvement over the naive approach, but at the expense of a lot of memory allocations. I assume this is because of the recursive implementation of the tree traversal (?).

Question:

What is a better implementation strategy for the traversal?

It should be possible to unroll the traversal function as a nested set of if-else statements, as the tree is specified at compile time, does that make it a candidate problem for generated functions?