Union splitting vs C++

Hello! I created an experimental package, SingleDispatchArrarys.jl, that generates a dispatch table for non-homogeneous arrays. It achieves C++-like performance on such arrays and is much easier to manage than a manual switch-case.

However, C++-like performance is not necessarily a good thing. For a long if-elseif-elseif chain, LLVM generates a jump table. On x86 machines, going through a jump table is nearly as fast as branching, but on a Raspberry Pi 4B the jump table is actually twice as slow as branching (20 ms vs 10 ms). It looks like LLVM made a bad choice there.
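For anyone who wants to poke at this on the C++ side, here is a minimal sketch of the two shapes being compared: a dense `switch` and the equivalent if/else-if chain over a tagged element. The names (`Kind`, `Item`, `apply_switch`, `apply_chain`) are made up for illustration and are not taken from the package, and the per-kind operations are arbitrary. Whether either function actually becomes a jump table is up to the compiler and target, so treat the comments as a guess and check the generated assembly.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical element kinds for a non-homogeneous array (illustration only).
enum class Kind : std::uint8_t { A, B, C, D, E, F };

struct Item {
    Kind kind;           // runtime tag selecting the operation
    std::int64_t value;  // payload
};

// Dense switch over the tag. Compilers often, but not always,
// lower this to a jump table.
std::int64_t apply_switch(const Item& it) {
    switch (it.kind) {
        case Kind::A: return it.value + 1;
        case Kind::B: return it.value * 2;
        case Kind::C: return it.value - 3;
        case Kind::D: return it.value * it.value;
        case Kind::E: return -it.value;
        case Kind::F: return it.value / 2;
    }
    return 0;  // not reached for valid tags
}

// Same logic written as an if/else-if chain. The optimizer may still
// turn this into a jump table, or keep it as compare-and-branch.
std::int64_t apply_chain(const Item& it) {
    if (it.kind == Kind::A) return it.value + 1;
    else if (it.kind == Kind::B) return it.value * 2;
    else if (it.kind == Kind::C) return it.value - 3;
    else if (it.kind == Kind::D) return it.value * it.value;
    else if (it.kind == Kind::E) return -it.value;
    else if (it.kind == Kind::F) return it.value / 2;
    return 0;  // not reached for valid tags
}

// Sum the results over the whole array using the switch version.
std::int64_t sum_switch(const std::vector<Item>& items) {
    std::int64_t total = 0;
    for (const Item& it : items) total += apply_switch(it);
    return total;
}

// Sum the results over the whole array using the chain version.
std::int64_t sum_chain(const std::vector<Item>& items) {
    std::int64_t total = 0;
    for (const Item& it : items) total += apply_chain(it);
    return total;
}
```

Both GCC and Clang accept `-fno-jump-tables`, so building the same file with and without that flag is one way to time table dispatch against plain branching on a given machine.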
