C++ code much faster than Julia: how can I optimize it?

I see that in the C++ code rows and cols are passed by const reference, whereas result is passed by pointer:

void _smawk(
        const vector<ulong>& rows,
        const vector<ulong>& cols,
        const function<T(ulong, ulong)>& lookup,
        vector<ulong>* result) {
    // Recursion base case
    if (rows.size() == 0) return;

You can’t specify that in Julia at the syntax level (arrays are always passed by sharing, without a copy). The const reference itself only tells you that rows and cols are neither copied nor modified. If they are also small, you may want them stack-allocated, and StaticArrays.jl is the usual way to get that done.
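To make the by-reference point concrete, here is a minimal sketch (the two function names are made up for illustration, they are not from the original code). It only shows that a const reference sees the caller's element buffer, while pass-by-value works on a copy with its own buffer. It says nothing about where that buffer lives.

```cpp
#include <vector>

// Hypothetical helper: true if the callee sees the caller's element buffer.
// With a const reference no copy is made, so the addresses match.
bool same_buffer_by_ref(const std::vector<unsigned long>& v,
                        const unsigned long* caller_data) {
    return v.data() == caller_data;
}

// Hypothetical helper: passing an lvalue by value copies the vector,
// so the callee's buffer is a different allocation.
bool same_buffer_by_value(std::vector<unsigned long> v,
                          const unsigned long* caller_data) {
    return v.data() == caller_data;
}
```

Calling both with the same vector, e.g. `same_buffer_by_ref(rows, rows.data())`, should give true for the reference version and false for the by-value version.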

As I understand it, C++ can keep fixed-size arrays on the stack (std::vector itself stores its elements on the heap), and Julia can currently do the same, but only for fixed-size arrays. I recall @elrod having some advanced Julia code, adapted from LLVM, for a hybrid fixed/heap vector. So I’m not sure you can do this without something like that, unless there’s some small maximum size.

You can’t push! to an SVector or MVector from that package. I’m not up to speed on that package (or on others building on it), but it seems SizedVector isn’t for that either. What you want is a type that stores a maximum capacity (on the stack) plus a count of how many elements are actually in use, so that you can push! and pop!. The LLVM code I mentioned is like that, except that above some maximum it heap-allocates. I believed that might be the default in C++, but as far as I know std::vector always heap-allocates its elements, so C++ also needs a dedicated small-vector type for this. Nothing is inherently missing for defining such a type in Julia and, if I recall correctly, it has already been done. But it’s not done for the default arrays in Julia (yet), nor, it seems, in that package or any other I know of.
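Here is a rough sketch of the kind of type I mean, written in C++ since that is the code at hand. `SmallVec` and its member names are hypothetical, this is not LLVM's implementation nor any library's API, and it assumes T is default-constructible and copyable. The first N elements live inside the object itself, and only pushes beyond N spill into a heap-allocated vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration type: capacity N inline, heap only beyond that.
template <typename T, std::size_t N>
class SmallVec {
    std::array<T, N> inline_{};  // fixed buffer stored inside the object
    std::vector<T> heap_;        // holds only elements with index >= N
    std::size_t count_ = 0;      // number of elements currently in use

public:
    void push(const T& value) {
        if (count_ < N) {
            inline_[count_] = value;   // still fits in the inline buffer
        } else {
            heap_.push_back(value);    // overflow goes to the heap
        }
        ++count_;
    }

    T pop() {
        assert(count_ > 0);            // popping an empty SmallVec is a bug
        --count_;
        if (count_ >= N) {             // last element was in the overflow part
            T value = heap_.back();
            heap_.pop_back();
            return value;
        }
        return inline_[count_];
    }

    const T& operator[](std::size_t i) const {
        return i < N ? inline_[i] : heap_[i - N];
    }

    std::size_t size() const { return count_; }
    bool on_heap() const { return count_ > N; }
};
```

With `SmallVec<int, 2>`, two pushes stay inline and the third one is the first to touch the heap. A Julia version would follow the same layout: a fixed-size tuple-backed buffer, a count, and an ordinary Vector for the overflow.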

I tried the @tailrec macro, but it seems ineffective, since TCO doesn’t apply here (nor, then, in C++). So, to answer my own question, I don’t think C++ does anything clever per se for recursion either, except stack-allocate what you instruct it to.
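To illustrate what I mean by TCO not applying, here is a small sketch with made-up functions, assuming (the excerpt above does not show it) that _smawk still has work to do after its recursive call returns. Only the first form is a tail call.

```cpp
// Tail form: the recursive call is the very last action, so the result
// is returned unchanged and the call is eligible for tail-call optimization.
unsigned long sum_tail(unsigned long n, unsigned long acc = 0) {
    if (n == 0) return acc;
    return sum_tail(n - 1, acc + n);
}

// Non-tail form: the addition runs after the recursive call returns,
// so this is not a tail call.
unsigned long sum_not_tail(unsigned long n) {
    if (n == 0) return 0;
    return n + sum_not_tail(n - 1);
}
```

Both compute the same sum. The difference is only in whether anything remains to be done in the caller once the recursive call comes back.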