How to capture an index in a closure without allocating (or a workaround)?

Sorry the code is a bit long, but the question is about the testIt! function in the middle. If I run run() as-is, there are ~160k allocations. However, if I comment out the code that uses i, there are no allocations. I'm assuming this is because i has to be boxed to be captured in the closure. I ran @code_warntype on both versions, and both look fine except that there is an i::Core.Box when the i variable is present. Is there a simple way to structure this that avoids the allocations?
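A minimal sketch of the boxing described above, using only Base Julia; make_counter_boxed and make_counter_ref are hypothetical names, not part of the original code. A closure is lowered to a struct whose fields are the captured variables, so inspecting a field type shows whether a variable was boxed. Reassigning a captured Int should give a Core.Box field, while mutating the contents of a Ref whose binding is never reassigned keeps a concrete field type.

```julia
# Hypothetical helpers illustrating Core.Box; not part of the original code.

function make_counter_boxed()
    i = 0
    bump() = (i += 1)      # reassigns the captured variable, so lowering boxes `i`
    return bump
end

function make_counter_ref()
    i = Ref(0)             # the binding `i` is never reassigned...
    bump() = (i[] += 1)    # ...only the contents of the Ref change
    return bump
end

boxed = make_counter_boxed()
viaref = make_counter_ref()

# Closures are structs whose fields are the captured variables,
# so the field type tells us whether `i` was boxed.
println(fieldtype(typeof(boxed), :i))
println(fieldtype(typeof(viaref), :i))
```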

function itrCombis(f, v, ::Val{4})  # calls f on every 4-element combination of v, with repetition
    len = length(v)
    for i in 1:len
        for j in i:len
            for k in j:len
                for l in k:len
                    f((v[i], v[j], v[k], v[l]))
                end
            end
        end
    end
end

function testIt!(res, v1, v2)
    i = 0 # Ref{Int}(0)
    function inner(c1, c2)
        allCombi = Iterators.flatten(Iterators.product(c1, c2))
        for combi in allCombi
            i += 1 # i[] += 1
            res[i] = combi[1] > 5.0 ? nothing : 2.0
        end
        return
    end

    function outer(c1)
        itrCombis(c2 -> inner(c1, c2), v2, Val(4))
        return
    end

    itrCombis(outer, v1, Val(4))
    return
end

function run()
    res = Vector{Union{Nothing,Float64}}(undef, 100000000)
    num = 5
    v1 = collect(1:num)
    v2 = collect(1:num)
    testIt!(res, v1, v2)
    @time testIt!(res, v1, v2)
end
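The ~160k figure is consistent with how often the body of inner runs. In the sketch below, count_inner_iterations is a hypothetical helper and itrCombis is copied from the code above. itrCombis visits combinations with repetition, so for num = 5 each call of itrCombis produces binomial(5 + 4 - 1, 4) = 70 tuples, inner runs 70 * 70 = 4900 times, and each call iterates over 4 * 4 pairs, i.e. 32 flattened elements. That gives 156,800 increments of i, which suggests roughly one allocation per increment of the boxed counter; the exact allocation count may differ somewhat, so treat the match as approximate.

```julia
# Copied from the question: visit every 4-element combination of v,
# with repetition (each inner loop starts at the enclosing loop's index).
function itrCombis(f, v, ::Val{4})
    len = length(v)
    for i in 1:len
        for j in i:len
            for k in j:len
                for l in k:len
                    f((v[i], v[j], v[k], v[l]))
                end
            end
        end
    end
end

# Hypothetical helper: count how often `inner` would be called and how often
# its loop body (the `i += 1` line) would run, without writing to `res`.
function count_inner_iterations(num)
    v1 = collect(1:num)
    v2 = collect(1:num)
    calls = Ref(0)   # number of inner(c1, c2) calls
    elems = Ref(0)   # number of loop-body executions inside inner
    itrCombis(v1, Val(4)) do c1
        itrCombis(v2, Val(4)) do c2
            calls[] += 1
            for _ in Iterators.flatten(Iterators.product(c1, c2))
                elems[] += 1
            end
        end
    end
    return calls[], elems[]
end

println(count_inner_iterations(5))
```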

I tried using i = Ref{Int}(0) (with i[] += 1 in the loop), and that helped a lot: it still allocates, but only once (1 allocation: 16 bytes). I could probably thread the index through the functions, but I wanted to ask whether there is some way to write this kind of closure without allocating at all.
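For reference, the Ref variant described above can be written out in full. This is a sketch: testItRef! is a hypothetical name, itrCombis uses the same loops as in the question (written compactly), and the function returns the final index only so the result can be checked. Because the binding i is assigned once, before the closures are created, it should not be boxed; the remaining allocation should be the Ref itself, created once per call.

```julia
# Same loops as itrCombis in the question, written compactly.
function itrCombis(f, v, ::Val{4})
    len = length(v)
    for i in 1:len, j in i:len, k in j:len, l in k:len
        f((v[i], v[j], v[k], v[l]))
    end
end

# Hypothetical variant of testIt! that keeps the counter in a Ref.
function testItRef!(res, v1, v2)
    i = Ref(0)                     # assigned once, so `i` itself is not boxed
    function inner(c1, c2)
        for combi in Iterators.flatten(Iterators.product(c1, c2))
            i[] += 1               # mutate the Ref's contents, not the binding
            res[i[]] = combi[1] > 5.0 ? nothing : 2.0
        end
        return
    end
    outer(c1) = itrCombis(c2 -> inner(c1, c2), v2, Val(4))
    itrCombis(outer, v1, Val(4))
    return i[]                     # final index, returned only for checking
end

res = Vector{Union{Nothing,Float64}}(undef, 156_800)
n = testItRef!(res, collect(1:5), collect(1:5))
println(n)
```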

I was originally using the Combinatorics package, but discovered that its combinations function allocates, which is why the example uses the hand-written itrCombis function instead.

I can't test this, but could the allocation just come from benchmarking with @time instead of @btime?

No. Note that run() calls testIt! once before the @time, so @time measures the second, already-compiled call. I have also run @btime on testIt! directly from the REPL for all three cases, and it shows the same results.

Maybe this is what you need: c42f/FastClosures.jl on GitHub ("Faster closure variable capture").

But updating i inside the closure seems like a bug minefield to me. Can't you pass it as a parameter, so that you can be sure what you are updating?
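Passing the index as a parameter amounts to folding it through the loops. A minimal sketch under that assumption: foldCombis, fillInner! and testItThreaded! are hypothetical names. foldCombis is a variant of itrCombis that passes an accumulator to f and keeps whatever f returns, so the index is an ordinary argument and return value, and no captured variable is ever reassigned.

```julia
# Hypothetical fold-style variant of itrCombis: f receives the current
# accumulator plus a 4-tuple and returns the new accumulator.
function foldCombis(f, acc, v, ::Val{4})
    len = length(v)
    for i in 1:len, j in i:len, k in j:len, l in k:len
        acc = f(acc, (v[i], v[j], v[k], v[l]))
    end
    return acc
end

# Same body as `inner` in the question, but the index comes in as an
# argument and the updated index is returned.
function fillInner!(res, i, c1, c2)
    for combi in Iterators.flatten(Iterators.product(c1, c2))
        i += 1
        res[i] = combi[1] > 5.0 ? nothing : 2.0
    end
    return i
end

# The do-block closures capture only res, v2 and c1, none of which is reassigned.
function testItThreaded!(res, v1, v2)
    return foldCombis(0, v1, Val(4)) do i, c1
        foldCombis(i, v2, Val(4)) do j, c2
            fillInner!(res, j, c1, c2)
        end
    end
end

res = Vector{Union{Nothing,Float64}}(undef, 156_800)
n = testItThreaded!(res, collect(1:5), collect(1:5))
println(n)
```

The cost of this design is that the iteration helper has to return something useful, which is why the sketch adds a separate fold variant rather than changing itrCombis itself.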

Remembering that “closures are a poor man's objects, and objects are a poor man's closures”, you can represent your closure explicitly as a callable struct and pass it in:

mutable struct MyInner{T1, T2}
    i::T1
    res::T2
end

function (m::MyInner)(c1, c2)
    allCombi = Iterators.flatten(Iterators.product(c1, c2))
    for combi in allCombi
        m.i += 1  # the counter is a struct field, so no Core.Box is needed
        m.res[m.i] = combi[1] > 5.0 ? nothing : 2.0
    end
    return
end

function testIt!(inner, v1, v2)
    function outer(c1)
        itrCombis(c2 -> inner(c1, c2), v2, Val(4))
        return
    end
    
    itrCombis(outer, v1, Val(4))
    return
end

function run()
    res = Vector{Union{Nothing,Float64}}(undef, 100000000)
    num = 5
    v1 = collect(1:num)
    v2 = collect(1:num)
    inner = MyInner(0, res)  # inner.i is not reset between calls, so res must hold both runs
    testIt!(inner, v1, v2)
    @time testIt!(inner, v1, v2)
end

I don't think it's worth the effort just to avoid 16 bytes of allocation.