Avoiding Allocations with User provided Closures

I’m writing a query API that allows a user to pass in a closure to be run on every entry satisfying the query, however this usually results in undesirable boxing. For example, here’s the kind of API I want to offer, with the query replaced by a simple foreach loop.

function foreach(f, x)
    for i = 1:length(x)
        f(x[i])
    end
end

function bench()
    x = rand(1000)
    count = 0
    @time foreach(x) do val
        count += 1
    end
    count
end

x = rand(1000)
bench(x)
#   0.000005 seconds (489 allocations: 7.641 KiB)

Now I know why this is allocating, because Ints aren’t mutable and the compiler boxes count on each iteration so that it can be incremented in the closure, but this is sort of a nonintuitive bug for my users. I want to provide this kind of API rather than some pre canned count, etc. function because this is flexible and the compiler should be able to optimise for it. How can I provide an API like this that doesn’t suck performance wise?

Is this https://github.com/JuliaLang/julia/issues/15276? Then, maybe educate your users until the bug is fixed? See FastClosures.jl

Yeah, you may just need to help your users out a little bit. A good rule of thumb, which might not be hard for you to convey to your users, is that anything the closure modifies should be wrapped in a Ref:

julia> function foreach(f, x)
           for i = 1:length(x)
               f(x[i])
           end
       end
foreach (generic function with 1 method)

julia> function bench(x)
           count = Ref(0)
           foreach(x) do val
               count[] += 1
           end
           count[]
       end
bench (generic function with 2 methods)

julia> @btime bench($x)
  253.535 ns (0 allocations: 0 bytes)
1000

(note that in this case, the compiler is even able to skip actually allocating the Ref, which is pretty cool).

If you want to get really fancy, you can actually check fieldtypes(typeof(f)) and look for Core.Box in order to print a helpful warning to your users. This might even be a legitimate use case for an @generated function, e.g.:

julia> box_warn(x) = nothing
box_warn (generic function with 3 methods)

julia> @generated function box_warn(::T) where {T <: Function}
         if Core.Box in fieldtypes(T)
           :(@warn("Oh no! A Core.Box!"))
         end
       end
box_warn (generic function with 3 methods)

julia> function bench_bad(x)
           count = 0
           foreach(x) do val
               count += 1
           end
           count
       end
bench_bad (generic function with 1 method)

julia> function bench_good(x)
           count = Ref(0)
           foreach(x) do val
               count[] += 1
           end
           count[]
       end
bench_good (generic function with 1 method)

julia> bench_bad(x)
┌ Warning: Oh no! A Core.Box!
└ @ Main REPL[49]:3
1000

julia> bench_good(x)
1000

But I’m not sure this level of complexity is actually a good idea.

Edit: you also need to use box_warn in foreach, e.g.

function foreach(f, x)
  box_warn(f)
  ...
end

(Thanks @Mason)

4 Likes

@rdeits Did you mean to include a redefinition of foreach in your code that uses box_warn?

If user functions need some state (like count), I believe it’s much better to design API based on foldl (or reduce) rather than foreach. The example in the OP would look like

count = foldl(x; init=0) do count, val
    count += 1
end

Slightly less trivial example:

(count, sum) = foldl(x; init=(0, false)) do (count, sum), val
    count += 1
    sum += val
    (count, sum)
end

Making this kind of operation composable would lead you to transducers. FYI, here is my implementation: https://github.com/tkf/Transducers.jl

7 Likes

Fixed, thanks!

Thanks all!

So to summarise, capturing variables in closures is a known performance issue. The direct solution is to wrap captured mutable variables in a Ref, which the compiler can optimise out the allocation of.

The indirect (and perhaps more Julian?) solution would be to separate out this particular usage of the query API as a more “functional” reduce/foldl, which I think makes sense in the example I gave, because logically the user is computing a reduction.

Users would still have the option to pass in a plain closure, but they’d have to be careful about capturing mutable variables.

2 Likes