Base.accumulate! seems slow due to keyword argument handling; can it be sped up?

Before I found accumulate, I had written my own simple implementation. When I was told about accumulate! and tried it instead, @btime showed that it allocated and took nearly twice as long as my version. Further investigation suggested the extra time was spent in the keyword argument handling: if I bypassed accumulate! and called Base._accumulate! directly, it ran as fast as my own implementation:

```julia
using BenchmarkTools

f1(a, x) = (a[2], a[2] + x);
v1 = rand(100);
buf1 = Vector{NTuple{2,Float64}}(undef, length(v1));
init = (0.0, 0.0);

# Both calls return buf1 itself, so keep a copy of the first result for the comparison
r1 = copy(@btime accumulate!($f1, $buf1, $v1; init=$init));
# 103.814 ns (3 allocations: 96 bytes)
r2 = @btime Base._accumulate!($f1, $buf1, $v1, nothing, Some($init));
# 57.447 ns (0 allocations: 0 bytes)
r1 == r2
# true
```

Looking at the code, accumulate! branches on the keyword arguments at run time. I understand this is to support nothing as a valid value for init, but that seems like a high performance cost for the generality. I tried several ways to speed up that single method while keeping this behavior, but couldn’t find anything that fixed the performance problem. Requiring users to pass Some(nothing) when they need nothing as the init value would work, but changing the signature or adding a new one is probably undesirable at this point.
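For reference, the keyword handling in question follows roughly the pattern below. This is a simplified, hypothetical sketch (the name init_from_kwargs is made up, and Base's actual method may differ between Julia versions); it only shows how collecting keywords in `kw...` lets a method tell "init not passed" apart from "init = nothing", at the price of checks on `kw` in the method body.

```julia
# Hypothetical helper mimicking the kw... branching pattern.
# Returns `nothing` for "no init given", or `Some(value)` otherwise,
# so that `init = nothing` remains representable as `Some(nothing)`.
function init_from_kwargs(; kw...)
    if isempty(kw)
        return nothing                      # no initial value supplied
    elseif keys(kw) === (:init,)
        return Some(values(kw).init)        # wrap whatever was passed, even nothing
    else
        throw(ArgumentError("unsupported keyword arguments: $(keys(kw))"))
    end
end
```

Calling `init_from_kwargs()` gives `nothing`, while `init_from_kwargs(init = nothing)` gives `Some(nothing)`.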

However, I think I found a solution. Would it work to use a special private value to mean "no init given", so that an ordinary keyword default could be used? Something like this:

```julia
struct _DefinitelyNothingThisTime end

function test_accumulate!(op, B, A; dims::Union{Integer, Nothing} = nothing, init = _DefinitelyNothingThisTime)
    # The sentinel default means "no init passed"; any other value, including nothing, is wrapped in Some
    Base._accumulate!(op, B, A, dims, init === _DefinitelyNothingThisTime ? nothing : Some(init))
end
```

It seems to work (with init, without init, and with init=nothing) and performs well. Is there any problem with this approach?


Could you open an issue for this?

Nice fix, and definitely better than the existing code. The allocations and inefficiencies in the current implementation appear to come from (1) accumulate! not specializing and (2) the call to isempty.
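On point (1), one plausible reading (an assumption, not confirmed in this thread) is Julia's documented heuristic of not automatically specializing on a Function argument that is only passed through to another function, which is what accumulate! does with op. A minimal sketch with hypothetical helper names, showing the usual `f::F ... where {F}` annotation that forces specialization:

```julia
# Hypothetical helpers illustrating the specialization heuristic.
apply_twice(f, x) = f(f(x))        # `f` is called here, so Julia specializes on it

# `f` is only passed through, never called, so Julia may compile a single
# generic method instance instead of one per concrete function type.
pass_through(f, x) = apply_twice(f, x)

# The type parameter F makes Julia specialize on the concrete type of `f`.
pass_through_spec(f::F, x) where {F} = apply_twice(f, x)
```

Both versions return the same results; any difference shows up only in the compiled code, so benchmarking them with @btime is one way to check whether it matters in a given case.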

Not to be too picky, but it’s a bit more idiomatic to use a singleton instance rather than the type itself. To illustrate:

```julia
julia> nothing ≡ Nothing()
true
```

Note that the same approach is already used by mapfoldl (reduce.jl line 170), whose init default is the singleton object Base._InitialValue() (reduce.jl line 67).
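A sketch of the proposal rewritten with a singleton instance as the sentinel. The names _NoInit, NO_INIT and fast_accumulate! are hypothetical; Base._InitialValue() could presumably be reused instead of a new type. It calls the internal Base._accumulate! with the same argument convention as in the benchmark above (nothing for "no init", Some(x) otherwise), which is an internal API and may change between Julia versions. The `op::F ... where {F}` annotation is an extra assumption, added to force specialization on op.

```julia
# Hypothetical private sentinel: the *instance* NO_INIT (not the type) means "init not passed".
struct _NoInit end
const NO_INIT = _NoInit()

# An ordinary keyword default replaces the kw... branching, so no isempty check is needed.
function fast_accumulate!(op::F, B, A; dims::Union{Integer,Nothing} = nothing,
                          init = NO_INIT) where {F}
    # Map the sentinel to Base's internal "no init" (nothing); wrap any other value,
    # including nothing itself, in Some so it is still treated as an initial value.
    return Base._accumulate!(op, B, A, dims, init === NO_INIT ? nothing : Some(init))
end
```

For example, `fast_accumulate!(+, similar(v), v; init = 10.0)` adds 10.0 to every running sum, and passing `init = nothing` hands nothing to `op` as the first accumulator value.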

Thank you for the replies. I have opened an issue for this: Performance improvement for accumulate! (Issue #48439, JuliaLang/julia on GitHub).
