Base.accumulate! seems slow due to keyword argument handling, can it be sped up?

taotree · January 27, 2023, 5:10pm

Before I found accumulate, I had written a simple implementation of my own. Then I was told about accumulate! and tried it. Benchmarking with @btime showed allocations and a run time nearly twice that of my implementation. Further investigation indicated that the extra time was spent in the argument handling. If I bypass accumulate! and call Base._accumulate! directly, it runs at the same speed as the one I wrote:

using BenchmarkTools  # provides @btime
f1(a, x) = (a[2], a[2]+x);
v1 = rand(100);
buf1 = Vector{NTuple{2,Float64}}(undef, length(v1));
init = (0.0, 0.0);
r1 = @btime accumulate!($f1, $buf1, $v1; init=$init);
> 103.814 ns (3 allocations: 96 bytes)
r2 = @btime Base._accumulate!($f1, $buf1, $v1, nothing, Some($init));
> 57.447 ns (0 allocations: 0 bytes)
r1 == r2
> true

Examining the code, it’s doing some conditionals around the keyword arguments. I understand this is to support nothing as a valid value for init. That generality seems to come at a high performance cost. I tried several changes within that single method to speed it up while keeping that behavior, but none of them resolved the performance issue. Requiring the user to pass Some(nothing) when they need nothing as the init could work, but changing the signature or adding a new one is probably undesirable at this point.
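For readers who have not looked at the source, the keyword handling described above can be sketched roughly as follows. This is a paraphrase of the pattern, not a copy of Base: the name kw_accumulate! and the error message are made up for illustration, and Base._accumulate! is an internal function that may change between Julia versions.

```julia
# Hypothetical stand-in for the keyword handling described above.
# `init` is not a named keyword here; it is captured by `kw...` so that
# "init not given" can be told apart from "init = nothing".
function kw_accumulate!(op, B, A; dims::Union{Integer, Nothing} = nothing, kw...)
    nt = values(kw)  # the NamedTuple behind the keyword arguments
    if isempty(kw)
        # No init given: start from the first element of A.
        Base._accumulate!(op, B, A, dims, nothing)
    elseif keys(nt) === (:init,)
        # init given, possibly `nothing`: wrap it so it stays distinguishable.
        Base._accumulate!(op, B, A, dims, Some(nt.init))
    else
        throw(ArgumentError("unsupported keyword arguments: $(setdiff(keys(nt), (:init,)))"))
    end
end
```

The branching on the contents of kw is what the rest of this thread tries to avoid.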

However, I think I found a solution. Would it work to use a special private value to signify nothing so the argument default would work? Something like this:

struct _DefinitelyNothingThisTime end
function test_accumulate!(op, B, A; dims::Union{Integer, Nothing} = nothing, init = _DefinitelyNothingThisTime)
    Base._accumulate!(op, B, A, dims, init === _DefinitelyNothingThisTime ? nothing : Some(init))
end

It seems to work (with init, without init, and for init=nothing) and performs well. Is there any issue with that approach?

gbaraldi · January 27, 2023, 6:19pm

Could you open an issue for this?

uniment · January 28, 2023, 12:48am

Nice fix, definitely better than the existing code. The allocations and inefficiencies in the current implementation appear to be (1) due to accumulate! not specializing, and (2) the call to isempty.

Not to be too picky, but it’s a bit more idiomatic to use a singleton instance, rather than the type itself. To illustrate:

julia> nothing ≡ Nothing()
true

Note that your approach is used for mapfoldl (reduce.jl line 170), which uses the singleton object Base._InitialValue() (reduce.jl line 67).
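A minimal sketch of the same idea with a singleton instance as the default, along the lines of Base._InitialValue(). The names _NoInit and sentinel_accumulate! are hypothetical, and Base._accumulate! is internal, so treat this as an illustration rather than the eventual fix:

```julia
# Hypothetical private sentinel type; an instance of it means "no init given".
struct _NoInit end

function sentinel_accumulate!(op, B, A;
                              dims::Union{Integer, Nothing} = nothing,
                              init = _NoInit())
    # The check depends only on the type of `init`, so a specialized
    # method does not need a runtime branch here.
    wrapped = init isa _NoInit ? nothing : Some(init)
    return Base._accumulate!(op, B, A, dims, wrapped)
end

A = [1, 2, 3]
sentinel_accumulate!(+, zeros(Int, 3), A)             # no init
sentinel_accumulate!(+, zeros(Int, 3), A; init = 10)  # ordinary init

# init = nothing stays a legal value; op just has to accept it.
op = (a, x) -> a === nothing ? x : a + x
sentinel_accumulate!(op, zeros(Int, 3), A; init = nothing)
```

Because a user cannot accidentally pass a _NoInit() instance without reaching for the private type, nothing remains usable as a real initial value.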

taotree · February 3, 2023, 2:11pm

Thank you for the replies. I have opened an issue for this: "Performance improvement for accumulate!" (Issue #48439, JuliaLang/julia on GitHub).
