Pushing a large array into CircularDeque fails


#1

Can anyone explain why the following fails?

julia> using DataStructures

julia> d = CircularDeque{Float64}(100_000)
CircularDeque{Float64}([])

julia> push!(d, randn(), randn())   # no problem
CircularDeque{Float64}([0.2451865901032298,0.3001460029950263])

julia> push!(d, randn(25_000)...)   # wait 2 minutes...
ERROR: StackOverflowError:

I couldn’t find the root definition of push! for multiple values in Base, but I expect it’s a recursion along the n input values, passing n-1 values to a new push! call at every stage. That would explain the excessive memory use.

I can easily move forward via a simple loop so it’s not a problem I need solved - I’m just wondering why push! is implemented this way and if it could be improved, e.g. with an iterative solution.


#2

Splatting is bad for the function’s health, too many arguments to gulp down. Try append! instead.

Edit: append! does not work. A good old loop does the trick though.


#3

Or use Broadcasting tricks:

push!.([d], randn(25_000))


#4

No, it has nothing to do with splatting:

function mypush!(a,items...)
    for i in items
        push!(a,i)
    end
    a
end
julia> using DataStructures;

julia> d = CircularDeque{Float64}(100_000);


julia> @time mypush!(d,randn(25_000)...);
  0.089943 seconds (113.93 k allocations: 6.478 MiB, 5.70% gc time)

julia> @time mypush!(d,randn(25_000)...);
  0.000831 seconds (25.01 k allocations: 781.625 KiB)

Another way to handle this is reduce(push!,d,randn(25_000)).

Or write special methods

function Base.push!(a::DataStructures.CircularDeque{T}, item::T) where T
    invoke(push!, Tuple{DataStructures.CircularDeque{T},Any}, a,item)
end

function Base.push!(a::DataStructures.CircularDeque{T}, item::T, items::T...) where T
    push!(a,item)
    for i in items
        push!(a,i)
    end
    a
end

This gives:

julia> @time push!(d,randn(25_000)...);
  0.002142 seconds (25.01 k allocations: 781.625 KiB)

#5

I stand corrected, it has to do with how the splatted arguments are treated inside, my bad.


#6

There are several functions like this. I used to think it was due to splatting. Even many of the most experienced people say the inefficiency is due to splatting too many arguments until they look into it more deeply.

Note that I am not saying that splatting 10^4 or 10^5 arguments is good or bad practice, just that it is not the cause of the huge inefficiency observed.


#7

Vararg functions often (but not always) recurse. Recursing 25000 times leads to a stack overflow.