Dict with integer keys and fastest insertion?

What are the best options for a dict/map with integer keys and the fastest insert/update? For background: I want to aggregate position-weight pairs (p,w) that share the same position. I'm tinkering with the idea of implementing a sparse matrix as a list of dicts, one per column, and the matrix product needs this aggregation. So far, my benchmarks are:

function f0(pp::Vector{tp}, ww::Vector{tw}, Z::Vector{tw})  where {tp,tw}
    # aggregating with a dense vector Z
    for (p,w) ∈ zip(pp,ww)  Z[p] += w end;   
    _pp = findall(Z .≠ 0)
    _ww = Z[_pp]
    Z .= 0  # clean up
    return _pp, _ww end;

function _add!(D::Dict{tp,tw}, p::tp, w::tw)  where {tp, tw}
    # add w to D[p]; Base.ht_keyindex (internal API) avoids a second hash lookup
    idx = Base.ht_keyindex(D,p)
    if idx<0  D[p] = w
    else D.vals[idx] += w end end;

function f1(pp::Vector{tp}, ww::Vector{tw}, D::Dict{tp,tw})   where {tp,tw}
    # aggregating with a dict
    for (p,w) ∈ zip(pp,ww)  
        @inline _add!(D,p,w) end 
    filter!(pw -> !iszero(pw[2]), D) end;  # clean up

n, N = 10^5, 10^7;
pp, ww = UInt32.(rand(1:n,N)), rand(-1:0.001:1,N);
Z, D = fill(0.0, n), Dict{UInt32,Float64}();
@time f0(pp, ww, Z);
@time f1(pp, ww, D);
  0.034545 seconds (11 allocations: 1.538 MiB)
  0.212361 seconds (39 allocations: 4.334 MiB)

findall(!iszero, Z) might be faster
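For instance (a quick sketch): the predicate form scans Z directly, whereas Z .≠ 0 first allocates an intermediate BitVector that findall then scans:

```julia
Z = [0.0, 1.5, 0.0, -2.0]

idx1 = findall(Z .≠ 0)        # materializes a temporary BitVector first
idx2 = findall(!iszero, Z)    # scans Z directly, no intermediate array

idx1 == idx2                  # true; both return [2, 4]
```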


Is there a way to also specify the type of elements in output of findall?
Something like findall(UInt32, !iszero, Z)?
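As far as I know there is no such method; for an ordinary Vector, findall returns Vector{Int}. One workaround (a sketch; nonzero_indices is just an illustrative name, not a Base function) is to collect the indices yourself with the desired element type:

```julia
# collect the indices of nonzero entries of Z, converted to the requested type
function nonzero_indices(::Type{T}, Z::AbstractVector) where {T<:Integer}
    out = T[]
    for i in eachindex(Z)
        iszero(Z[i]) || push!(out, T(i))
    end
    return out
end

nonzero_indices(UInt32, [0.0, 1.5, 0.0, -2.0])  # UInt32[0x00000002, 0x00000004]
```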

Probably even better to overwrite pp and ww in-place if you can (since you are already overwriting Z), and merge the loops, e.g.

function f0!(pp::AbstractVector, ww::AbstractVector, Z::AbstractVector)
    for (p,w) ∈ zip(pp,ww); Z[p] += w; end
    ip, iw = firstindex(pp)-1, firstindex(ww)-1
    for p in eachindex(Z)
        @inbounds if Z[p] ≠ 0
            pp[ip += 1] = p
            ww[iw += 1] = Z[p]
        end
    end
    return resize!(pp, ip - firstindex(pp)+1),
           resize!(ww, iw - firstindex(ww)+1)
end

(Note that argument-type declarations like pp::Vector{tp} make no difference whatsoever to the performance; they just make your function less generic.)


No, in the case of matrix multiplication, I cannot overwrite pp and ww. But thank you! Any idea about fast Dict’s that I mentioned?

These two will have very different memory footprints: the former is proportional to the domain size n, while the latter is proportional to the size of the support of the pp values. This may determine which method to use. If the memory footprint of f0 is acceptable, it should be hard to beat in terms of performance.
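To make the footprint difference concrete (a rough sketch; Base.summarysize figures for the Dict depend on its current capacity and load factor):

```julia
n = 10^5
Z = fill(0.0, n)              # dense buffer: ~ n * sizeof(Float64), regardless of support
D = Dict{UInt32,Float64}()    # dict: grows with the number of distinct positions seen
for p in UInt32.(1:100)       # suppose only 100 distinct positions actually occur
    D[p] = 1.0
end

Base.summarysize(Z)  # ≈ 800_000 bytes
Base.summarysize(D)  # a few kB
```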

The clean-up filter!(pw -> !iszero(pw[2]), D) can be written as: filter!((!iszero)∘last, D)

And _add! can be elided with the use of mergewith!:

using OrderedCollections
f2(pp::Vector{tp}, ww::Vector{tw}, D::Dict{tp,tw})   where {tp,tw} =
  filter!((!iszero)∘last, mergewith!(+, D, LittleDict(pp,ww)))

Finally, it's better to benchmark with @btime (from BenchmarkTools.jl) instead of @time, to exclude compilation time and average out timing noise.


This last suggestion is much slower:

@time f0(pp,ww,Z);  @time f1(pp,ww,D1); @time f2(pp,ww,D2);
  0.054900 seconds (11 allocations: 1.538 MiB)
  0.243859 seconds (39 allocations: 4.334 MiB)
  0.871619 seconds (6 allocations: 208.000 MiB, 4.03% gc time)

I also tried Base.IdDict and Dictionaries.UnorderedDictionary, but both were slower than f1.

@time should be run twice to avoid including compilation times.

Or better yet, use BenchmarkTools.jl or ChairMarks.jl to perform more careful repeated timing.

(However, you have to be careful to include a setup phase to re-initialize the arguments between benchmarks, since these functions modify their third argument.)
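For example (a sketch, reusing the f1 and data defined above; Db is just an illustrative name): the setup keyword rebuilds the mutated argument before each sample, and evals=1 ensures each rebuilt argument is used for exactly one call:

```julia
using BenchmarkTools

# re-create the Dict before every sample so f1 never sees leftover state
@btime f1($pp, $ww, Db) setup=(Db = Dict{UInt32,Float64}()) evals=1
```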