Benchmarking Dict against C++ std::unordered_map

I benchmarked Julia’s dictionary insertion and deletion with the following code,

function test_julia_dict(n::Int64)
    m = Dict{Int, Int}()
    for i = 1:n
        m[5*i] = i
    end
    for i = 1:n-1
        delete!(m, 5*i)
    end
    return m[5*n]
end

I compared against C++'s std::unordered_map using g++ 9.4.0. The test is done by exposing an extern "C" interface from C++ and accessing the shared library using Julia’s ccall. I’ve also tested a standalone C++ version to confirm that ccall has essentially zero overhead. For n=10^7, The Julia version takes 1.7s, and the C++ version 0.87s on my laptop. I know that there are many third-party hash table libraries which out-perform std::unordered_map, so the difference would be even bigger. Is there anything I’ve missed? Also, if I want to use call the C++ unordered_map for arbitrary Julia types to speed up programs where this is a bottleneck, is there an easy way to do this? The full Julia and C++ benchmark code is attached below.

C++ code:

// MyDict.cpp
// Compile with `g++ MyDict.cpp -fPIC -shared -O2 -o MyDict.so`

#include <unordered_map>
#include <cstdint>

std::unordered_map<int64_t, int64_t> *m;

extern "C" void dict_init();
extern "C" void dict_destruct();
extern "C" void dict_setindex(int64_t, int64_t);
extern "C" int64_t dict_getindex(int64_t);
extern "C" void dict_delete(int64_t);

void dict_init() {
        m = new std::unordered_map<int64_t, int64_t>;
}

void dict_destruct() {
        delete m;
}

void dict_setindex(int64_t value, int64_t key) {
        (*m)[key] = value;
}

int64_t dict_getindex(int64_t key) {
        return (*m)[key];
}

void dict_delete(int64_t key) {
        (*m).erase(key);
}

Julia code:

import BenchmarkTools
import Libdl
mydict = Libdl.dlopen("./MyDict.so")
mydict_init = Libdl.dlsym(mydict, :dict_init)
mydict_destruct = Libdl.dlsym(mydict, :dict_destruct)
mydict_setindex = Libdl.dlsym(mydict, :dict_setindex)
mydict_getindex = Libdl.dlsym(mydict, :dict_getindex)
mydict_delete = Libdl.dlsym(mydict, :dict_delete)

function test_mydict(n::Int64)
    ccall(mydict_init, Cvoid, ())
    for i = 1:n
        ccall(mydict_setindex, Cvoid, (Int64, Int64), i, 5*i)
    end
    for i = 1:n-1
        ccall(mydict_delete, Cvoid, (Int64,), 5*i)
    end
    result = ccall(mydict_getindex, Int64, (Int64,), 5*n)
    ccall(mydict_destruct, Cvoid, ())
    return result
end

function test_julia_dict(n::Int64)
    m = Dict{Int, Int}()
    for i = 1:n
        m[5*i] = i
    end
    for i = 1:n-1
        delete!(m, 5*i)
    end
    return m[5*n]
end

BenchmarkTools.@btime test_julia_dict(10^7)

BenchmarkTools.@btime test_mydict(10^7)

P.S. I edited the code in the post to use 64-bit integers in both Julia and C++. The timing has not changed noticeably.

For some reason setting the dict entry in Julia allocates quite a bit, probably causing a slowdown:

julia> function s(n::Int64)
                  m = Dict{Int, Int}()
                  for i = 1:n
                      m[5*i] = i
                  end
       end
s (generic function with 1 method)

julia> @time s(1000)
  0.000043 seconds (18 allocations: 91.953 KiB)

julia> @time s(1000)
  0.000033 seconds (18 allocations: 91.953 KiB)

julia> @time s(1000)
  0.000033 seconds (18 allocations: 91.953 KiB)

julia> @time s(10000)
  0.000325 seconds (24 allocations: 364.156 KiB)

julia> @time s(10000)
  0.000305 seconds (24 allocations: 364.156 KiB)

julia> @time s(10000)
  0.000305 seconds (24 allocations: 364.156 KiB)

julia> @time s(100000)
  0.004186 seconds (36 allocations: 5.669 MiB)

julia> @time s(100000)
  0.004549 seconds (36 allocations: 5.669 MiB)

julia> @time s(100000)
  0.004584 seconds (36 allocations: 5.669 MiB)

julia> @time s(1000000)
  0.074060 seconds (54 allocations: 65.169 MiB, 13.09% gc time)

julia> @time s(1000000)
  0.068651 seconds (54 allocations: 65.169 MiB, 4.95% gc time)

julia> @time s(1000000)
  0.070122 seconds (54 allocations: 65.169 MiB, 0.70% gc time)

julia> @time s(10000000)
  0.978438 seconds (72 allocations: 541.170 MiB, 18.79% gc time)

julia> @time s(10000000)
  0.770858 seconds (72 allocations: 541.170 MiB, 1.11% gc time)

julia> @time s(10000000)
  0.764971 seconds (72 allocations: 541.170 MiB)

julia> @time s(100000000)
Killed

At the end it I even get an out-of-memory kill, I wonder what’s causing all those allocations? Strangely the number of allocs reported is quite small, but they really add up when increasing the number of dict entries set.

julia> versioninfo()
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)

By the way, you can do this in one command with g++ -fPIC -shared -O2 -o MyDict.so MyDict.cpp (and perhaps use -O3 -march=native instead of -O2 to possibly get more code optimization).

1 Like

See also

There is a 2022Apr4 patch in the new codebase.

If you didn’t test with the nightly builds, could you test with that as well?

On master the C++ code is about twice as fast

For smaller problem sizes like n=10^5, the Julia version is actually faster than the ccall version, so maybe it’s memory allocation slowing down Dict for large problem sizes as @paulmelis observed.