Poor time performance on Dict?


#1

So I conducted a very simple run-time test for both Array and Dict. And here are the results of the very simple test:

image

Now, it was clear to me before the test that the allocated Array was going to be much faster. But the time it took Dict seemed a bit off. So I went to Python to carry on a similar simple run-time test.

image

Also as expected, the Numpy array performed faster than the dictionary. However, the dictionary performed faster in Python than in Julia.

Now the initial intent was certainly not to come out here and ask how come this long Python loop involving Python’s dictionary is performing better than Julia’s loop involving Julia’s dictionary.

I did expect Julia to be considerably faster iterating over both loops. Could this be a performance issue with Julia’s Dict?

Thank you.

Pedro


#2

Please add code snippets between three back-tics, ``` and ```, instead of posting images.


#3

The Dict loop is basically only timing the performance of the dictionary implementation, which for Python is written in C and heavily optimized. It is not too surprising that Python has a faster dict implementation considering how much dictionaries are used in Python and the maturity of the language.


#4

On a smaller set of values (n=500000), using get!(x,i,i) looked about twice as fast as x[i] = i, and I think dicts may outperform arrays when searching for values, but they do seem to slow down when they get big enough. Maybe indexing related?


#5

Would working with a huge database of many small size Dict be as significantly slow? I mean looking up a couple values (lets say 2-5 values) in a Dict with maybe 15 elements, but you have to iterate through very many of these Dict instances in an Array. Need to test how that affects performance too.


#6

But I think it’s also worth mentioning that this is not a fair comparison between Dict’s performance (written in C for Python) because of the size of the loop in Python. We shouldn’t only take the actual run-time of the Dictionary loop (for whatever that is worth) in consideration but also the run-time for the Numpy array loop (also written in highly optimized C code). I mean, it’s not that Julia arrays are ~26 times faster than Numpy, it’s because of the actual Python loop.

If you put Numpy arrays and Julia arrays on an equal footing, then Julia Dict’s performance worsens yet another huge factor.

So we have Python’s Dict at ~10/N seconds/iteration and Julia’s Dict at ~(10 to 20)*26/N seconds/iteration.

Obviously this is all very rough and too simple a test, but the differences are very large.


#7

Probably, because memory access would be the bottleneck for a “huge” database (for a given value of huge). Dict are mutable, and take up more memory than arrays or (named) tuples (when organized efficiently, not duplicating type information).


#8

No, it is because there is a cost in calling into the numpy C-library over and over while Julia can inline functions.


#9

I cannot replicate this on master with the following code. The two functions perform very similarly.

function dict_performance()
    n = 100_000_000
    x = Dict{Int,Int}()
    sizehint!(x, n)
    for i in 1:n
        x[i] = i
    end
end

function dict_performance2()
    n = 100_000_000
    x = Dict{Int,Int}()
    sizehint!(x, n)
    for i in 1:n
        get!(x, i, i)
    end
end

using BenchmarkTools
@btime dict_performance()
@btime dict_performance2()

OTOH, it seems that the total time/number of iterations ratio gets worse as the number of unique entries gets larger. More systematic benchmarking would be needed to find out whether that’s a real problem or not.

@phrmoy Does storing 100M unique values correspond to your actual use case? It could be that Julia performs as well as Python for smaller sizes.


#10

Coming from Python, I am a heavy user of dictionaries and use cases may vary. But size perspective is relative, this comfortably fits within a fraction of the memory of my running machine so I see it as a fair use case. All that said, obviously I am loving Julia and that’s why I am here reporting;)


#11

On my machine for 10^7 size using Python 3 and Julia 0.7 I get that Julia is faster (10^8 starts swapping to disk as I do not have enough RAM to test, but also Julia is faster taking swapping into account). It seems that you are using Python 2?

Also note that you are using linear keys. In python hash(i)=i so it is very cheap but avoids collisions only for the case you have specified. In Julia hashing is more involved - in this case the core part is:

function hash_64_64(n::UInt64)
    a::UInt64 = n
    a = ~a + a << 21
    a =  a ⊻ a >> 24
    a =  a + a << 3 + a << 8
    a =  a ⊻ a >> 14
    a =  a + a << 2 + a << 4
    a =  a ⊻ a >> 28
    a =  a + a << 31
    return a
end

which obviously must be more expensive but should be more robust.


#12

Using Python 2.7.14 and Julia 0.6, thanks for sharing that.


#13

Funny, i tested again and they were similar too, no restart of julia either
It might pay to hint a bit larger (3n or so?) just to avoid rehashing / birthday collisions


#14

You can easily use hash(i)=i in Julia as well by defining a wrapper type for the keys, and when I do so I find that performance in the original benchmark is increased by almost a factor of 5, which makes it almost 3x faster than Python for me (in Julia 0.6):

julia> struct FastHashInt{T<:Integer}; i::T; end

julia> Base.:(==)(x::FastHashInt, y::FastHashInt) = x.i == y.i
       Base.hash(x::FastHashInt, h::UInt) = xor(UInt(x.i), h)

julia> function dict_perf()
           n = 10^7
           x = Dict{Int,Int}()
           sizehint!(x, n)
           for i = 1:n
               x[i] = i
           end
           return x
       end
dict_perf (generic function with 1 method)

julia> @time dict_perf(); @time dict_perf(); @time dict_perf();
  1.754449 seconds (1.48 k allocations: 272.081 MiB, 4.54% gc time)
  1.756465 seconds (11 allocations: 272.001 MiB, 4.38% gc time)
  1.715037 seconds (11 allocations: 272.001 MiB, 1.10% gc time)

julia> function dict_perf2()
           n = 10^7
           x = Dict{FastHashInt{Int},Int}()
           sizehint!(x, n)
           for i = 1:n
               x[FastHashInt(i)] = i
           end
           return x
       end
dict_perf2 (generic function with 1 method)

julia> @time dict_perf2(); @time dict_perf2(); @time dict_perf2();
  0.376183 seconds (1.37 k allocations: 272.073 MiB, 5.45% gc time)
  0.355044 seconds (11 allocations: 272.001 MiB, 3.26% gc time)
  0.350325 seconds (11 allocations: 272.001 MiB, 2.91% gc time)

The Python 2 version of this takes 1 second on my machine:

In [1]: def dict_performance():
    dic = dict()
    for i in xrange(10000000):
        dic[i] = i

In [2]: %time dict_performance();
CPU times: user 873 ms, sys: 188 ms, total: 1.06 s
Wall time: 1.06 s

(1.13 seconds in Python 3.) Update: In Julia 0.7, I get an additional factor of 2 speedup (0.18sec for dict_perf2).

A moral of this story is that in cases where dictionary performance is critical, you might want a custom hashing function optimized for your use-case. The good news is that this is possible in pure Julia code.


#15

I really don’t think this is about whether it has been optimized in C or not, this is really more an issue of the data structures and algorithms used for Dict in Julia.
Dict does not have an O(1) insertion time, because it has to find a free slot, if there are collisions, so it slows down drastically as the Dict gets full.
I ended up having to write replacements for Dict because of these issues.
My preferred hash table structure does the following: uses 32-bit CRC for the hashes for strings, and caches the hashes (so that even if there are only a few elements, you can avoid a potentially costly isequal check), uses a vector of “next offset” for the collision chain, which can be quite large (and only needs to be UInt8 for dicts with less than < 256 elements, or UInt16 for < 65536, etc.), and a single vector of elements, where they are simply added to the end.


#16

I used your hash function above for 10^8, and I got a ~4x speed up, so the run-time dropped from ~20 seconds to ~5 seconds. For the same size, Python’s run is around ~10 seconds. It’s tempting to say that we are twice as fast as Python (on the loop run of 10^8, we are). However, if you take into account the looping in Python (you can use the Numpy array loop run as a normalization), Julia’s Dict implementation is still much slower than Python’s dict. The reason to take into account the looping is because I am not interested in benchmarking Python’s looping, in fact, I am not interested in comparing Julia’s performance to Python’s performance either. I am only interested in identifying if there is something off about the implementation of Julia’s Dict and I used Python’s dict just as a reference point. If you take the loop normalization into account, Julia’s Dict on a per iteration basis, still seems about 10-15 times slower than Python’s dict.


#17

This seems a little sketchy to me. The problem with Python is not just “looping” per se, but that there is an overhead to every single call. It’s not very fair to say that “Python’s dictionaries are faster as long as you don’t access them via Python.”

That being said, it might be interesting to benchmark the Python dictionary performance in C, via the CPython API.


#18

Python has its own set of issues. And that is ok, I am not here to address them. But if you normalize the runs, JR = (Julia’s Dict Run / Julia’s Array Run) vs PR = (Python’s Dict Run / Python’s Array Run), you will see what I am saying, that JR >> PR, and there is nothing sketchy about it, I am only trying to isolate performances.

EDIT:
image

EDIT2:
This normalization holds as long as you believe Numpy array are roughly equivalent in efficiency as Julia array’s for the example given. Thus the normalization roughly puts both languages on “even footing,” all else equal, leaves us with a comparison between the implementation performances of Julia’s Dict vs Python’s dict for the example given.


#19

You are still assuming that setting the index in the numpy array costs the same as for julia. That is not true at all.


#20

No, your “normalization” procedure is flawed. This is not the right way to separate the Python dictionary overhead from the Python interpreter overhead.

The right way is to call libpython directly from C, as I mentioned above. When I do that, the Python 2.7 time decreases from 1.06 seconds (for my benchmark above in pure Python) to 0.84 seconds. So, the CPython interpreter overhead in this example is rather small (only about 25%) — it really is benchmarking mainly the dictionary performance (+ the intrinsic overhead of working with Python PyObject data structures), and the Julia implementation with optimized integer hashing really is several times faster.

My C code follows:

#include <Python.h>
#include <stddef.h>
#include <sys/time.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
     Py_SetProgramName(argv[0]);
     Py_Initialize();

     PyObject *d = PyDict_New();

     struct timeval start, end;
     gettimeofday(&start, NULL);
     size_t i;
     for (i = 0; i < (size_t)10000000; i++) {
          PyObject *io = PyInt_FromSize_t(i);
          PyDict_SetItem(d, io, io);
          Py_DECREF(io);
     }
     gettimeofday(&end, NULL);
     printf("elapsed time = %g seconds\n",
            end.tv_sec-start.tv_sec + 1e-6*(end.tv_usec-start.tv_usec));
     
     Py_Finalize();
     return 0;
}