Can we reuse Dictionary allocation?

In our application we have to allocate over 20K dictionaries each having ~1M keys. Each of them is created in a for loop and since they are only used within the iteration garbage collection temporarily freezes the code when memory fills.

Since we know the maximum number of keys the dictionary can have we now pre-allocate a dictionary and reuse it in the for loop. To know whether that value was set in the current iteration (and not a previous one) we use Tuple{Int16, Int64} as a key where Int16 will be the iteration ID. Simplified it looks like this:

function reuse_dict()
    # Prealloc dict
    d = Dict{Int64, Tuple{Int16, Int64}}()
    # Some numbers to test with
    numbers = rand(1:10_000, 2_000_000)
    sizehint!(d, 2_000_000)
    # Track something so compiler knows we
    # do something
    found = 0
    # fill it
    for i in 1:100
        # view just to make sure it's changing each iteration so it won't get outcompiled
        s = rand(1:100)
        e = rand(1_500_000:2_000_000)
        v = view(numbers, s:e) # just to change each iter
        for (j, numb) in enumerate(v)
            d[numb] = (i, j) # i = origin iter
        # Call some f(), just to not get outcompiled
        if get(d, 1, 0)[1] == i
            found +=1
    return found

This requires me to use a Tuple{Int16, Int64}, while not that much more than just Int64 it made me curious if there is some other smart trick to reuse allocated dictionaries?

I think you could just empty! it at the end of each loop iteration. My understanding is that it won’t shrink the internal vectors. In fact, there’s no way to shrink the internal vectors until julia 1.9, when sizehint! will be able to shrink them, thanks to


This works, was curious if reusing (without empty!) would help in hashing the keys but that does not seem to be the case (similar execution time and alloc graph). Thanks for this, this is much more elegant :slight_smile:

1 Like