In our application we have to allocate over 20K
dictionaries each having ~1M
keys. Each of them is created in a for loop and since they are only used within the iteration garbage collection temporarily freezes the code when memory fills.
Since we know the maximum number of keys the dictionary can have we now pre-allocate a dictionary and reuse it in the for loop. To know whether that value was set in the current iteration (and not a previous one) we use Tuple{Int16, Int64}
as a key where Int16
will be the iteration ID. Simplified it looks like this:
function reuse_dict()
# Prealloc dict
d = Dict{Int64, Tuple{Int16, Int64}}()
# Some numbers to test with
numbers = rand(1:10_000, 2_000_000)
sizehint!(d, 2_000_000)
# Track something so compiler knows we
# do something
found = 0
# fill it
for i in 1:100
# view just to make sure it's changing each iteration so it won't get outcompiled
s = rand(1:100)
e = rand(1_500_000:2_000_000)
v = view(numbers, s:e) # just to change each iter
for (j, numb) in enumerate(v)
d[numb] = (i, j) # i = origin iter
end
# Call some f(), just to not get outcompiled
if get(d, 1, 0)[1] == i
found +=1
end
end
return found
end
This requires me to use a Tuple{Int16, Int64}
, while not that much more than just Int64
it made me curious if there is some other smart trick to reuse allocated dictionaries?