While using the Caching.jl and LRUCache.jl packages and reading Tom Kwong’s book on the memoization pattern, I noticed an edge case that is never considered: no matter how large the hash, two different inputs can produce the same hash value.
Granted, a collision is extremely unlikely with a 64-bit hash, but it could still be an issue for functions expensive enough that memoization pays off despite the overhead of this pattern.
I know that a `Dict` in Julia is implemented as a hash map and can handle collisions internally. However, if the key I insert is already a hash, two inputs with identical hashes would map to the same entry and silently return the wrong value.
Is there another way to serialize the information safely, instead of hashing it, so that the dictionary key is guaranteed to be unique?
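One observation worth testing (a minimal sketch, with `cache` and `cached_add` as hypothetical names): Julia's `Dict` hashes the key internally but, on a hash collision, still compares the full keys with `isequal` before deciding it has a hit. So keying the cache by the argument tuple itself, rather than by `hash(args)`, sidesteps the problem entirely:

```julia
# Key the cache by the argument tuple itself, not by hash(args).
# Dict hashes the tuple internally, but on a hash collision it still
# compares full keys with isequal, so colliding inputs get separate entries.
const cache = Dict{Tuple{Int,Int},Int}()

function cached_add(a::Int, b::Int)
    get!(cache, (a, b)) do
        a + b        # stand-in for an expensive computation
    end
end
```

The trade-off is memory: the dictionary now stores the full input tuples instead of 64-bit hashes.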
Otherwise, the other option is to store all the `(input, output)` tuples for every hash and check whether:
- the hashed key exists, and
- the input key matches one of the keys stored under that entry.

If a match is found, just retrieve the value; otherwise, append a new `(input, output)` pair to the entry:
```julia
hkey = hash(key)
if haskey(dict, hkey)
    entries = dict[hkey]                       # Vector of (key, value) pairs
    i = findfirst(kv -> kv[1] == key, entries)
    i !== nothing && return entries[i][2]      # true hit: hash and key both match
    push!(entries, (key, val))                 # collision: chain the new pair
else
    dict[hkey] = [(key, val)]                  # first entry for this hash
end
```
I understand that this is an uncommon occurrence, but if it did happen it could silently disrupt an entire simulation that relied on this pattern for efficiency. In my opinion, chaining the outputs could be an effective solution, even if it requires additional memory.
Sometimes LRU eviction will remove a key whose entry chains multiple values. In that case, though, it means the values stored under that entry were almost never used, so redoing the computation wouldn’t be such a hassle.
Another way would be to use an efficient data structure in Julia that is able to address this problem automatically.
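For what it's worth, LRUCache.jl may already cover this case when the cache is keyed by the full input tuple rather than a precomputed hash, since collisions are then resolved by full-key comparison just as in `Dict`. A sketch, assuming the `LRU{K,V}(maxsize = ...)` constructor and `get!` API, with `lru` and `expensive` as hypothetical names:

```julia
using LRUCache

# Cache keyed by the full argument tuple, bounded to 1000 entries.
# Because the tuple itself is the key, hash collisions inside the LRU are
# resolved by comparing whole keys; eviction drops least-recently-used pairs.
const lru = LRU{Tuple{Float64,Float64},Float64}(maxsize = 1000)

expensive(x, y) = get!(lru, (x, y)) do
    hypot(x, y)      # stand-in for the real computation
end
```

The `maxsize` bound keeps the memory cost of storing full keys under control, which addresses the extra-memory concern above.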