That seems pretty damn fast; I’m really doubtful that there are any implementations out there much faster than that, except possibly object ID dicts (which are available in Julia, by the way). This is also on a rather slow laptop CPU. Granted, this doesn’t necessarily guarantee that every object has efficient hashing - perhaps there are some common objects with poor hashing?
It really depends on what you hash - there are different implementations for different types. You can check them here and here. There may be other reasons besides performance for why each implementation was chosen - @StefanKarpinski may be able to shed some light on that, based on the git blame for those files.
I have used quite large Dicts and ObjectIdDicts and they were reasonably fast for the task (I have not benchmarked other hash tables to compare), with the exception of rehashing an ObjectIdDict, which can take hours once it reaches a certain size. sizehint! is your friend here if you know the approximate size of your Dict.
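For example, a minimal sketch of pre-sizing (the sizes here are just placeholders for whatever you actually expect):
julia> d = Dict{Int,Float64}();
julia> sizehint!(d, 10^7);  # reserve slots up front so filling the Dict doesn't trigger rehashing
julia> for i in 1:10^7
           d[i] = rand()
       end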
julia> A = rand(5);
julia> using BenchmarkTools
julia> @benchmark A[$1]
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 20.913 ns (0.00% GC)
median time: 21.449 ns (0.00% GC)
mean time: 28.138 ns (19.46% GC)
maximum time: 40.808 μs (99.93% GC)
--------------
samples: 10000
evals/sample: 996
I seem to remember at some point understanding why that number comes out the way it does, but right now I can’t think of what the reason might be, and it strikes me as slow. Perhaps I’m just wrong and that’s a reasonable number.
On the bright side, it seems like hashing alone is really damn fast.
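If you want to sanity-check that on your own machine, something like this should do it (the example values are arbitrary):
julia> using BenchmarkTools
julia> @btime hash($(rand(Int)));
julia> @btime hash($(rand()));
julia> @btime hash($("a short string"));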
Cool. Can somebody please explain what interpolated means in this context? I only know about numerical interpolation and string interpolation in Julia. I am guessing it has to do with scope and local vs. global variables.
Further, what’s the difference between @btime and the normal @time in this context?
Also, how can I check which hashing algorithm is used by default for different types?
@time literally just puts a timer around your code, so you may be measuring things other than the execution of your code itself - especially compilation time. You’ve probably noticed that @time reports much faster times after you run it on the same function a couple of times. @btime ensures that it only measures the actual execution of the code.
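A minimal sketch of the difference (the function and data here are just for illustration):
julia> f(x) = sum(abs2, x);
julia> x = rand(1000);
julia> @time f(x);  # first call includes compiling f for Vector{Float64}
julia> @time f(x);  # second call times only execution, but it's still a single noisy run
julia> using BenchmarkTools
julia> @btime f($x);  # runs many samples and reports the minimum, excluding compilation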
In this context, an interpolated value means a value that was computed before it goes into the benchmark. For example, @btime x[$(rand(1:3))] executes rand before benchmarking, stores the result, and uses that value as the input to the benchmark.
Also, the @edit macro is extremely useful. Try, for example, @edit rand().
In this context it’s expression interpolation, a metaprogramming concept. Here $A basically says “instead of using a reference to the variable A, just replace it with its value”.
Yes, and in particular it is used by @btime to allow you to benchmark expressions involving global variables without paying the usual performance price in Julia for working with globals.
Without interpolation, an expression like @btime dict[3] includes the time to look up the type of dict and determine (dynamically) the correct getindex method to call for getindex(dict, 3) (= dict[3]), because dict is a non-constant global variable. This is typically not what you want, because real performance-sensitive code in Julia will normally occur in functions and act on local variables whose types are known to the compiler. By doing @btime $dict[3], you interpolate the value of dict into the expression, which allows the @btime macro to time it with the type known at compile time.
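A minimal side-by-side sketch (the dict here is just an example global):
julia> using BenchmarkTools
julia> dict = Dict(3 => "three");  # a non-constant global
julia> @btime dict[3];   # also times the dynamic lookup of dict's type
julia> @btime $dict[3];  # value interpolated, so only the lookup itself is timed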
Check the links to GitHub I posted before; you should find the implementations there. As far as I’m aware, there’s no formal description of which algorithm is used precisely and why.
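That said, you can ask the REPL which hash method gets dispatched for a given value - @which and methods are handy for this (on newer Julia versions @which lives in InteractiveUtils, which the REPL loads for you):
julia> @which hash(1)      # the method used for integers
julia> @which hash(1.0)    # the method used for Float64
julia> @which hash("foo")  # the method used for strings
julia> methods(hash)       # every hash method currently defined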