Memoization performance

Consider the classic fibonacci example:

fib = n -> n < 3 ? 1 : fib(n-1) + fib(n-2)

I have benchmarked several memoization implementations. The result is somewhat surprising. In theory, they work almost the same way. I wonder how it ends up with such a big difference. See this gist for details.

# How Performance
1 Custom caching code 34 ns (generic)
2 Custom memoize closure 51 ns (anonymous)
3 Memoize.jl 223 ns (generic)
4 Caching.jl 1038 ns (generic), 1057 ns (anonymous)

What are you actually trying to measure? @btime prints the minimum time over all iterations, and since the cache will contain the result for fib(40) at some point, you’re really just benchmarking cache lookup. Is that really what you’re interested in?

BTW, for your custom caching code the cache is a Dict{Any, Any}, and you’re basically doubling the hash computation cost by first doing a haskey and then a getindex. You could probably speed things up by using a Dict{Int, Int} and using get!. But if you’re going to go to the trouble of customizing for this particular application, why not use a Vector{Int} as the cache?


This is a very good point. But if that’s the case then all of these options should be almost identical. I think Memoize.jl uses An IdDict. Maybe the overhead came from the construction of the key.

It is just a toy example. I am just interested in the best way to do memoization. I’m in the process of writing a book, and this is one of the subject areas.


Yes, by default, but you can also specify the type of the cache.

1 Like

In the case of Caching.jl, a seemingly ‘long’ time is spent on hashing the input arguments. You can check out (if interested) the hashing function here
For example:

@btime Caching.arghash(40)
  283.898 ns (4 allocations: 80 bytes)

As a speed reference, on my machine,

@cache fib = n -> n<3 ? 1 : fib(n-1) + fib(n-2)
@btime fib(40)
  701.531 ns (17 allocations: 688 bytes)

Since it is being defined recusively, some time is spent on checking and inserting new values i.e.

julia> fib
fib (cache with 40 entries, 40 in memory 0 on disk)

I do have to say that nanosecond timings are not that relevant for the type of operations memoization is usually used for (i.e. operations that take orders of magnitude more time to complete i.e. ms/s for the increase in memory to be warranted).

In essence, all memoizers calculate hashes and lookup that hash in a dictionary however the type of hashing and checks performed have a slight performance penalty.

For small functions and very high speed i.e. ns-order, a custom memoizer is a much better choice.


StaticNumbers can also be used for compile-time “memoization” of numeric functions (using Julia’s dispatch table as the cache). With the fib function from that package’s readme, I get:

julia> @btime fib(40)
  0.020 ns (0 allocations: 0 bytes)

This is cheating of course, since all of the computation takes place at compile time. As @tkoolen already said: @btime is probably not the most interesting benchmark for memoized functions…

Edit: I should probably mention that StaticNumbers is not intended for memoization. It just happens to work like that in this specific case because the compiler is able to do all the simplification.


Thank You for these benchmarks for Memoization performance ,

I understand your benchmarks and book here might have been before LRUCache
but the thread safe ( aka no race conditions ) use case is very important when you want reliable and predictable results like so "*A particular use case of this package is to implement function memoization for functions that can simultaneously be called from different threads."

Maybe time to update the website and/or revise book ?


Ps> Maybe “There is nothing new under the sun ?” so I can’t help noticing this is recapitulating the design for Java ConcurrentHashMap and table lookups as described here >>

1 Like