Cache Data Structure for Lazy Array

AlexanderNenninger · August 2, 2023, 10:12am

I’ve got an implementation of a lazy array (lazyarrays.jl · GitHub), the entries of which are computed on the fly by function calls. From what we can tell it’s already quite fast. Although many algorithms do access the same entries of the array multiple times, leading to unnecessary calculations.

What could be good data structures, that are already implemented in Julia, to cache the evaluation results? Ideally they would be sparse to allow for larger-than-memory arrays.

I think criteria would be

fast membership test with keys of primitive type (tuples of ints)
fast lookup
fast insertion

Elements do not need to be deleted and are immutable. We also control the key set, obviously.

stevengj · August 2, 2023, 11:51am

People typically use Dict for this sort of thing, e.g. in Memoize.jl. You can define a custom key type with an optimized hash function to speed things up if you know more about your keys.

aplavin · August 4, 2023, 5:37am

Just in case, are you aware of existing packages implementing this functionality?
For example, mapview() from FlexiMaps — like map() but lazy:

julia> @btime Matrix(LazyFunctionArray((x, y) -> x + y, 1000, 1000));
  1.003 ms (3 allocations: 7.63 MiB)

julia> using FlexiMaps

julia> @btime Matrix(mapview(I -> I[1] + I[2], CartesianIndices((1000, 1000))));
  995.376 μs (2 allocations: 7.63 MiB)

julia> LazyFunctionArray((x, y) -> x + y, 1000, 1000) == mapview(I -> I[1] + I[2], CartesianIndices((1000, 1000)))
true

Not sure if any provide caching though.

AlexanderNenninger · August 7, 2023, 7:14am

Thanks for the tip. The function itself can provide the caching.

aplavin · August 7, 2023, 11:27am

It can, and existing memoization libraries would probably yield the same performance when combined with mapview compared to a specialized implementation using Dict underneath.

What’s definitely possible to make more efficient is the scenario where one accesses a significant fraction of the resulting array (multiple times). Then, using a same-sized array as a cache would have less overhead compared to Dict, and this would benefit from a specialized implementation.
Such functionality is potentially in the scope of FlexiMaps.jl as well, but don’t know if that’s a common usecase — don’t remember a need for such a structure myself.

AlexanderNenninger · August 8, 2023, 9:28am

I thought about doing that and was a little stuck on how to represent missing values efficiently. For developing algorithms the cacheless version is pretty good due to it’s simplicity and fast intialization. I feel like we’re at a point where we need empirical data to proceed further. I’ll update this post once I get to that point. Thanks for all the ideas and the feedback.

Topic		Replies	Views
Fastest way to do memoization? Performance performance	11	670	February 23, 2023
Use of Lazy arrays Performance lazy-evaluation	3	1281	April 21, 2022
Tiny associative array (alist-like) New to Julia data_structures	4	842	March 26, 2021
Hash Collision in Memoization Performance	10	441	November 20, 2023
Performance of array of arrays vs. dictionaries when accessing and manipulating data structures Performance question	5	2342	March 14, 2022

Cache Data Structure for Lazy Array

Related topics