I am pleased to announce the recent release of Dictionaries.jl, which provides an alternative interface for dictionaries in Julia, for improved productivity and performance. You can install it with ]add Dictionaries
at the REPL.
The high-level goal of this package is to define a new interface for dictionary and set structures which is convenient and efficient for functional data manipulation - including operations such as non-scalar indexing, broadcasting, mapping, filtering, reducing, grouping, and so-on. While Julia comes with built-in AbstractDict
and AbstractSet
supertypes, the interfaces for these are not as well established or generic as for AbstractArray
, the built-in dictionaries implement less of the common data manipulation operations compared to arrays, and it is difficult to work with them in a performant manner.
In this package we aim to devise a cohesive interface for abstract dictionaries (or associative maps), having the common supertype AbstractDictionary
. A large part of this is working with indices (of arbitrary type) as well as convenient and efficient iteration of the containers. A second goal is to make dictionary manipulation more closely resemble array manipulation, to make it easier for users. Simultaneously, we are pushing the performance of working with dictionaries to be closer to that of working with arrays.
The largest departure from Base.AbstractDict
that users should be aware of is that dictionaries here iterate values, not key-value pairs. In return, you get the convenience of using your usual tools for arrays, and some performance gains.
For a very small taste of how the Dictionaries.jl interface can lead to performance improvements during an analytic workflow, consider the following example where we attempt to add 1
to the values of a hash-based dictionary. Not only is this easier for the programmer (using broadcast notation) but is also much more performant than the typical pattern with Base.Dict
. Of course, map
, reduce
, filter
and friends all work out-of-the-box, too.
julia> using Dictionaries, BenchmarkTools
julia> dictionary = HashDictionary(rand(Int64, 10_000_000), rand(Int64, 10_000_000));
julia> @btime dictionary .+ 1;
135.975 ms (22 allocations: 256.00 MiB)
julia> dict = Dict(rand(Int64) => rand(Int64) for _ in 1:10_000_000);
julia> @btime Dict(k => v+1 for (k,v) in dict);
10.283 s (73 allocations: 541.17 MiB)
(Note - the thing impacting performance most here is that the Dict
is constructed by inserting the elements and iteratively building a new hashmap, while broadcasting over the HashDictionary
preserves the keys and hashmap, creating only new values in the valid slots).
This is a young package, and contributions are of course very welcome. I am currently working on integrating this with SplitApplyCombine.jl, particularly such that the group
family of functions return a Dictionaries.jl dictionary.
For Julia Base
developers, this package combines the ideas discussed during Julia 1.0 development, particulary the Juleps #24019 and #25904, the (unmerged) PRs #24508 and #25013, the non-scalar indexing proposal #30845 , the discussion of dictionary iteration in #22194 and a variety of other work which was merged at that time.