Tiny associative array (alist-like)

So I need a data structure implementing a tiny associative array from integers to another type T (which type is not completely clear, but assume lists of integers, implemented as Vector{Int}, for now. Or it could be strings.) The number of (key, value) pairs will be very small (usually 2 and almost never more than 10), so using complex types with hashing (Dict) or trees (SortedDict) is likely not too efficient (I actually use Dict right now and a large part of my time is spent hashing keys).

So stdlib contains SparseArrays, which implements a SparseVector type using a list of keys and a list of values: this is space- and time-efficient and would be perfect for my usage. However, getindex(::SparseVector, non_existent_key) returns zero(T), which in my case is an error.

Of course, I could wrap Vector{Int} inside a struct ZeroableVector <: AbstractVector such that zero(ZeroableVector) returns [], but this seems a bit too ad-hoc; more precisely, since setindex!(::SparseVector, ...) calls iszero, I would also need to add a method to that function, which would conflict with the legitimate method inherited from AbstractVector.

So:

  1. is there a relatively standard structure implementing a tiny associative array? (e.g. a spiritual successor to GitHub - andyferris/AssociativeArray.jl ?) something like LISP’s alists?
  2. otherwise: did somebody already tweak SparseVector for something resembling this? (e.g. by overwriting get, adding a haskey method, etc.)?

I’m not aware of an existing implementation, but I think you could implement your own in a pretty efficient way. If you’re willing to pay the O(N) lookup cost, it could be as simple as:

struct SmallMap{K, V} <: AbstractDict{K, V}
  keys::Vector{K}
  vals::Vector{V}
  
  SmallMap{K, V}() where {K, V} = new{K, V}(K[], V[])
end

function Base.getindex(m::SmallMap, key)
  for (i, k) in enumerate(m.keys)
    if k == key
      return m.vals[i]
    end
  end
  throw(KeyError(key))
end

function Base.setindex!(m::SmallMap, value, key)
  for (i, k) in enumerate(m.keys)
    if k == key
      m.vals[i] = value
      return m
    end
  end
  push!(m.keys, key)
  push!(m.vals, value)
  return m
end

function Base.iterate(m::SmallMap, i=1)
  if i > length(m.keys)
    return nothing
  else
    return (m.keys[i] => m.vals[i], i + 1)
  end
end

Base.length(m::SmallMap) = length(m.keys)

Usage:

julia> m = SmallMap{String, Int}()
SmallMap{String, Int64}()

julia> m["hello"] = 1
1

julia> m["world"] = 2
2

julia> m
SmallMap{String, Int64} with 2 entries:
  "hello" => 1
  "world" => 2
3 Likes

You might try the ArrayDictionary from Dictionaries.jl, also by @andyferris.

There’s been some discussion of that package on Discourse e.g.

4 Likes

(O(N) is not much of a problem given my values of N indeed…)

Yes, this is more-or-less what I did (also I realized that writing the type was not much longer than my post here; I wrote almost the same as what you did, plus keys(), values() , haskey, a generic constructor, and that’s it.) Thanks for the answer!

1 Like

You could also sort key-value pairs, and use searchsortedfirst etc for a lookup, which is O(log(N)), plus, of course, the one-time cost of sorting — whether it is worth it depends on your application.