[ANN] DictArrays: dictionary-based arrays, or wide tables with the Julian interface

Dictionary-based arrays: represent wide heterogeneous tables while enjoying the familiar Julia collection and Tables interfaces.

Use DictArrays when you need a lean table type, but the compilation overhead of type-stable solutions (Vector{NamedTuple}, NamedTuple{Vector}, StructArray) is too high.

DictArrays are similar to StructArrays and have the same interface where possible, with the defining difference that DictArrays do not encode columns in the table type. This gets rid of the prohibitive compilation overhead for wide tables with hundreds of columns or more.
Despite the inherent type instability, regular Julia data manipulation functions such as map and filter are fast for DictArrays: they have almost no overhead compared to StructArrays and are orders of magnitude faster than plain Vectors of Dicts.
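
To make the trade-off above concrete, here is a minimal plain-Julia sketch that uses only Base. It is not DictArrays' actual implementation, and the names `cols_nt`, `cols_dict`, `row_sum` and `sum_columns` are hypothetical, introduced only for illustration. It shows that a dictionary of columns keeps the same type no matter how many columns it holds, and that a function barrier (one common technique, assumed here rather than taken from the package) confines the type-unstable lookup to once per call instead of once per row.

```julia
# Type-stable layout: column names and element types are part of the container type,
# so every distinct set of columns is a new type for the compiler to specialize on.
cols_nt = (a = [1, 2, 3], b = [1.0, 2.0, 3.0])

# Dictionary-based layout: column names are runtime keys, not type parameters.
cols_dict = Dict{Symbol,AbstractVector}(
    :a => [1, 2, 3],
    :b => [1.0, 2.0, 3.0],
    :c => ["x", "y", "z"],
)

# Adding a column changes the data, not the type.
type_before = typeof(cols_dict)
cols_dict[:d] = [true, false, true]
type_after = typeof(cols_dict)

# Function barrier: this kernel is compiled for the concrete column types it receives.
row_sum(a, b) = map(+, a, b)

# Hypothetical helper: the column lookup is type-unstable, but it runs once per call,
# and the per-row work happens inside the specialized kernel.
sum_columns(cols, x, y) = row_sum(cols[x], cols[y])

result = sum_columns(cols_dict, :a, :b)
```

The price of the dictionary layout is one dynamic dispatch per operation; the benefit is that nothing needs recompiling when the set of columns changes.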

Compilation and runtime comparison

No compilation overhead:

# 1000 columns - almost instant
julia> da = @time DictArray(Dictionary(Symbol.(:a, 1:10^3), fill(1:1, 10^3)))
  0.001211 seconds (5.50 k allocations: 313.422 KiB)

# while StructArrays start to struggle:
julia> @time StructArray(da);
  7.496190 seconds (626.85 k allocations: 37.730 MiB, 0.30% gc time, 99.52% compilation time)

# DictArray compilation doesn't depend on the number of columns
# even absurd hundreds of thousands of columns are fine:
julia> @time DictArray(Dictionary(Symbol.(:a, 1:10^5), [fill(1:1, 2*10^4); fill([1.], 2*10^4); fill([:a], 2*10^4); fill(["a"], 2*10^4); fill([false], 2*10^4)]))
  0.228542 seconds (878.81 k allocations: 39.484 MiB, 11.63% gc time, 52.54% compilation time)

At the same time, map is as fast as for type-stable arrays:

julia> da = DictArray(a=1:10^6, b=collect(1.0:10^6), c=fill("hello", 10^6));

# DictArray
julia> @btime map(x -> x.a + x.b, $da)
  1.430 ms (300 allocations: 7.65 MiB)

# fast baseline: StructArray
# basically the same timings
julia> @btime map(x -> x.a + x.b, $(StructArray(da)))
  1.314 ms (2 allocations: 7.63 MiB)

# slow baseline: plain Vector of Dictionaries
# orders of magnitude slower, many allocations
julia> @btime map(x -> x.a + x.b, $(collect(da)))
  100.512 ms (1000022 allocations: 22.89 MiB)