Slow compiling of many AxisArrays

I’ve scratched my head on this for a few days and suspect there may not be a good solution, but I wanted the community’s input. We have a package that wraps the C library MuJoCo and, as an added feature of using it through Julia, we allow access to the underlying data vectors (of the C structs mapped through Julia) by name / symbol using AxisArrays. We currently do this by building lists of the names and constructing AxisArrays for the different fields, which kills the compile time in recent Julia versions. The code below mimics what our wrapper does: different data fields get different names, specified when the structs are loaded (different models may have different names in their fields).

Although this gives excellent runtime performance for the functions that use these structs with named fields, it kills the latency. If you run the following code, you can see that each call to foo spends most of its time in compilation. I glanced at the AxisArrays code and saw a lot of @generated functions, so I suspect the cause is non-trivial.

using AxisArrays, Random, StaticArrays

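function foo(A, newaxes, axnames)
    # Build one named axis per dimension of A
    a = map(newaxes, axnames) do newax, axname
        l = length(newax)
        # Reconstructed line (assumes the StaticArrays import is used here)
        Axis{axname}(SVector{l}(newax))
    end
    l = map(length, newaxes)
    AxisArray(reshape(A, l), a)
end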

function bar(d = 10, N = 100)
    A = zeros(d,d)
    names = Dict{Symbol,AxisArray}()
    @time for i=1:N
        # Random field size, random labels, and random axis names per iteration
        r, c = rand(1:d, 2)
        s1 = randstring(r)
        s2 = randstring(c)
        s = (Tuple(Symbol(ch) for ch in s1), Tuple(Symbol(ch) for ch in s2))
        n = Tuple(Symbol(randstring()) for _ in 1:2)
        # Time each construction; nearly all of it is compilation
        @time names[Symbol(s1)] = foo(A[1:r, 1:c], s, n)
    end
end

bar()

  0.051647 seconds (64.98 k allocations: 4.154 MiB, 96.94% compilation time)
  0.043497 seconds (46.85 k allocations: 2.927 MiB, 95.59% compilation time)
  0.041870 seconds (46.85 k allocations: 2.922 MiB, 96.20% compilation time)
  0.038855 seconds (46.85 k allocations: 2.920 MiB, 96.32% compilation time)
  0.041749 seconds (46.85 k allocations: 2.921 MiB, 96.36% compilation time)
  0.053981 seconds (64.98 k allocations: 4.159 MiB, 96.84% compilation time)
  0.042786 seconds (46.85 k allocations: 2.921 MiB, 96.23% compilation time)
  0.040143 seconds (46.85 k allocations: 2.920 MiB, 96.01% compilation time)
  0.052432 seconds (64.98 k allocations: 4.161 MiB, 96.71% compilation time)
  0.040642 seconds (46.85 k allocations: 2.920 MiB, 96.11% compilation time)
  0.058924 seconds (64.98 k allocations: 4.170 MiB, 13.22% gc time, 97.32% compilation time)
  0.040128 seconds (46.85 k allocations: 2.920 MiB, 96.00% compilation time)
  0.040687 seconds (46.85 k allocations: 2.920 MiB, 96.02% compilation time)
  5.355541 seconds (6.11 M allocations: 389.572 MiB, 1.42% gc time, 96.92% compilation time)
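
As far as I can tell, part of the reason every call compiles is that the axis names are type parameters of the AxisArray, so each new set of names is a new concrete type that needs fresh specializations. A minimal sketch of that, assuming only the Axis{name}(values) constructor (the names :x, :y, :u, :v are made up for illustration):

```julia
using AxisArrays

# Same data and same labels; only the axis names differ (illustrative names).
a = AxisArray(zeros(2, 2), Axis{:x}([:a, :b]), Axis{:y}([:c, :d]))
b = AxisArray(zeros(2, 2), Axis{:u}([:a, :b]), Axis{:v}([:c, :d]))

# The names live in the type, so the two arrays have different concrete types
# and any method called on them is compiled separately for each.
println(typeof(a) == typeof(b))  # false
```

In foo the label count also ends up in the type through SVector{l}, so fields of different sizes add further distinct types.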

If anyone knows how to avoid the compilation latency while retaining the convenience of the named fields and the run-time performance, I would love to hear about it! I tried to implement a ‘lazy’ version of this with a Dict{Symbol,AxisArray}, generating the AxisArray only if the field has not been used yet (i.e. it is missing from the dict), but that causes allocations even after the first run (the performance hit may be acceptable, but the allocations are not).
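
For reference, the lazy version looks roughly like the sketch below. This is a simplified stand-in rather than our actual wrapper code: named_view!, CACHE, and the :row / :col axis names are hypothetical. My guess is that the allocations come from the abstract value type of the Dict, since the compiler cannot know the concrete type of what a lookup returns.

```julia
using AxisArrays

# Hypothetical cache. `AxisArray` without parameters is not a concrete type,
# so values read back from this Dict are not concretely typed for the compiler.
const CACHE = Dict{Symbol,AxisArray}()

# Hypothetical helper: build the AxisArray only on first use of `key`.
function named_view!(cache, key::Symbol, A::AbstractMatrix, rows, cols)
    get!(cache, key) do
        AxisArray(A, Axis{:row}(collect(rows)), Axis{:col}(collect(cols)))
    end
end

A = zeros(2, 3)
x = named_view!(CACHE, :field1, A, (:a, :b), (:c, :d, :e))  # constructs and stores
y = named_view!(CACHE, :field1, A, (:a, :b), (:c, :d, :e))  # reuses the stored array

println(x === y)                          # true
println(isconcretetype(valtype(CACHE)))   # false
```

The second call skips construction, but code that uses the returned value still goes through dynamic dispatch unless it sits behind a function barrier.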