Indexing by names, current favorites in the package space

I loved AxisArrays.jl. However, AxisArrays is old (designed pre-1.0). It still works… OK, but I miss broadcasting and Not-indexing. I didn't learn about its newer counterparts until I had written a few weeks of analysis and got annoyed by the missing functionality and the namespace conflict over axes.

As many of you know, the ecosystem here is kind of a mess: (too) many really nice, functional packages, all working as expected, with more or fewer bugs in the corner cases. Obligatory link to a long, detailed thread on AxisArrays replacements. Has anyone waded through and come back with a strong opinion? Here's what I've found:

  • AxisKeys.jl and its underlying NamedDims.jl (@mcabbott is a core contributor, so I'm inclined in this direction on the weight of the name. keyedarray[:, Key("Person")] is understandable for disambiguation, but not as nice as keyedarray[:, "Person"].)
  • DimensionalData.jl (@rafaqz is an amazingly energetic contributor with a wonderful depth of knowledge on reducing abstraction cost to zero; very up to date. I avoided this one at first only because the package name makes me think of meters/sec etc.)
  • (EDIT) NamedArrays.jl: old, but showing recent maintenance! Great name, pair syntax :Person => "sally" instead of "=", and no bias toward a continuum representation.
  • AxisIndices.jl (@Tokazama is super talented, but this package is out of date by several years. I'm glad they are putting their ideas into ArrayInterface, because I really like this style.)
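Why a marker like Key is needed at all: if the labels are themselves integers, A[:, 1] could mean "position 1" or "the label 1". Julia's dispatch lets a wrapper type settle this. Below is a minimal sketch of that idea in plain Base Julia; ByLabel and column are hypothetical names, and this is not how AxisKeys.jl implements Key.

```julia
# Hypothetical marker type meaning "look this up by label".
# AxisKeys.jl's own Key is a separate type and is not used here.
struct ByLabel{T}
    label::T
end

# Integer argument: ordinary positional indexing.
column(data::AbstractMatrix, labels::AbstractVector, j::Integer) = data[:, j]

# ByLabel argument: find the label's position first, then index positionally.
function column(data::AbstractMatrix, labels::AbstractVector, k::ByLabel)
    j = findfirst(==(k.label), labels)
    j === nothing && throw(KeyError(k.label))
    return data[:, j]
end

data   = [10 20 30; 40 50 60]
labels = [3, 1, 2]   # integer labels, deliberately not 1:3

column(data, labels, 1)           # position 1 → [10, 40]
column(data, labels, ByLabel(1))  # label 1 sits at position 2 → [20, 50]
```

With string labels such as "Person" there is no such clash, which is presumably why the bare form feels more natural there.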

SciML-backed options:

  • LabeledArrays
  • ArrayInterface

Maybe not exactly in the same space; these seem closer to the output of DifferentialEquations.jl?

Finally, I wonder if anyone has tips on the meta problem of finding the best packages. Currently I look at when the source was last updated, stars, and “do I recognize the contributors from Discourse?”. I guess I can now add “search JuliaHub rather than Google”… :see_no_evil:

Thanks!


Same thread from 5 days ago:

:sweat_smile: How embarrassing. But looking at those links, what did you decide, @nilshg? Did you stick with NamedArrays, try AxisKeys, or go with something else?

ArrayInterface is an interface package, not something meant to be used directly for this.

ComponentArrays is the one I’d recommend for most SciML types of things. It’s very different from the others mentioned above because its purpose is to be an abstraction over a Vector, so it’s good for running things like optimization on and doing linear algebra.
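The "abstraction over a Vector" idea can be sketched in plain Base Julia: every named block is a view into one flat Vector{Float64}, so code that only understands vectors (an optimizer step, linear algebra) can act on all parameters at once. This is a hypothetical illustration of the concept, not ComponentArrays.jl's actual API; LAYOUT, component and the toy objective are made up.

```julia
# Hypothetical layout: which slice of the flat vector each named block occupies.
const LAYOUT = (position = 1:2, velocity = 3:4, mass = 5:5)

# All parameters live in one contiguous Vector{Float64}.
params = [1.0, 2.0, 0.5, -0.5, 3.0]

# Named access is just a view into the flat storage (no copy).
component(p::Vector{Float64}, name::Symbol) = view(p, getproperty(LAYOUT, name))

# Writing through a named view mutates the underlying vector.
component(params, :velocity) .= 0.0

# Vector-only code sees `params` directly, e.g. one gradient-descent step
# on the toy objective sum(abs2, p), whose gradient is 2p.
lr = 0.1
params .-= lr .* (2 .* params)
```

The named blocks stay in sync with the flat vector because they are views, which is the property that makes this style convenient for solvers.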

SymbolicIndexingInterface is a rather interesting thing to mention here, along with ModelingToolkit (MTK) codegen. It works in a wildly different way: it's more about decoupling the interface from the implementation, so that optimizations applied to mathematical models don't change the user's interactions with the solver.

Thanks for the reply! Very interesting! I think these directions are too specialized for me. I'm more on the data-analysis side, looking for a string-indexed, multidimensional, lightweight version of DataFrames, or a more mature version of AxisArrays.
For example, getting gene count vectors:

    gene_templates[Patient = "MRP12367", cell_type = "CD4 T", :]
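Without a labeled-array package, a lookup like that has to be done by hand: resolve each label to its integer position, then index as usual. A sketch in Base Julia with made-up labels and data; labelindex is a hypothetical helper, and "MRP00001" and the GENE_* names are placeholders.

```julia
# Made-up labels and data standing in for a labeled 3-D array
# (patient × cell type × gene).
patients   = ["MRP12367", "MRP00001"]
cell_types = ["CD4 T", "B", "Tumor"]
genes      = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
gene_templates = reshape(collect(1:24), length(patients), length(cell_types), length(genes))

# Hypothetical helper: resolve a label to its integer position,
# throwing a KeyError for unknown labels instead of returning nothing.
function labelindex(labels::AbstractVector, key)
    i = findfirst(==(key), labels)
    i === nothing && throw(KeyError(key))
    return i
end

# Positional equivalent of gene_templates[Patient = "MRP12367", cell_type = "CD4 T", :]
counts = gene_templates[labelindex(patients, "MRP12367"), labelindex(cell_types, "CD4 T"), :]
# counts == [1, 7, 13, 19], one entry per gene
```

The packages discussed in this thread essentially automate this label-to-position step and carry the labels along with the result.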

After some testing and playing with the ergonomics, I am leaning toward AxisKeys.jl. For what it's worth to future users: as a personal preference, I agree with most of its design decisions.

I'm here to report back!

Here's what I've found:

NamedDims.jl / AxisKeys.jl

Conversion from aa::AxisArray for 2D data (Patient × cell_type) looks very much like AxisArray construction:

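    # Keys are the labels along each axis (axisvalues); axisnames would only give the dimension names
    freq_ka = KeyedArray(aa.data;
        Patient = collect(AxisArrays.axisvalues(aa)[1]),
        cell_type = collect(AxisArrays.axisvalues(aa)[2]));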

Negated indexing works by passing a function:

    constrained_freq_ka = freq_ka[:, !in(["Tumor", "Follicle"])]

But it doesn't work perfectly, since its data is a view of a NamedDimsArray or something similarly complicated. Later I seemed to accidentally read the whole array instead of copying out a slice, and the sizes became mismatched.

The two layers of wrapping (AxisKeys and NamedDims) cause problems, as mcabbott noted in the Zulip thread. Nice API goals, but surprisingly complicated and buggy, IMO. (NamedDims.jl works perfectly on its own, but doesn't serve my purpose.)
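The selector !in(["Tumor","Follicle"]) in the AxisKeys example is plain Base Julia rather than anything package-specific: in(collection) is the curried predicate x -> x in collection, and ! applied to a function negates it. A sketch of the same mechanics on an ordinary matrix, with made-up labels and placeholder data (how AxisKeys applies such predicates internally is not shown here):

```julia
celltypes = ["CD4 T", "Tumor", "Stroma", "Follicle"]
excluded  = ["Tumor", "Follicle"]

# in(excluded) is the curried predicate x -> x in excluded;
# ! composes logical negation with it, giving x -> !(x in excluded).
keep = !in(excluded)

mask        = keep.(celltypes)        # one Bool per label
kept_labels = celltypes[mask]         # ["CD4 T", "Stroma"]
kept_cols   = findall(keep, celltypes) # integer positions [1, 3]

# Placeholder data: 5 patients × 4 cell types.
freq = reshape(collect(1.0:20.0), 5, length(celltypes))
sub  = freq[:, kept_cols]             # ordinary integer indexing does the slicing
```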

AxisIndices.jl

This doesn't work in a heavy environment; its compat restrictions have become too stale.

NamedArrays.jl

Conversion from AxisArrays can be done with:

    # dimnames: the axis names as Symbols; names: the labels along each axis
    named_array_convert(aa::AxisArrays.AxisArray) = NamedArray(aa.data; dimnames = AxisArrays.axisnames(aa), names = tuple(collect.(collect(AxisArrays.axes(aa)))...))

Looking at cell frequencies for a subpopulation:

    # drop two cell types, then normalize each row (patient) to sum to 1
    let na = freq_na[:, Not(["Tumor", "Stroma"])]
        na ./ sum(na, dims = 2)
    end

Works well! I'm not worried about potential type instability. I am worried about the developer pulse, but it's not zero! It works fine on Julia 1.10.5 in a heavy Pluto notebook. This is what I am now using.
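The normalization line relies on ordinary broadcasting: sum with dims = 2 keeps a singleton column of row totals, so ./ divides every entry by its own row's total. The same mechanics on a plain Matrix with made-up counts; a plain Matrix drops the labels, which is what the NamedArray wrapper adds on top.

```julia
# Made-up counts: rows are patients, columns are cell types.
counts = [2.0 6.0 2.0;
          1.0 1.0 2.0]

# sum(...; dims = 2) keeps the reduced dimension, giving a 2×1 matrix of row totals...
row_totals = sum(counts; dims = 2)

# ...so ./ broadcasts each row total across that row; every row of freqs sums to 1.
freqs = counts ./ row_totals
```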

DimensionalData.jl

Looks like a great package, but I can't get over the style. It's very opinionated, with its use of X and Y and its expectation of dimensional data.

From the manual:

    boxplot(rand(X('a':'d'), Y(2:5:20)))

(Conclusion) So I've ended up going for NamedArrays, although I can't help but feel it is still suboptimal. It's just super hard to cover the interface of Array. I think some version of traits / concrete-type subtyping / method forwarding / classes would help a lot.

(Speculation) Maybe AxisArrays could be rebuilt on ReusePatterns.jl, one of the many interesting similar packages, or ideally some future Julia feature. I think what one needs is a way to specify that, for any function f with a method f(a::Array, …), the wrapper automatically gets f(a::AxisArray, …) = f(a.data, …), and only explicitly defined new methods, such as g(a::AxisArray, …), take precedence. Lots of work has gone into trying to replicate this behavior, but the design space is tricky.
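Short of automatic forwarding, Base already offers a partial version of this through the AbstractArray interface: a wrapper that defines only size and scalar getindex inherits generic fallbacks for iteration, printing, slicing, reductions and broadcasting. This is a different technique from the forwarding proposed above, shown as a sketch; LabeledMatrix and colbylabel are hypothetical, not AxisArrays' or any package's types.

```julia
# Hypothetical labeled wrapper (not a type from AxisArrays or any other package).
struct LabeledMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}
    rowlabels::Vector{String}
    collabels::Vector{String}
end

# Minimal AbstractArray interface: size plus scalar getindex. Generic fallbacks
# (iteration, printing, sum, slicing, broadcasting) are built on these two.
Base.size(A::LabeledMatrix) = size(A.data)
Base.getindex(A::LabeledMatrix, i::Int, j::Int) = A.data[i, j]
Base.parent(A::LabeledMatrix) = A.data

# A method that exists only for the labeled type.
function colbylabel(A::LabeledMatrix, label::AbstractString)
    j = findfirst(==(label), A.collabels)
    j === nothing && throw(KeyError(label))
    return A.data[:, j]
end

A = LabeledMatrix([1.0 2.0; 3.0 4.0], ["p1", "p2"], ["CD4 T", "Tumor"])

sum(A)                   # 10.0, via the generic AbstractArray fallback
A ./ sum(A; dims = 2)    # broadcasting works, but the result is a plain Matrix
colbylabel(A, "Tumor")   # [2.0, 4.0]
```

The catch shows up in the second call: the generic fallbacks allocate their results as plain Arrays, so labels are dropped unless the wrapper also customizes similar and the broadcast machinery. Covering that surface is much of the work behind the packages discussed above.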

TL;DR: NamedArrays.jl for now. (Edit: formatting.)