In Julia there are many AxisArrays.jl-like packages that implement variants of an array with metadata attached to its axes. This means that every once in a while, one has to convert arrays between them.
After doing lots of ad hoc conversions, I would like to have the conversions once and for all in a central place.
Is there already a package that converts between a collection of AxisArrays-like packages?
AxisKeys’s wrapdims function can import some of them, but is a bit less elegant — it just looks at the fieldnames(typeof(A)) in order to guess, without depending on other packages. But only one way! (Looks like it’s NamedArrays.jl & AxisArrays.jl that it handles.)
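To illustrate the "look at the field names and guess" idea, here is a minimal sketch in plain Julia. It is not AxisKeys's actual implementation: the two struct types and the `guess_parts` function are hypothetical stand-ins, and the candidate field names are assumptions picked for the example.

```julia
# Hypothetical stand-ins for two foreign labelled-array types. These are NOT
# the real NamedArrays.jl or AxisArrays.jl definitions.
struct LabelledA
    array::Matrix{Float64}
    dimnames::Tuple{Symbol,Symbol}
end

struct LabelledB
    data::Matrix{Float64}
    names::Tuple{Symbol,Symbol}
end

# Guess where the parent array and the dimension names live by inspecting
# field names only, so no foreign package has to be a dependency.
function guess_parts(A)
    fields = fieldnames(typeof(A))
    data_idx = findfirst(in((:array, :data)), fields)
    name_idx = findfirst(in((:dimnames, :names)), fields)
    if data_idx === nothing || name_idx === nothing
        throw(ArgumentError("don't know how to unwrap $(typeof(A))"))
    end
    return getfield(A, fields[data_idx]), getfield(A, fields[name_idx])
end

a = LabelledA(rand(2, 3), (:x, :y))
b = LabelledB(rand(2, 3), (:row, :col))

parent_a, dnames_a = guess_parts(a)
parent_b, dnames_b = guess_parts(b)
```

The obvious weakness is the one mentioned above: it is guesswork, it only goes one way, and it breaks as soon as a package renames a field.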
It’s a pity we are so fragmented here. Would be nice to eventually figure out how to kill all but one of these packages.
To give context to this comment, I recently tried to answer a related question on StackOverflow and edited my answer to include a DimensionalData.jl example solution, but I’m definitely not the expert here and I found it hard to choose DimensionalData.jl. Importantly, I’d like to cite a comment from the SO question author to drive the point home (emphasis mine):
In other words, this testimony suggests that the current fragmentation is hurting the userbase and is probably reducing important feedback. Having a welcoming place to guide new users and help them choose a package would be most welcome.
I probably don’t have the solution you’re hoping for, but I believe there is a long term solution for this, ArrayInterface.jl. I was a part of a lot of these “AxisArray” discussions. However, while developing AxisIndices.jl I realized that we could do even better than an xarray imitator. We can have static arrays, offset arrays, padded arrays, arrays with fancy keys, etc. and it can all be interchangeable and compatible. It involves reimplementing some of the indexing interface for arrays (mostly tweaking entry points b/c it’s already pretty great), and defining a lot of traits from collaborative discussion.
It’s not ready to replace all these other related packages, but we’re trying to support performance for packages like LoopVectorization. Hopefully it won’t be too long before people regularly utilize the traits from ArrayInterface.jl and then packages can define their own “static” types that take advantage of them. If people began adopting ArrayInterface.jl it wouldn’t even matter which “AxisArray” package you use because the user facing interface should be basically the same.
If I understand correctly, this is more like an extension of the Base AbstractArray interface, where what the OP seems to be looking for is something like Tables.jl for ND arrays with labels for any/all axes. Is that right?
Or is the point that, if all of the current AxisArray-like types implemented this interface, we’d get that interoperability?
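A rough sketch of what that kind of interoperability could look like, assuming a small shared interface of generic functions. Everything here is hypothetical: `unwrap`, `dimlabels`, `dimkeys` and the two wrapper types are made up for illustration and are not taken from ArrayInterface.jl or any existing package.

```julia
# Generic functions making up the hypothetical interface, with fallbacks
# so that a plain Array already participates.
unwrap(A::Array) = A
dimlabels(A::Array) = ntuple(i -> Symbol(:dim_, i), ndims(A))
dimkeys(A::Array) = map(collect, axes(A))

# Two stand-in wrapper types, as if defined by two different packages.
struct NamedThing{T,N}
    data::Array{T,N}
    labels::NTuple{N,Symbol}
end

struct KeyedThing{T,N,K<:Tuple}
    data::Array{T,N}
    labels::NTuple{N,Symbol}
    keys::K
end

# Each "package" implements the interface for its own type.
unwrap(A::NamedThing) = A.data
dimlabels(A::NamedThing) = A.labels
dimkeys(A::NamedThing) = dimkeys(A.data)   # no keys stored, fall back to indices

unwrap(A::KeyedThing) = A.data
dimlabels(A::KeyedThing) = A.labels
dimkeys(A::KeyedThing) = A.keys

# Conversions are written once against the interface, not per pair of types.
to_keyed(A) = KeyedThing(unwrap(A), dimlabels(A), dimkeys(A))
to_named(A) = NamedThing(unwrap(A), dimlabels(A))

n = NamedThing([1 2 3; 4 5 6], (:row, :col))
k = to_keyed(n)        # NamedThing -> KeyedThing
back = to_named(k)     # KeyedThing -> NamedThing
```

With N packages this needs N implementations of the interface rather than a converter for every pair of types, which is roughly the Tables.jl idea applied to labelled ND arrays.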
I had the same problem as the OP and created YAXArrayBase.jl (JuliaDataCubes/YAXArrayBase.jl on GitHub), which also contains an interface for different AxisArray-like packages. It does not depend on any of those packages but implements the interface through Requires.jl, so that each extension is only loaded once the corresponding package is loaded. Would be good to merge efforts here…
Any time I’ve come up with a solution to something for arrays in AxisIndices and felt it was truly generic and agreeable, I moved it to ArrayInterface.jl.
I think we did a lot of that when working on strides for LoopVectorization too.
This is probably similar to how Tables.jl was developed (e.g., using lessons learned from DataFrames.jl).
But it’s hard to say exactly how this compares to something like Tables.jl
There are a lot of problems you have to solve when creating something that does what AxisArray does.
ArrayInterface.jl solves problems for a lot of things, and I’m not sure exactly how things will look in a year.
I look at all the “AxisArray” imitators out there and admit there could be a lot of consolidation.
But if you give people something like the AbstractArray interface people can come up with a lot of cool things.
For users this usually means you just have a more consistent interface (think how similar it is to use something from StaticArrays compared to an Array).
For developers this means that you put a little extra time on the front end to make sure you work with this interface and then you don’t spend time regularly patching up issues arising from your homegrown solution.
Thanks for recommending DimensionalData.jl, but there are absolutely contexts where NamedDims.jl, AxisKeys.jl, and others are better.
It’s hard to resolve what to do with these packages as they do have really different use cases. DD is mostly used for spatial data; other packages are used for ML, images and other things I don’t know. But, for example, DD has plot recipes that permute automatically based on the order of x/y/z/time dims, and a lot of indexing/selector traits to work with points and intervals with various properties.
For other examples: two PRs in progress are adding explicit bounds vectors for interval keys like NetCDF has, and circular axes, mostly for climatic datasets and longitudes. I doubt anyone doing ML wants those much, and I didn’t know I did previously… so the scope of functionality crossover really isn’t settled yet.
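As a rough illustration of the "permute automatically based on dim order" idea, here is a sketch in plain Julia. It is not how DimensionalData.jl does it: the `CANONICAL` order and the `canonical_perm` helper are hypothetical, and dimensions are represented as bare symbols rather than dimension types.

```julia
# Hypothetical canonical plotting order. A real package would derive this
# from its dimension types rather than from bare symbols.
const CANONICAL = (:x, :y, :z, :time)

# Return the permutation that sorts `dims` into canonical order.
# Unknown dimension names are placed after all known ones.
function canonical_perm(dims::NTuple{N,Symbol}) where {N}
    rank(d) = something(findfirst(==(d), CANONICAL), length(CANONICAL) + 1)
    return sortperm(collect(dims); by = rank)
end

A = reshape(1:24, 2, 3, 4)    # pretend this is stored as (time, y, x)
dims = (:time, :y, :x)

p = canonical_perm(dims)      # permutation into (x, y, time) order
B = permutedims(A, p)         # data reordered to match
newdims = dims[p]
```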
But I agree with @Zach_Christensen that pushing common methods to ArrayInterface.jl is the best way to get at the problem without the hard work of actually merging package functionality.
It would be good to have ArrayInterface.jl in JuliaArrays. I don’t currently feel I can just add methods to it, not knowing the use cases in SciML. But I am intending to support it in DD when I have time.
Tim Holy has made that suggestion several times. I think it makes sense but Chris Rackauckas needs to be the one who ultimately makes that decision. I probably nag people with my incessant PRs and comments more than anyone, but he created the package.
I’ve recently started trying to put together some more documentation and examples, but we haven’t extensively discussed a contributors guide yet.
I’ve also discussed this with Tim. We need the move both for this discussion and for generalising JuliaImages packages for other uses, which is also relevant here as a lot of the traits are about axis properties.
I don’t think we can have these whole ecosystems depending on a key package that is in SciML, seems to depend a lot on one person (nothing against Chris and all his amazing work), and isn’t clear on contribution.
Moving it to JuliaArrays is a clear statement about its direction. I don’t see more general uptake without that, or a solution to the problems discussed here.
SciML has about double the contributors of JuliaArrays so that argument doesn’t make a lot of sense. Even if you took me out of the picture SciML would still have more activity in contributing and maintaining the libraries. I do a lot, but you should not underestimate the rest of the devs who are also some of the most productive devs in Julia! But sure, if this is actually about bike shedding some naming, we can move it and get some of the SciML devs the right ownership access to keep it maintained.
The reason it is in SciML is because it’s used in SciML, it was/is maintained by people in SciML, and repos in SciML tend to be maintained. A lot of libraries in JuliaArrays violate good Julia practices like continuous release that we try to enforce in SciML via COLPRAC. But what might be better would be to get more orgs onto COLPRAC and adopting these practices.
I’m not sure what difference it makes in terms of people contributing to the package. I’m not a member of SciML but I got involved because I thought I could make meaningful contributions to ArrayInterface.jl. TBH, I’m not sure why people keep suggesting moving it to JuliaArrays but I can see how it makes sense logically because…well…arrays.
But sure, if this is actually about bike shedding some naming, we can move it and get some of the SciML devs the right ownership access to keep it maintained.
I agree with that perspective: this is a bikeshed. But it’s one I’ve grudgingly admitted is important. I’ve long been baffled about why some people are so unwilling to use the multitudinous array algorithms in JuliaImages just because of the *Images in the package names. But it’s been confirmed over and over again, with more requests than I can count to “Please just move this into a package not named Images so I can use it.” Which to me makes no sense, but there you have it.
So the plan is to split a fair amount of the functionality in JuliaImages packages out to JuliaArrays. Portions of JuliaImages will essentially be reduced to the “now we can specialize those methods for colors!!!” organization, which is pretty trivial. But people really seem to want that to happen, so we’re going to do it.
To do that we need a certain collection of array traits to make this whole thing work together. If I have my history right, @Zach_Christensen (who has been doing more than anyone to make the generalization of needed traits a reality) “came from” the JuliaImages world. The fact that he’s been doing his traits work starting from a package that was created to serve the needs of SciML is awesome, because it helps ensure that the results of his work will meet the needs of a diverse community.
But just as folks are reluctant to rely on packages that happen to live in JuliaImages, I think it makes sense to ask why the same thing shouldn’t apply to other organizations. Code, it seems, should live in the place where a naive user would be most likely to find it, and when we worry about who started it where we just risk putting up unnecessary barriers to discovery. For something focused on array traits, the answer is obviously JuliaArrays.
As you say, it’s critically important to set the key contributors up with all the administrative privileges they need to continue their work.
I tend to not care too much about bikeshedding discussions… so just get me the right privileges and I’ll move it over if someone else fixes the url in General. It’s extra work so if you want it then put in some work.
Thanks for agreeing to the move. To be clear SciML is an amazing organisation with great coding standards, I wasn’t suggesting otherwise. But I agree with Tim that (important) code should be where the naive user expects to find it, and that’s JuliaArrays in this case. That kind of coherence will mean a lot in the long term to the ecosystem. Edit: also I agree entirely on spreading COLPRAC being the best outcome here, ArrayInterface.jl could continue to use it in JuliaArrays?
Also to further clarify, this is only an issue to me because ArrayInterface.jl is being proposed as a fundamental dependency for arrays across multiple domains, which makes these issues more worthy of some bikeshedding than otherwise.
@tim.holy for EcoJulia the problem with the Image- name was mostly all the JuliaImages and Colors deps. But names, ownership and acceptable dependencies seem to get bundled together, so it’s not a completely separate issue.