Maybe this can serve as an example of why merging packages that appear “similar” is actually hard.
DimensionalData.jl is complicated so that it can represent NetCDF, GeoTIFF and similar objects in all (actually just most) of their complexity. AxisKeys.jl can’t represent these, and doesn’t need to for its use cases.
Personally, I’m not pushing for DimensionalData.jl to be the main axis array package. It’s clearly over-engineered for the simple case, and still contains a lot of experiments (although AxisKeys.jl does too). It’s maybe only slightly over-engineered for the complicated cases.
For example, a NetCDF file can have an irregularly spaced lookup index where the bounds of each pixel along each axis are explicitly specified. I have to represent that exactly, and be able to write it back to disk unchanged from the file it came from. This has to work through spatial subsetting, broadcasts, rotation, whatever. I also need to track the spatial bounds of the object, and most of the time those are specifically not the first and last values of the lookup.
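To make the bounds point concrete, here is a minimal sketch in plain Julia. The `ExplicitIntervals` type and the `extent` and `subset` functions are made up for illustration and are not DimensionalData.jl's API; it only assumes each cell stores a lookup value plus an explicit lower and upper bound.

```julia
# Hypothetical type for illustration; not DimensionalData.jl's API.
struct ExplicitIntervals{T}
    values::Vector{T}   # lookup value stored for each cell
    bounds::Matrix{T}   # 2 × n: row 1 lower bounds, row 2 upper bounds
end

# The spatial extent comes from the cell bounds, not from the lookup values.
extent(l::ExplicitIntervals) = (minimum(l.bounds), maximum(l.bounds))

# Subsetting has to slice values and bounds together, leaving both unchanged.
subset(l::ExplicitIntervals, r::AbstractUnitRange) =
    ExplicitIntervals(l.values[r], l.bounds[:, r])

lookup = ExplicitIntervals([1.0, 2.5, 6.0], [0.0 2.0 4.0; 2.0 4.0 9.0])
sub = subset(lookup, 2:3)

extent(lookup)                               # (0.0, 9.0)
(first(lookup.values), last(lookup.values))  # (1.0, 6.0), not the extent
extent(sub)                                  # (2.0, 9.0)
```

A lookup that only stores a range or a vector of values has nowhere to keep the second row of information, which is roughly why the simpler axis concept is not enough here.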
None of the other packages can do these things, because their concept of what an axis/dim is is too simple. The flexibility of DimensionalData.jl also means that ArviZ.jl can quite easily wrap a Python xarray stack.
But, and this is a key point, the other packages don’t need to do these things. They are completely usable and equally good for most other tasks, and are clearly simpler and better for a bunch of things.
@aplavin mentioned exporting X and Y being weird. In the spatial sciences (not some tiny niche by the way ;), virtually all of our data has X and Y axes, occasionally + Z and time. So it’s worth having them exported, we are typing that all day. I almost never use other custom dimension names in DimensionalData.
And yes, the types are complicated, probably too much. Although you may be doing it a little disservice by including the submodule scoping in your example… those types are not exported by default for a reason:
DimArray{Float64, 2, Tuple{X{Sampled{Int64, UnitRange{Int64}, ForwardOrdered, Regular{Int64}, Points, NoMetadata}}, Y{Sampled{Int64, UnitRange{Int64}, ForwardOrdered, Regular{Int64}, Points, NoMetadata}}}, Tuple{}, Matrix{Float64}, NoMetadata}
There are definitely a few things there I want to remove, but not many of them can be.
If you can represent an ordered or unordered categorical axis, or an ordered sampled axis of regular or irregular points or intervals, where the intervals may have explicit bounds or implicit ones centered on, to the left of, or to the right of the axis values (not to mention axes that need CoordinateTransformations.jl to calculate indices from selectors), you just need to be able to distinguish them from each other and dispatch to different algorithms for selector lookups.
If you don’t need those things, why would you want any more than ranges as lookups, or want to work around the extra code and the ridiculous number of tests required to support them?
But without them I can’t use AxisKeys.jl or AxisArrays.jl for my daily work, at least not without a massive addition of functionality that likely would not be accepted in PRs.
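As a rough sketch of what "distinguish them and dispatch" means, here is a toy version in plain Julia. All the names (`AbstractLookup`, `Categorical`, `RegularPoints`, `IrregularIntervals`, `index_of`) are hypothetical and much simpler than what DimensionalData.jl actually does; the point is only that each kind of axis wants a different lookup algorithm.

```julia
# Hypothetical lookup types for illustration; not DimensionalData.jl's API.
abstract type AbstractLookup end

struct Categorical{T} <: AbstractLookup
    values::Vector{T}        # unordered labels
end

struct RegularPoints{R<:AbstractRange} <: AbstractLookup
    values::R                # evenly spaced points
end

struct IrregularIntervals{T} <: AbstractLookup
    starts::Vector{T}        # ascending lower bound of each interval
    stop::T                  # upper bound of the last interval
end

# Unordered categories: linear scan for an exact match.
function index_of(l::Categorical, x)
    i = findfirst(isequal(x), l.values)
    i === nothing && throw(KeyError(x))
    return i
end

# Regular points: compute the index arithmetically, no search needed.
function index_of(l::RegularPoints, x)
    i = round(Int, (x - first(l.values)) / step(l.values)) + 1
    (checkbounds(Bool, l.values, i) && l.values[i] == x) || throw(KeyError(x))
    return i
end

# Irregular intervals: binary search for the interval that contains x.
function index_of(l::IrregularIntervals, x)
    (first(l.starts) <= x < l.stop) || throw(KeyError(x))
    return searchsortedlast(l.starts, x)
end

index_of(Categorical([:b, :a, :c]), :a)                      # 2
index_of(RegularPoints(10:5:30), 20)                         # 3
index_of(IrregularIntervals([0.0, 2.0, 7.0], 10.0), 3.5)     # 2
```

Every extra axis kind adds another type parameter and another method like these, which is where the long type signatures and the large test suite come from.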
So we have multiple packages.