Commit to a common syntax for accessing additional index mapping information on axes

Why can’t it be the same function?
Named dimensions → return NamedTuple
Unnamed dimensions → return Tuple

It can’t ever be a NamedTuple unless we decide that all dimension names are unique, including the representation for unnamed dimensions. If we return some type that acts like a NamedTuple but isn’t, that makes it difficult to pass around and use in a meaningful way because we often perform recursive functions that require dispatching on the number of fields in the tuple and the type in the first position.
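
To make both constraints concrete, here is a minimal sketch using only Base Julia. The helper `describe` is a hypothetical name for illustration, and writing an unnamed dimension as `:_` is an assumption borrowed from the examples in this thread.

```julia
# `describe` is a hypothetical helper, not part of any package.
# It recurses over a plain Tuple, dispatching on the type in the first
# position and terminating on the empty tuple.
describe(::Tuple{}) = ()
describe(t::Tuple{AbstractRange,Vararg}) = (:range, describe(Base.tail(t))...)
describe(t::Tuple{Any,Vararg}) = (:other, describe(Base.tail(t))...)

kinds = describe((1:3, ["a", "b"]))

# A NamedTuple cannot hold two fields with the same name, so two unnamed
# dimensions (written `:_` here by assumption) cannot both be represented.
duplicate_names_rejected = try
    NamedTuple{(:_, :_)}((1:3, 1:2))
    false
catch
    true
end
```

A type that only imitates a NamedTuple would need its own versions of methods like `describe`, which is the extra burden described above.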

This doesn’t mean we can’t ever do something like this in the future, but I’d rather not build it into the design right now if we still don’t even know what this named collection would look like.

Ah, now I see: you must be talking about mixing named and unnamed dimensions? I didn’t think of that before at all. If either all dims are named, or all are unnamed, there’s no issue, right?

Is it even possible to unambiguously mix named and unnamed in the same array - any examples in the wild?

NamedPlus.jl (GitHub: mcabbott/NamedPlus.jl) is probably the most extensive experimentation with keeping names with their corresponding stuff.

I’m not saying that we can’t handle returning something that’s named in the future. I just don’t think there’s a clear path forward for that now and it shouldn’t be built into a very basic level for extracting this data unless we can ensure it won’t overcomplicate other things.

Another data point here: @oxinabox's NamedDims.

NamedDims only supports dimension names. It doesn’t provide anything for the values along the length of a dimension. AxisKeys uses NamedDims and adds this feature.


Is this a job for:

I don’t think FrankenTuples.jl would work for this. It stores the unnamed and named components separately, so dimensions with the names (:x, :_, :y) couldn’t be represented here.

If you want both dimension names and these keys/labels then you just do dimnames(data) and newsyntax(data). If there’s a perfect solution out there I’m open to it, but I don’t think there’s much benefit to complicating this new method just to carry around extra information by default.
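
As a rough sketch of that two-call approach: `ToyArray` below is a hypothetical container invented for this example, `dimnames` is defined locally rather than taken from any package, and `newsyntax` is just the placeholder name used above.

```julia
# Hypothetical toy container; `ToyArray` is not a real package type and
# `newsyntax` is the placeholder name used in this discussion.
struct ToyArray{N,L<:Tuple}
    names::NTuple{N,Symbol}
    labels::L
end

dimnames(a::ToyArray) = a.names     # dimension names as a Tuple of Symbols
newsyntax(a::ToyArray) = a.labels   # per-dimension labels as a plain Tuple

data = ToyArray((:x, :time), (["a", "b"], 0.0:0.5:1.0))

# Callers who want both can pair them up themselves; this works only
# because the names here happen to be unique.
named = NamedTuple{dimnames(data)}(newsyntax(data))
```

Keeping the two accessors separate leaves the plain `Tuple` return usable even when the names are missing or not unique.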


@Zach_Christensen I’m warming to axislookup or axislabels.

axislookup captures the fact we usually use these values to look up an index for the axis, and axislabels that they are the printed/plotted labels. But axislabels can also be interpreted to mean the name of the axis, like X and Y.

I think axislookup or lookup are the least overloaded in base and the broader ecosystem.

(or pluralised axislookups)

Indeed, there are two main uses for these axis-associated values: lookup and display labels. Not sure if one of these usages is much more common than the other, both seem pretty frequent.

Both lookup and labels focus strongly on one of those use cases. These values are not really “lookup” if only used to determine plot ticks, and not really “labels” if only used for selections like arr(time = 0..1).
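
A small Base-only sketch of the two uses, assuming a `time` axis whose associated values are `0.0:0.5:1.0` (the variable names are made up for illustration):

```julia
# Values assumed to be associated with a `time` axis.
times = 0.0:0.5:1.0

# Use 1: lookup. Translate a value (or an interval) into plain indices.
i = findfirst(==(0.5), times)
inside = findall(t -> 0 <= t <= 1, times)

# Use 2: display. Turn the same values into tick labels for a plot.
ticks = string.(times)
```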

Maybe it’s actually not bad to use more general names, even if they are somewhat overloaded? For example, axisvalues is also quite nice, but only mentioned once in the first post. axiskeys/axisvalues don’t carry that strong focus on specific usages of these values. Also, they intuitively fit with a common approach of indexing by key/value: that can be written arr[time=Value(1)] if axisvalues is used, or arr[time=Key(1)] if axiskeys.

Values has the same problem as keys. It already has a meaning in Julia. It also doesn’t provide much description about what is being returned.

axislookups sounds a bit awkward to me but in the end I’m open to any reasonable suggestion we can all live with. (Reasonable meaning not intentionally ridiculous)


Time for a quick update:

@yha and @jar1 pointed out that the set of criteria I made at first may be overly restrictive/confusing. I think @aplavin did a good job of conceptualizing this as collections along an axis that can be used to look up indices or label indices for plots and such. I think that helps differentiate from something generic along the axis, such as the discussion of colmetadata.

There’s also been discussion of attaching dimension names in the form of a NamedTuple (or some other type that would support the return of dimnames). There are certain complexities associated with doing so (that don’t permit use of NamedTuple) and not all data will have meaningful values from dimnames anyway, so it’s better to return a simple Tuple for what we’re currently discussing, permitting some other method to combine the two later on as needed.

Feedback on names:

  1. axiskeys
    1. (+) already used by AxisKeys.jl, so some people are used to it (most votes so far)
    2. (+) semantically related to the common use case of “keys”
    3. (-) “keys” already has a meaning for arrays and an axis is AbstractUnitRange{Int} <: AbstractVector{Int}. So it’s a bit odd that keys(axes(data, dim)) would not mean the same thing.
  2. axisvalues
    1. (+) already used by AxisArrays.jl, so some people are used to it
    2. (-) axisvalues(data, dim) != values(axes(data, dim)).
  3. axislabels
    1. (+) meaning of labels is fairly compatible with use cases (keys ≈ labels; axis labels on plots)
    2. (-) potentially too generic, making it awkward if we end up wanting labels(data) for something unrelated.
  4. axislookups
    1. (+) same semantic benefit as 1.2
    2. (-) potentially too descriptive, sounding distinct from use as axis labels for plots/images/etc.
  5. axisnames
    1. (+) R uses dimnames for similar functionality
    2. (-) “names” is overly specific
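
The objection to axiskeys and axisvalues can be checked in Base Julia alone. In this sketch the `labels` vector is a hypothetical stand-in for whatever the new function would return; everything else is standard Base behavior.

```julia
data = rand(2, 3)
ax = axes(data, 2)

# An axis is itself a vector of Ints, so `keys` and `values` already apply to it.
axis_is_vector = ax isa AbstractUnitRange{Int}
axis_keys = collect(keys(ax))       # the indices of the axis
axis_values = collect(values(ax))   # the elements of the axis

# Hypothetical labels attached to dimension 2; Base knows nothing about these.
labels = ["a", "b", "c"]
```

Both Base calls return plain integers, so a function named axiskeys or axisvalues that returned `labels` would disagree with them.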

At the very least, we can probably eliminate axisnames as a candidate now.

axiskeys and axisvalues are problematic IMO for the reasons you note.

axislabels sounds better than axislookups to me, as the series “indices”/“keys”/“labels” sounds more consistent than “indices”/“keys”/“lookups”. “Lookup” is an action, contrary to all other terms. And the plural “lookups” is weird.

Another option would be something like axistags. But I don’t think “tag” is used anywhere else with this meaning.


axislabels is growing on me. I think tag and token can have different meanings in code than this.


axisids might work.

Is “identifiers” a more apt description than “labels”? If so, would axisidentifiers be more clear?

“ID” is somewhat in use already for things like Base.objectid, which relates to IdDict and IdSet.

I updated the original axes_keys PR to use axislabels. This doesn’t mean the syntax is set in stone and the documentation still needs to be fleshed out with examples, but I thought it might be good to show what this would look like.

It looks like we had 5 for axiskeys, 2 for labels, 2 for “other” and 1 for name. I think some opinions may have changed throughout the conversation though. From the discussion it seems like those who wanted axiskeys would be fine with something similar like axislabels, which would resolve some of the overlap with keys.

Would anyone object to moving forward with axislabels?


Xarray uses the term “coords”, which sounds both generic enough and not much overloaded in the Julia ecosystem. So, maybe axiscoords, dimcoords, or something similar would be best?
