Commit to a common syntax for accessing additional index mapping information on axes

blackeneth · August 21, 2022, 3:33am

Your current axiskeys function returns the whole domain:

julia> axiskeys(A, :pol)
2-element Vector{Symbol}:
 :L
 :R

julia> axiskeys(A, :time)
10:10:30

julia> axiskeys(A)
([:L, :R], 10:10:30)

What the user wants to call what’s inside depends on his context - keys, levels, names, ids, make, model, years. You could leave it up to the user how they decompose the items in the domain.

If you want to use a name for what’s inside, I would suggest “levels.” We often speak of the “levels” of a factor, a treatment, or a variable.

A couple of other words you could consider are “handle” and “alias”.

Zach_Christensen · August 21, 2022, 5:59am

But we aren’t always referring to the whole domain. We are often referring to a subdomain. I’m not completely against it but from where I’m standing it seems like domain is just washing our hands of committing to any specific name with a very ambiguous alternative.

“levels” already has a meaning that is widely used in data science and DataAPI.jl. Alias isn’t bad but does have the same tradeoffs as tag and token do (as discussed above). Handle would probably be fine if everybody is onboard.

It would be helpful to hear some thoughts from @mcabbott at some point since their package popularizes the term axiskeys but has also spoken out against using it moving forward.

mcabbott · August 21, 2022, 7:51pm

This is a clever word suggestion.

I guess I think of these things as auxiliary information, attached to what’s essentially still an array, with its usual indexing. The fact that you can look elements up according to the “tag” or “label” is an extra, but in no way the primary way to access things.

For this reason “key” now seems wrong to me, sorry I picked that. In addition to the problem that it collides with Base.keys, which is something else. I think “identifier” or “co-ordinate” suffer the same problem.

This seems like an important difference of perspective. Does this really not make sense? Forbidding non-unique values means you forbid vcat(v, v) with a vector, is this still an array?

It sounds like you do not regard these things (whatever they are called) as auxiliary data, but do think of them as unconventional indices. Identifiers in the same way that column names are in a dataframe. To me that sounds closer to a multi-dimensional dictionary, which (IIRC) Dictionaries.jl planned to provide, at some point (with fast lookup).

Given this difference of perspective, is there a shared concept in need of abstraction, or are there different concepts which ideally would not share the same name?

My package describes the two-step access process as “lookup” as opposed to indexing. (IMO, that needs to be a verb.) It has two modes, one uses findfirst and the other is findall, and findall can match to one or many given keys/tags/labels.
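A minimal sketch of the two lookup modes in plain Julia, for readers who want to see the distinction concretely. The helper names `lookup_first` and `lookup_all` are hypothetical and are not the AxisKeys.jl API; the only library calls used are Base's `findfirst`, `findall`, `isequal`, and `in`.

```julia
# Hypothetical helpers (not the AxisKeys.jl API): step 1 turns a label into
# ordinary positions, step 2 indexes the plain array with those positions.
labels = [:L, :R, :L]   # non-unique labels, as you would get after vcat(v, v)-style concatenation
data   = [10, 20, 30]

# Mode 1: first match only. Returns one position, or `nothing` if absent.
lookup_first(labels, key) = findfirst(isequal(key), labels)

# Mode 2: every match, for a single key or for a collection of keys.
lookup_all(labels, key) = findall(isequal(key), labels)
lookup_all(labels, wanted::AbstractVector) = findall(in(wanted), labels)

i  = lookup_first(labels, :L)       # a single position
is = lookup_all(labels, :L)         # all positions labelled :L
js = lookup_all(labels, [:L, :R])   # positions matching either label

# The array itself is still indexed the usual way.
data[i], data[is], data[js]
```

With duplicated labels the two modes give different answers, which is one reason the uniqueness requirement matters for the design.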

One quirk here is that for example NamedDims.jl does not demand that these dimension names be unique, they can’t always make a NamedTuple. Forbidding that means that m = v .+ v' has to either error, or invent a new name which won’t match v.

But it’s a bit of a weird feature. As far as I know the question of what indexing by dimension name m[i = 1] should do is still unsettled. Maybe v' (and cov…) should change the name.

(NamedDims also has a wildcard :_ which can apply to several dimensions.)

aplavin · August 21, 2022, 9:43pm

I still see more issues with “labels” than “coordinates”…
“Axis labels” would be extremely confusing when plotting. This term is already used for the text labels for axes as a whole (ie close to dimnames conceptually), and individual values on axes are often called ticks.

Don’t see how dictionaries are related here. Julia pretty consistently uses indices and keys as synonyms in multiple places. So we can focus on arrays in this discussion, and not invoke dicts.

Actually, I quite like the recent axisdomain suggestion by @blackeneth !

Why? The values we discuss here represent the complete domain for an axis, in some sense. Whether they are just textual labels, coordinates, or dates, unique or not - they define the domain of what values this axis represents.

Zach_Christensen · August 22, 2022, 1:53am

mcabbott:

This seems like an important difference of perspective. Does this really not make sense? Forbidding non-unique values means you forbid vcat(v, v) with a vector, is this still an array?

It sounds like you do not regard these things (whatever they are called) as auxiliary data, but do think of them as unconventional indices. Identifiers in the same way that column names are in a dataframe. To me that sounds closer to a multi-dimensional dictionary, which (IIRC) Dictionaries.jl planned to provide, at some point (with fast lookup).

Given this difference of perspective, is there a shared concept in need of abstraction, or are there different concepts which ideally would not share the same name?

My package describes the two-step access process as “lookup” as opposed to indexing. (IMO, that needs to be a verb.) It has two modes, one uses findfirst and the other is findall, and findall can match to one or many given keys/tags/labels.

I really appreciate your input here. Thanks!

I’d be willing to relax the initial criteria that these values all be completely unique, but they could probably be considered unconventional indices akin to what DataFrames.jl does with its column names. We’ve discussed multi-dimensional dictionaries before with @oxinabox, but that would prohibit standard indexing. I don’t think that’s what we want. I think we want an additional set of references mapped to each index. I’m not sure whether that means we are already describing something completely different than a key or not.

There are type tags and I think the compiler and LLVM use the term “tag” pretty often. The meaning there may not be completely unrelated to what we are trying to achieve here though. I’d love to hear your thoughts on the verbiage (lookup, findfirst, etc) and execution of related methods as related features are further developed.

I don’t see any conflict with the Makie.jl API. If you use xlabel you are getting the one label for the axis if you use xticklabels you are getting the multiple labels along the x axis. (In fact, I think this was something discussed when I submitted the PR for the various tick labels, but the history for that repo is a bit difficult to track). You could argue that the difference between “axis label” and “axis labels” isn’t clear enough but we have other methods that differ by only one letter too.

I was thinking of the axis as representing this “domain”. For example, I usually am looking at brains and if you were looking along the sagittal axis it’s often sliced up as you look at different parts so if the domain was “sagittal” then often we are looking at the subdomain in a given array. I just figured the use of ontological terminology was meant to construct one’s own ontology through something like domain names for each axis. The connection between “domain” and what we are trying to get isn’t as clear to me but if people can agree this is the best way to refer to these values along the axis that’s fine with me.

Side note: It would be pretty awesome if someone developed a way of interacting with OWL based ontologies in Julia.

aplavin · August 22, 2022, 7:37am

I didn’t say anything about Makie, don’t even use it. My point was more general and not related to any specific library, even though they seem to use very similar APIs - eg matlab and matplotlib also use xlabel/xticklabels in the same sense as you say.
I would intuitively expect axislabels(A) to return 2 values for a 2-dim array - one label per axis, what dimnames actually does.

So, the only issue is that the “axis domain” does not cover the full “data model domain”?
Don’t see how this is a problem. The function axisdomain(A) would specifically mean the domain of axes of A, whatever range of values they cover. The domain for a subarray would be smaller, of course.

It’s like a function f: R -> R, f(x) = x + 1 has the domain of all reals, but its restriction f: N -> R, f(x) = x + 1 has only positive integers as its domain.
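To make the analogy concrete, here is a small sketch in plain Julia. The `LabeledVector` type and the `axisdomain` and `restrict` functions are hypothetical names used only for illustration; they are not part of any existing package.

```julia
# Hypothetical stand-in for an array that carries one label per index
# along its single axis.
struct LabeledVector{T,L}
    data::Vector{T}
    labels::Vector{L}
end

# Hypothetical `axisdomain`: the labels covered by this particular array.
axisdomain(v::LabeledVector) = v.labels

# Taking a sub-array restricts the data and the labels together.
restrict(v::LabeledVector, inds) = LabeledVector(v.data[inds], v.labels[inds])

v   = LabeledVector([1.0, 2.0, 3.0, 4.0], [10, 20, 30, 40])
sub = restrict(v, 2:3)

axisdomain(v)     # labels of the full array
axisdomain(sub)   # labels of the sub-array only
issubset(axisdomain(sub), axisdomain(v))
```

Under this reading the sub-array has a smaller domain than its parent, in the same way a restricted function does.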

Yes, please! I didn’t notice this limitation in the previous discussion, but regularly use or encounter duplicated axiskeys in AxisKeys.jl arrays.

Zach_Christensen · August 22, 2022, 10:05am

I think it’s pretty clear that axislabels(A, :x) == xlabels(A) but I can see how it might be confusing for the one arg case. This logic does bleed over into most of the other cases too though, and even dimnames has the same problem. It kind of sounds like the names along a dimension, not necessarily the names of each dimension, but people have found the meaning of “a name for each dimension” to be clearly implied. We could address this by using a more appropriate prefix with whatever suffix we settle on.

That’s fine but the original suggestion was referring to ontological domains, not mathematical domain sets. It’s common in ontology for a domain to be a single word referring to some set of other domains or values. It’s fine with me to move forward with domain if this isn’t confusing to everyone else, we just need to be careful about how we document it.

blackeneth · August 22, 2022, 7:23pm

That’s fine but the original suggestion was referring to ontological domains, not mathematical domain sets. It’s common in ontology for a domain to be a single word referring to some set of other domains or values.

Heh, interesting, as I came up with domain from thinking about math functions! Then apparently added confusion by “ontology”!

So delete ontology and think of it in terms of mathematics.

Zach_Christensen · August 22, 2022, 7:37pm

That’s totally fine. It does make much more sense when you think of it in terms of mathematics. I don’t think it’s as clear to people without a math background though. We do a lot of math in Julia but when I asked a couple people in my lab what they thought it meant most were a bit confused. That’s just anecdotal though and you could probably argue the people in my lab aren’t as mathematically inclined as many people that would end up coding in Julia. If everybody feels my ill composed experimental sample isn’t reflective of the general population then I’d be happy to seriously consider “domain”.

Also, I want to thank everyone participating! Bike-shedding is tedious but everybody participating has been very reasonable and patient.

jar1 · August 22, 2022, 7:55pm

julia> A = KeyedArray([1 2 3; 4 5 6], pol=[:L, :R], time=10:10:30)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   pol ∈ 2-element Vector{Symbol}
→   time ∈ 3-element StepRange{Int64,...}
And data, 2×3 Matrix{Int64}:
        (10)  (20)  (30)
  (:L)     1     2     3
  (:R)     4     5     6

I think there could be ambiguity about what “axis labels” means. It could mean:

each axis has a label identifying that axis. In this case, “pol” and “time” would be axis labels: the labels over/across the axes.
each axis has a set of labels identifying the indices along that axis. In this case, L/R and 10/20/30 are axis labels: the labels within the axes.

“Index labels” might more clearly refer to the latter of these.
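The two readings can be put side by side in plain Julia. This is only a sketch: a NamedTuple stands in for the KeyedArray above, and the function names `labels_of_axes` and `labels_along_axes` are hypothetical.

```julia
# Plain-Julia stand-in for the KeyedArray above; function names are hypothetical.
data   = [1 2 3; 4 5 6]
labels = (pol = [:L, :R], time = 10:10:30)

# Reading 1: one label per axis (what dimnames returns).
labels_of_axes(labels) = keys(labels)

# Reading 2: one label per index along each axis.
labels_along_axes(labels) = values(labels)

labels_of_axes(labels)      # (:pol, :time)
labels_along_axes(labels)   # ([:L, :R], 10:10:30)

# Each per-index label collection has to match the array size along its axis.
map(length, labels_along_axes(labels)) == size(data)
```

The first function returns one value per dimension, the second returns one collection per dimension, which is the distinction a name like “index labels” would be trying to capture.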

Zach_Christensen · August 22, 2022, 8:01pm

At some point in my brain “axis” meant indices along the axis and “dim”/“dimension” meant value associated with the dimension irrespective of size or indices. I’m not sure whether this was actually a part of one of the many conversations lost to slack or spread across discussions in related PRs/issues. Maybe I just made it up. This is another case where I will defer to the group on what makes the most sense, because I don’t want to bias the conversation if I just created this distinction.

I like that a lot. Maybe some variant on that like indice_labels.

blackeneth · August 22, 2022, 8:27pm

I think your result reflects how few people remember “domain” from algebra class. You might ask first, “do you know what a domain of a function is?”

If a person doesn’t know the domain of a function, explain it. Then introduce axisdomain() and see if the concept transfers.

jar1 · August 22, 2022, 9:10pm

I don’t have a strong opinion about axis labels vs index labels, but I don’t like “indice labels” because “indice” isn’t a word.

Zach_Christensen · August 22, 2022, 9:32pm

What we’ve discussed here might be simple enough. I’ve tried explaining a lot of basic math with variable success. To be fair, I have no formal training as a teacher and may just be bad at this. Maybe I’ll try again tomorrow when they come back to lab. It seems unfortunate that “domain” will consistently need to be explained to a fraction of people that might use the method (although that fraction might be small and I might be artificially inflating this issue).

I’m also not sure domain is descriptive enough. We can say that axisdomain tells us that we are getting a domain that maps to the axis, but we actually want something more specific than that. For example, if we had a table array we could attach a data dictionary along the column axis, but I don’t think we want a dictionary of feature parameters at each index.

That’s completely fair. My bad

Zach_Christensen · August 26, 2022, 1:52pm

What about index_labels?

jar1 · August 26, 2022, 7:12pm

indexlabels and index_labels seem ok to me. I don’t mind axislabels and axis_labels too much either. I wish the style guide had made a decision about underscores rather than leaving it up to authors…
