# Make the use of the word "dimension" consistent in the documentation

The word “dimension” is unfortunately used in several incompatible ways across programming languages in the context of arrays.

Consider an N-dimensional array `A[i1, i2, ..., iN]`, where `i1` runs from 1 to `L1`, `i2` runs from 1 to `L2`, etc., so that the total number of array elements is `L1 * L2 * ... * LN`. There are at least three inconsistent usages of the word “dimension” in this setup. In the first usage, a “dimension” simply indicates a specific index `ik`, so that the dimensions of an array are always 1, 2, 3, …, `N`. In the second usage, a “dimension” refers to the number `Lk` of values that the index `ik` ranges over, so that the dimensions of an array are `L1`, `L2`, …, `LN`, and the total size of the array is the product of its dimensions. In the third usage, this array only has a single “dimension”, which is the number `N`.
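As a minimal sketch of how the three usages map onto Julia calls (the variable names `first_usage`, `second_usage`, and `third_usage` are made up for illustration and do not appear in the documentation):

```julia
# A 2×3×4 array: N = 3 indices, with L1 = 2, L2 = 3, L3 = 4.
A = zeros(2, 3, 4)

first_usage  = collect(1:ndims(A))  # [1, 2, 3]: labels of the indices
second_usage = size(A)              # (2, 3, 4): how many values each index takes
third_usage  = ndims(A)             # 3: the number of indices

# The total number of elements is the product of the extents L1 * L2 * L3.
total = length(A)
@assert total == prod(size(A)) == 24
```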

Julia follows the first usage - e.g. `permutedims` performs a generalized matrix transpose rather than an array reshape, and the numbers `Lk` are called the “dimension sizes”. Mathematica uses the second usage: `Dimensions[A]` returns `{L1, L2, ..., LN}`.

Given this conflicting terminology, it’s important to use the word “dimension” consistently, but unfortunately the documentation does not do so. It usually uses the word in the first sense, but, for example, it describes `size(A)` as returning “a tuple containing the dimensions of `A`”, inconsistently using the second sense of the word “dimension”. In the very next line it switches to using the word in the first sense. Similarly, it describes `reshape(A, dims...)` as returning “an array containing the same data as `A`, but with different dimensions”. This should say “dimension sizes”.

I know this might sound like extreme nitpicking, but I recently got extremely confused about why `permutedims` was performing a transpose rather than a reshape, because of the documentation’s conflation of “dimension” and “dimension size”. (Also, strictly speaking, all the `dims...` arguments in the documentation should more precisely be `sizes...`, because under the documentation’s more common use of the word, the “dimensions” are always just 1 through `N`.)
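To make the difference concrete, here is a small sketch comparing the two operations on a 2×3 matrix. Both results have size `(3, 2)`, but only `reshape` preserves the column-major order of the elements; the names `P` and `R` are just for illustration.

```julia
A = [1 2 3;
     4 5 6]                  # size(A) == (2, 3)

# permutedims reorders the indices themselves: P[j, i] == A[i, j].
P = permutedims(A, (2, 1))

# reshape keeps the elements in column-major order and only changes
# how many values each index ranges over.
R = reshape(A, 3, 2)

@assert size(P) == size(R) == (3, 2)
@assert P == [1 4; 2 5; 3 6]   # the transpose
@assert R == [1 5; 4 3; 2 6]   # same memory order as A
@assert vec(R) == vec(A)
```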


I think that this issue is important, but at the same time also difficult, because these terms are not used consistently in programming. It would be interesting to see what other languages (e.g. Fortran, C, R, Python, and Matlab) use for these concepts. This could provide some perspective.

I can help with R: it uses the term dimension more or less consistently for the `dim` attribute of an object, which is a vector of integers like `Base.size` in Julia. The R manual also calls this a dimension vector and dimensions, so usage is not super-strict. Docstrings (e.g. for `apply`, `aperm`) mostly use it in this sense and avoid other usages.

I believe this inconsistency comes from mathematics, which itself uses the word “dimension” for different things depending on context. For example, we would normally represent a cube - a mathematical object with 3 dimensions - as a 3D array in Julia. The use of the words “3 dimensions” here is intuitive and absolutely justified. At the same time, a single vector (e.g. a 1D slice of that cube) of N elements can represent a single point in some vector space. In this case we would say that it’s an N-dimensional vector space - again, common mathematical notation. Wikipedia provides a couple of other uses for the word “dimension”.

I like to think of the different uses of the word “dimension” as the same concept - number of coordinates - in different spaces. For me, `permutedims` operates in the space of object axes (i.e. a cube has 3 dimensions in this space), but `A[i]` is a single coordinate in a `length(A)`-dimensional vector space.
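A rough sketch of these two views of the same array, assuming a 2×3×4 example (the names `i` and `c` are illustrative only):

```julia
A = reshape(collect(1:24), 2, 3, 4)

# View 1: the space of object axes. A has three axes.
@assert ndims(A) == 3

# View 2: A as one point in a length(A)-dimensional vector space,
# where a single linear index i picks out one coordinate.
@assert length(A) == 24
i = 17
c = CartesianIndices(A)[i]   # the (i1, i2, i3) triple for linear index 17

@assert Tuple(c) == (1, 3, 3)
@assert A[i] == A[c] == 17
```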

I don’t really think it’s possible to make the use of this word consistent across the whole language used for thousands of different tasks, but I’d encourage package developers to stick to a single meaning within their area.


It’s not that black and white:

```
julia> A = rand(8)
8-element Array{Float64,1}:
0.461397
0.83592
0.429355
0.484167
0.12294
0.862584
0.491176
0.948273

julia> reshape(A, (2,2,2))
2×2×2 Array{Float64,3}:
[:, :, 1] =
0.461397  0.429355
0.83592   0.484167

[:, :, 2] =
0.12294   0.491176
0.862584  0.948273
```

Feel free to file an issue if you can identify a set of changes that need to be done to make the documentation consistent.


In addition to the potential documentation issues, there was a general discussion in

https://github.com/JuliaLang/julia/issues/22665

that discussed the `size` vs `reshape` inconsistency.

@tparker: I think it would be good if you proposed some updates to the docstrings through GitHub. You just have to search for the file containing the docstring on GitHub and can edit it directly. This will generate a pull request that can then be discussed and merged.

What I find confusing are the `dims` in the function signature. These simply should be called `sizes`, no?


@dfdx I agree that there’s definitely some subtlety in the use of the word “dimension” in the examples you give, but both of those examples actually conform to the first usage in my OP, where the word “dimension” indexes an independent direction in some abstract space, rather than giving the (finite) “length” of that direction, as in the second usage. I believe this is pretty standard across mathematics (at least linear algebra), which strengthens the case for making the documentation consistent.

@Evizero Yes, I agree that `reshape` can alter the number of indices, but it doesn’t have to. The current documentation’s use of the word “dimensions” only covers the case where it does, while my proposed change to “dimension sizes” would cover both cases.
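A quick sketch of the two cases, assuming a 2×6 input (the names `B` and `C` are illustrative): `reshape` may keep the number of indices and change only their sizes, or it may change the number of indices as well.

```julia
A = reshape(collect(1.0:12.0), 2, 6)

B = reshape(A, 3, 4)      # still two indices; only the dimension sizes change
C = reshape(A, 2, 3, 2)   # the number of indices changes from 2 to 3

@assert ndims(B) == ndims(A) == 2
@assert size(B) == (3, 4)
@assert ndims(C) == 3

# In both cases the underlying data and its order are untouched.
@assert vec(A) == vec(B) == vec(C)
```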

@tobias.knopp Thank you, I have never made a pull request before, but will attempt to do so. Yes, I agree that `dims...` should be changed to `sizes...` - I mentioned that in the last sentence of my OP. I’ll do that in my PR.

Everyone: do we think it’s clearer to use `sizes...` or `size...` as the argument in the function signature? `size...` matches up nicely with the `size()` function, which returns a tuple of dimension sizes, but `sizes...` makes it clearer that the argument represents multiple dimension sizes rather than a single number. I’m inclined toward `sizes...`, especially since the GitHub thread that tobias.knopp mentioned indicates that the `size()` function might conceivably be renamed `shape()` at some point anyway.
