Is this a good behavior for `size(::Array, ::Integer)`?

I have an open mind here; but I recently had a screw-up that made me wonder if the current behavior of `size` is ideal. My screw-up was that I had called `size(v, 2)` where `v` is a `Vector`, when I should have been calling `size(v,1)` or `length(v)`. It took me a while to catch the error partially because I would have thought that `size(v::Vector, 2)` would throw an error.
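For concreteness, here is a minimal sketch of the behavior in question, assuming a recent Julia version. The vector `v` is just an illustrative value:

```julia
v = [10, 20, 30]          # a Vector, so ndims(v) == 1

size(v)                   # (3,)
size(v, 1)                # 3
length(v)                 # 3
size(v, 2)                # 1, returned silently instead of throwing
```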

I tend to think of rank-N tensors as having exactly N indices, which is a little different than having infinitely many indices with all the ones past a certain number having only 1 dimension, mainly because it is possible to have tensors with only 1 component that do not transform as a singlet under some groups (e.g. rescaling); though admittedly this is rather abstract and of course arrays often possess none of the qualities of a tensor.

The current behavior is reasonable, so I have no strong opinions, but I thought I'd throw this out there and see what everyone else's opinion is. Note that at least in NumPy there is no equivalent since you can't do `np.shape(A,1)`.

I find size padding a subtle source of bugs, and opened an issue recently:
https://github.com/JuliaLang/julia/issues/23985

Another reason I thought of why this might be bad is that you would expect `size(v, n)` to return the same thing as `size(v)[n]` which right now is not the case.
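The mismatch can be seen directly. This is a small sketch with an illustrative vector; `padded` and `threw` are names made up for the example:

```julia
v = [10, 20, 30]

padded = size(v, 2)       # the two-argument form pads with 1

# Indexing the size tuple has no padding: size(v) is (3,), so index 2 is out of bounds.
threw = try
    size(v)[2]
    false
catch err
    err isa BoundsError
end
```

Here `padded` is `1`, while `size(v)[2]` raises a `BoundsError`.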

Isn't that the reason `size(v, n)` exists at all?

Is it? Where is this documented?

My point was simply that if `size(v, n) === size(v)[n]` then `size(v, n)` is redundant.

I don't think this has much to do with `size`. Having `rand(3)[2, 1]` work while `size(rand(3), 2)` throws an error seems like a non-starter. So what you want to do is remove indexing with trailing ones?
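To illustrate what indexing with trailing ones means, here is a short sketch with an illustrative vector. Only trailing indices equal to `1` are accepted; anything else is out of bounds:

```julia
v = [10, 20, 30]

a = v[2]          # ordinary indexing
b = v[2, 1]       # a trailing 1 is accepted on a Vector
c = v[2, 1, 1]    # any number of trailing 1s is accepted

# A trailing index other than 1 is still an error.
threw = try
    v[2, 2]
    false
catch err
    err isa BoundsError
end
```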

My problem is the following: the commit in the issue I linked above just introduced size padding, without any prior discussion (that I could find).

First, I would like to just understand why it was necessary. I consider it a source of bugs and would prefer to have it removed, but I keep an open mind and would be interested to hear why people find it useful.

And yes, if size padding is removed, the majority of types would not need to define a `size(v, n)` because it could fall back to `size(v)[n]`. The exceptions are of course when `size(v, n)` is cheaper to calculate.
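A sketch of what such an unpadded fallback could look like. `strictsize` is a hypothetical name used only for illustration; it is not part of Base:

```julia
# Hypothetical helper, not part of Base: no padding past ndims(A).
strictsize(A::AbstractArray, n::Integer) = size(A)[n]

M = zeros(3, 4)
v = zeros(3)

strictsize(M, 2)   # 4, same as size(M, 2)
strictsize(v, 1)   # 3, same as size(v, 1)

# Asking for a dimension beyond ndims now fails loudly.
threw = try
    strictsize(v, 2)
    false
catch err
    err isa BoundsError
end
```

With this definition, a leftover `strictsize(v, 2)` from matrix code would be caught on the first call.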

My understanding is that `size(v)[n]` would have to allocate a tuple whereas `size(v,n)` would not, though since it would be an rvalue I'm not sure if the compiler elides this somehow.
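On the allocation question, one rough indication, assuming Julia 1.x: the tuple returned by `size` for an `Array` holds only `Int`s and is a plain bits value, which does not in itself require a heap allocation. Whether a particular call site is fully optimized is up to the compiler, so this is a hint rather than a guarantee:

```julia
v = [10, 20, 30]

t = size(v)              # (3,), a Tuple{Int}
plain_bits = isbits(t)   # true: an immutable value made only of primitive fields
```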

Honestly, I'm not crazy about `rand(3)[2,1]` either. The issue is that (I think) people expect these sorts of things to throw errors. To me the distinction between an array with trailing 1-dimensional indices and one without is like the distinction between a scalar and a 1-dimensional vector. This distinction was important enough for the introduction of `RowVector`. It might be that there are really good reasons why trailing indices need to be allowed, but if not I'd be in favor of their removal.

Also, to expand on my anecdote from yesterday, I had made this error because I originally had some code with a matrix that I later changed to similar but of course distinct code with a vector. That one function call got left in and screwed everything up. It seems to me like this sort of error might not be uncommon.


Mathematicians generally identify column matrices with vectors and more generally consider (n-1) dimensional tensors to be embedded within n-dimensional tensors in this way. By allowing indexing with ones beyond the number of indices of an array or omitting trailing indices into singleton dimensions, you get a form of this standard identification: you can treat vectors like column matrices and vice versa.

I had made this error because I originally had some code with a matrix that I later changed to similar but of course distinct code with a vector.

The whole point of this behavior is that you can apply code you wrote for a matrix and pass a vector to it and it will work, treating the vector as an `n x 1` matrix, so I'm a bit confused about how this caused your code to break; it should continue to function as it did before.
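A small sketch of that use case. `total` is a made-up function written with two-index matrix loops; it runs unchanged on a vector because `size(A, 2)` is `1` and `A[i, 1]` is valid there:

```julia
# Illustrative function written as if A were a matrix.
function total(A::AbstractVecOrMat)
    s = zero(eltype(A))
    for j in 1:size(A, 2), i in 1:size(A, 1)
        s += A[i, j]
    end
    return s
end

total([1 2; 3 4])   # 2x2 matrix
total([1, 2, 3])    # vector, treated as a 3x1 matrix
```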

Fair point, though I do have a hard time imagining a case where you'd want to do that without changing any of the index arguments.

I broke my code because I left in a `size(v,2)` when it should have been `size(v,1)`. This absolutely 100% was my fault, I was not trying to argue that I am a good coder. My point was that this sort of thing can be easy to miss if people expect `size(v,2)` to throw an error. Like I said, I'm open minded about it, I don't think the current behavior is unreasonable, but it would make me feel better to see a use case where this really is helpful.

I have a fairly extensive mathematical background and I definitely do not consider rank-(n-1) tensors to be embedded within n-dimensional tensors (with trailing indices having 1 dimension). Like I said before, this is for the very good reason that there is a distinction between a 1-dimensional representation and a singlet representation. The major exception I can think of would be tensor networks but in that context things are very clear because all indices just transform under SU(N). Granted, I have a high energy physics background, so my thinking on this may be very biased.


Other embeddings of (n-1)-tensors into n-tensors may be less universal, but the "vectors are columns" identification is so fundamental to linear algebra that most people don't even realize that they're doing it.

Doesn't the fact that you were doing `size(v,2)` mean that your code was broken in the first place? Is that the issue here? Not that your code broke when you applied it to a vector instead of a matrix, but rather that it didn't error in the first place when you asked for the second dimension of a vector?

But that's what pretty much all of `Base` does:

```julia
julia> size(ones(3), 9)
1
```

Asking for a dimension `k > ndims` doesn't error, but silently pads with `1`.

I consider this a misfeature precisely because it leads to bugs like this, but if the core devs insist, at least

1. it should be documented,
2. all subtypes of `AbstractArray` should be required to do it.
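As a point of comparison, here is a sketch assuming Julia 1.x, where the generic `AbstractArray` fallback for `size(A, d)` pads as far as I can tell. `Squares` is a made-up type that defines only `size(A)` and `getindex`, and it inherits the padding; a type that overrides the two-argument method itself could still behave differently:

```julia
# Hypothetical minimal AbstractVector: only size(A) and getindex are defined.
struct Squares <: AbstractArray{Int,1}
    n::Int
end
Base.size(s::Squares) = (s.n,)
Base.getindex(s::Squares, i::Int) = i^2

s = Squares(4)

size(s, 1)   # 4, taken from size(s)
size(s, 2)   # 1, supplied by the generic fallback rather than by Squares
```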

That's a different issue. The distinction between vectors and columns is the distinction between conjugate, but typically non-trivial, representations of a group. Because the most common representations of the most common groups are self-conjugate (or only involve a simple transpose), this distinction can often be neglected. The need for `RowVector` is a good example of how you can get into trouble by only implying this sort of conjugation rather than annotating it explicitly (I'm not arguing for any changes here). The distinction between non-trivial 1-dimensional representations and singlets is different. Granted, most groups only admit trivial 1-dimensional representations (a famous counter-example being U(1)) and this is the reason why trailing 1's make any sense at all.

Yes absolutely, my code was broken and it was completely my fault, it was an embarrassing error. If `size` worked like I'm suggesting, my code would have thrown a bounds error rather than working properly. What I wanted to discuss was

• Is this a common error? I'm pretty sure I've done it before, but in any previous instance I quickly fixed it. Are other people making this error frequently?
• Is having the trailing 1's functionality worth the potential of having this error, especially considering that many users probably don't expect them?

By the way, in spite of my above argument, I certainly am sensitive to the fact that arrays are often just blocks of adjacent numbers in memory and often don't have any of the properties of a tensor, so I don't necessarily consider any tensor theoretical considerations to be the final word.
