# Trouble Understanding Slicing

In the following code, I can’t understand why sum(a[1,:],dims=2) gives a different answer then the first row of sum(a[1:2,:],dims=2).

``````julia> a = [1 2 3 4; 5 6 7 8; 9 10 11 12];

julia> sum(a, dims=2)
3×1 Array{Int64,2}:
10
26
42

julia> sum(a[1:2,:], dims=2)
2×1 Array{Int64,2}:
10
26

julia> sum(a[1,:], dims=2)
4-element Array{Int64,1}:
1
2
3
4

julia>
``````
2 Likes

When tackling things like this, I always recommend simplifying things as much as possible to figure it out. In this case, removing the `sum` does the trick:

So summing across the 2nd dimension in the latter case is a no-op, because there is no second dimension! This happens because Julia drops dimensions indexed by a scalar. If you want to preserve that dimension, just use a vector like `` or `1:1` instead. More details on indexing are here:

https://docs.julialang.org/en/v1/manual/arrays/#man-array-indexing-1

5 Likes

mbauman, thank you. I could only find two examples in indexing that show this but I couldn’t find any discussions; I scanned through it twice, but didn’t read word for word so I might have missed it. I come from a Matlab background where the shapes and dimensions of matrices are not changed. Indeed, I don’t think Matlab has anything equivalent to vectors; my understanding is that everything is a matrix in Matlab. I am new to Julia, and my largest frustration with Julia is the dropping of dimensions, and automatic conversions from rows to columns; the other frustration is how the packages and environments and revise are set up; I’m hoping I will eventually understand the reasoning behind why vectors are needed at all and why a[1,:] should both drop dimensions and change shape; it just seems counter-intuitive to me.

The reason Julia drops dimensions is for consistency. If `A` is a matrix, most people want `A[i,j]` to be a scalar. In matlab, scalars don’t exist – this is produces a 1x1 matrix. This sucks for speed and efficiency and also just common sense. The generalization of this rule is that every scalar index drops the result a dimension. (this also comes from the fact that indexing a vector `v` by `v[i]` should also produce a scalar). If you want matlab like behavior, however indexing `A[[i],:]` will produce a column matrix. This is fairly intuitive if you consider that `A[[i1,i2],:]` needs to be a 2xm matrix, and type stability requires that `A[[i],:]` has the same type as `A[[i1,i2],:]`.

9 Likes

Oscar, thank you for taking the time to reply. I’m certain this can’t be changed at this time, but I don’t think I agree with everything you are saying.

“The reason Julia drops dimensions is for consistency. If `A` is a matrix, most people want `A[i,j]` to be a scalar. In matlab, scalars don’t exist – this is produces a 1x1 matrix. This sucks for speed and efficiency and also just common sense. The generalization of this rule is that every scalar index drops the result a dimension. (this also comes from the fact that indexing a vector `v` by `v[i]` should also produce a scalar).”

I certainly agree that indexing a single element should produce a scalar. But I don’t agree that indexing multiple elements should always drop the dimensions to be consistent. I see these as being different operations and being consistent here achieves little functionality.

I have not found any arguments why vectors are necessary as opposed to arrays. I have a guess, and only a guess. My guess is that arrays in Julia may actually be coded as vectors of vectors, maybe the only container is a vector, with a vector being anything?

Even if Julia drops the dimensions, I don’t think this is really the problem. If one has an array A, one can initialize A[k,:] = rowVector. But one can not initialize A[k,:] = columnVector. Where I run into issues time and time again with Julia is when it changes rows to columns. I can’t see any reason why this should be done. If someone took a slice of a row of a vector and wanted a column, this is very easy to do using either transpose or ’ depending on whether it was complex or real, or using reshape(), or permutedims() (if one doesn’t want to be recursive?) etc.

If you want matlab like behavior, however indexing `A[[i],:]` will produce a column matrix. This is fairly intuitive if you consider that `A[[i1,i2],:]` needs to be a 2xm matrix, and type stability requires that `A[[i],:]` has the same type as `A[[i1,i2],:]` .

Again, this might be religion, but A[i],:] is a row; I personally don’t see why changing this into a column is intuitive. I think indexing `A[[i],:]` should produce a row array. I also, think indexing A[i,:] should produce a row whether it is a vector or an array, and I personally think that a slice that produces multiple elements doesn’t need to be consistent with a slice that produces a single element in dropping dimensions. If this consistency achieved additional functionality, or I could see situations where not having the dropping of dimensions would cause issues, then I would change my opinion here.

Regarding the inefficiencies of not dropping dimensions. This is probably true if it results in a scalar, but I don’t know if this would be true for array to vector without being able to look into the code; I can’t think on any inherent reasons why this should be true.

Do you know why vectors are actually needed? I know you can only push! and pull! from vectors, so maybe this is the reason. However, if the indexing left the shape, and since you can assign a row vector to a row slice of a matrix, I think leaving the shape unchanged would be much more intuitive.

Please note that this was discussed extensively, eg

is a useful starting point. While no semantics is ever set in stone, very compelling arguments would be needed to change the current behavior.

I will leave the other questions aside, but I have to say that the absence of proper vectors in Matlab is a huge annoyance. It’s causing me endless trouble almost every time I use Matlab (which is at least 5 days a week.) You really should learn to appreciate it.

The absence of vectors in Matlab means that in all code you have to make a decision and try to keep track of whether vectors are row or column vectors. Matlab pretends to be column-major, and functions like `sum`, `prod`, `max`, `plot`, etc. etc. etc. all operate along columns. But almost all built-in Matlab functions return row-vectors(!!!), `linspace`, `1:n`, `mynewvariable(n) = 5`, etc. etc. etc.

One of the very worst: Matlab’s `for` loops can iterate over row vectors, but not over column vectors  Every time I write code I have to either check if an input is row or column, or enforce one or the other with `x(:)` or `x(:).'`, and then I suddenly have to interface with code that prefers the opposite of what I have chosen (using row-vectors instead of column.)

Working in python or julia where there are proper 1-dimensional vectors is always a huge relief. It’s one of the worst things about Matlab.

(Yesterday, I was working with a class where most of the fields had column vectors, but one of them, for some ungodly reason, was a row vector. I thought, “let me use `squeeze` in my code, so I don’t have to special-case this field, to fix this”. Didn’t work, because `squeeze` removes dimensions of length 1 – except for row vectors which remain unchanged … In other words, squeezing a 1x3x4 array gives a 3x4, array, but a 1x3 array remains 1x3.)

(Postscript: Sorry about the heated post @kw_martin , but this is one of the things about Matlab that actually makes me genuinely angry just thinking about it.)

6 Likes

The reason scalar indexing drops dimensions is a little more subtle than I explained initially. The idea is that `A[i,:][j]=A[i,j]` is an identity that feels like it should hold. Since `A[i,j]` is a scalar, `A[i,:]` needs to be a type which when indexed by 1 index gives a scalar. The fact that Julia has ND arrays makes this harder to get around since matrices aren’t consider special-- they are just `Array{T,2}`.

The reason `A[[i],:]` needs to have the same type as `A[[i,j],:]` is for type stability. Both of these are calls to `getindex(::Vector{Int}, ::COLON_TYPE)`, so if you want type stable code (which is critical for performance) both of these need to have the same type.

1 Like

We could indeed use a bit more in the documentation on this rule — right now it’s very tersely described as:

`X = A[I_1, I_2, ..., I_n]`

[…]

`X` is an array with the same number of dimensions as the sum of the dimensionalities of all the indices.

[…]

If `I_1` is changed to a two-dimensional matrix, then `X` becomes an `n+1` -dimensional array of shape `(size(I_1, 1), size(I_1, 2), length(I_2), ..., length(I_n))` . The matrix adds a dimension.

We sometimes call this behavior dim-sum (or APL, from which it was inspired) indexing. It’s a powerful technique that is fully general — that is, it doesn’t matter which index or how many indices there are. Matlab has some of these behaviors (e.g., indexing a 1-column by a matrix of indices produces a matrix the shape of the index, sometimes*, IIRC), but it only works when you’re using a single (linear) index.

* the sometimes exception is that indexing a column vector by a row vector preserves the column-ness of the original structure. I think. But it’s been a looong time since I’ve had a Matlab license.

1 Like

A post was split to a new topic: Separate array elements without adding spaces