Trouble Understanding Slicing

kw_martin · March 31, 2020, 9:30pm

In the following code, I can’t understand why sum(a[1,:],dims=2) gives a different answer than the first row of sum(a[1:2,:],dims=2).

julia> a = [1 2 3 4; 5 6 7 8; 9 10 11 12];

julia> sum(a, dims=2)
3×1 Array{Int64,2}:
 10
 26
 42

julia> sum(a[1:2,:], dims=2)
2×1 Array{Int64,2}:
 10
 26

julia> sum(a[1,:], dims=2)
4-element Array{Int64,1}:
 1
 2
 3
 4

julia>

mbauman · March 31, 2020, 9:44pm

When tackling things like this, I always recommend simplifying things as much as possible to figure it out. In this case, removing the sum does the trick:
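A minimal sketch of that simplification, reusing the `a` from the question. The names `two_rows`, `one_row`, `kept`, and `row_sum` are illustrative and not from the original post:

```julia
a = [1 2 3 4; 5 6 7 8; 9 10 11 12]

two_rows = a[1:2, :]  # range index: result stays two-dimensional, size (2, 4)
one_row  = a[1, :]    # scalar index: that dimension is dropped, size (4,)
kept     = a[1:1, :]  # length-1 range keeps the dimension, size (1, 4)

# With the dimension kept, the reduction over dims=2 sums across the row.
row_sum = sum(kept, dims=2)  # 1×1 matrix holding 10
```

Calling `size` on each result is usually the quickest way to see which dimensions survived the indexing.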

So summing across the 2nd dimension in the latter case is a no-op, because there is no second dimension! This happens because Julia drops dimensions indexed by a scalar. If you want to preserve that dimension, just use a vector like [1] or 1:1 instead. More details on indexing are here:

https://docs.julialang.org/en/v1/manual/arrays/#man-array-indexing-1

kw_martin · April 1, 2020, 1:14am

mbauman, thank you. I could only find two examples in the indexing section that show this, and no discussion of it; I scanned through it twice but didn’t read it word for word, so I might have missed it. I come from a Matlab background, where the shapes and dimensions of matrices are not changed. Indeed, I don’t think Matlab has anything equivalent to vectors; my understanding is that everything in Matlab is a matrix. I am new to Julia, and my largest frustration with it is the dropping of dimensions and the automatic conversion from rows to columns. The other frustration is how packages, environments, and Revise are set up. I’m hoping I will eventually understand the reasoning behind why vectors are needed at all and why a[1,:] should both drop dimensions and change shape; it just seems counter-intuitive to me.

Oscar_Smith · April 1, 2020, 1:32am

The reason Julia drops dimensions is for consistency. If A is a matrix, most people want A[i,j] to be a scalar. In Matlab, scalars don’t exist; this produces a 1x1 matrix. This sucks for speed and efficiency and also just common sense. The generalization of this rule is that every scalar index drops a dimension from the result. (This also comes from the fact that indexing a vector v by v[i] should also produce a scalar.) If you want Matlab-like behavior, however, indexing A[[i],:] will produce a column matrix. This is fairly intuitive if you consider that A[[i1,i2],:] needs to be a 2xm matrix, and type stability requires that A[[i],:] has the same type as A[[i1,i2],:].
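For reference, a small sketch of the indexing forms mentioned in this post, using a hypothetical 3×4 matrix `A` (the variable names are illustrative). Note that `A[[i],:]` comes out with size `(1, m)`, that is, a single row kept as a two-dimensional array:

```julia
A = [1 2 3 4; 5 6 7 8; 9 10 11 12]

s  = A[2, 3]        # two scalar indices: a plain Int (7), not a 1×1 matrix
v  = A[2, :]        # one scalar index: a one-dimensional Vector of length 4
m1 = A[[2], :]      # vector index of length 1: a 1×4 Matrix
m2 = A[[1, 2], :]   # vector index of length 2: a 2×4 Matrix
```

Each scalar index removes one dimension from the result, while each vector or range index keeps one.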

kw_martin · April 1, 2020, 11:11am

Oscar, thank you for taking the time to reply. I’m certain this can’t be changed at this time, but I don’t think I agree with everything you are saying.

“The reason Julia drops dimensions is for consistency. If A is a matrix, most people want A[i,j] to be a scalar. In Matlab, scalars don’t exist; this produces a 1x1 matrix. This sucks for speed and efficiency and also just common sense. The generalization of this rule is that every scalar index drops a dimension from the result. (This also comes from the fact that indexing a vector v by v[i] should also produce a scalar.)”

I certainly agree that indexing a single element should produce a scalar. But I don’t agree that indexing multiple elements should always drop the dimensions to be consistent. I see these as being different operations and being consistent here achieves little functionality.

I have not found any arguments why vectors are necessary as opposed to arrays. I have a guess, and only a guess. My guess is that arrays in Julia may actually be coded as vectors of vectors, maybe the only container is a vector, with a vector being anything?

Even if Julia drops the dimensions, I don’t think this is really the problem. If one has an array A, one can initialize A[k,:] = rowVector. But one cannot initialize A[k,:] = columnVector. Where I run into issues time and time again with Julia is when it changes rows to columns. I can’t see any reason why this should be done. If someone took a slice of a row of a matrix and wanted a column, this is very easy to do using either transpose or ' (depending on whether the data is complex or real), or using reshape(), or permutedims() (if one doesn’t want to be recursive?), etc.
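For readers following along, here is a hedged sketch of the conversions named above, applied to a row slice that has already become a `Vector`. The variable names are illustrative only:

```julia
A = [1 2 3 4; 5 6 7 8]
v = A[1, :]                    # Vector of length 4 (the row dimension was dropped)

row_copy     = permutedims(v)  # new 1×4 Matrix (copies the data, not recursive)
row_reshaped = reshape(v, 1, :)  # 1×4 Matrix sharing memory with v
row_lazy     = transpose(v)    # lazy 1×4 wrapper around v, no conjugation
row_adj      = v'              # lazy adjoint: same as transpose for real data

# For complex data the two lazy forms differ: ' conjugates, transpose does not.
z = [1 + 2im, 3 - 1im]
```

All four results have size `(1, 4)` and compare equal elementwise; they differ in whether they copy and in how they treat complex entries.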

“If you want Matlab-like behavior, however, indexing A[[i],:] will produce a column matrix. This is fairly intuitive if you consider that A[[i1,i2],:] needs to be a 2xm matrix, and type stability requires that A[[i],:] has the same type as A[[i1,i2],:].”

Again, this might be religion, but A[[i],:] is a row; I personally don’t see why changing this into a column is intuitive. I think indexing A[[i],:] should produce a row array. I also think indexing A[i,:] should produce a row, whether it is a vector or an array, and I personally think that a slice that produces multiple elements doesn’t need to be consistent with a slice that produces a single element in dropping dimensions. If this consistency achieved additional functionality, or I could see situations where not dropping dimensions would cause issues, then I would change my opinion here.

Regarding the inefficiencies of not dropping dimensions: this is probably true if the result is a scalar, but I don’t know if it would be true for array to vector without being able to look into the code; I can’t think of any inherent reason why this should be true.

Do you know why vectors are actually needed? I know you can only push! and pop! with vectors, so maybe this is the reason. However, if the indexing left the shape alone, and since you can assign a row vector to a row slice of a matrix, I think leaving the shape unchanged would be much more intuitive.

Tamas_Papp · April 1, 2020, 11:49am

Please note that this was discussed extensively, e.g.,

github.com/JuliaLang/julia

Arraypocalypse Now and Then

opened 11:41PM - 15 Sep 15 UTC

closed 05:48PM - 20 Jul 17 UTC

mbauman

linear algebra arrays

This issue supersedes the 0.4 work towards array nirvana (#7941), and ~~will~~ tracks the issues we aim to complete during 0.5 and beyond, now updated through work on 0.7. This is an umbrella issue and will track specific tasks in other issues. Please feel free to add things that I've missed.

**Required underlying technologies**

- [x] Julia native bounds checking and removal (#7799). Several tries have been made at this, but I believe the current plan of action is to make `@inbounds` elide code blocks hidden within an `@boundscheck` macro, propagating down only one level of inlining (https://github.com/JuliaLang/julia/issues/7799#issuecomment-117362695). This is a strong requirement for the subsequent steps. (implemented in #14474)
- [x] ReshapedArrays (#10507). Requires better performance: https://groups.google.com/d/msg/julia-dev/7M5qzmXIChM/kOTlGSIvAwAJ

## Major 0.5 breaking behavior changes

- [x] Drop dimensions indexed by a scalar (https://github.com/JuliaLang/julia/issues/4774#issuecomment-81228816; more generally, APL-style slicing where the rank of a slice is the sum of the ranks of the indexes, see below). PR at #13612.
- [x] Flip the switch on the concatenation deprecation (#8599)
- [x] Remove default no-op behavior for (c)transpose (#13171)
- [x] Change change `sub` behaviour to `slice` (#16846)

## Major 0.6 breaking behavior changes

- [x] Vector transpose returns a covector (https://github.com/JuliaLang/julia/issues/4774#issuecomment-81228816). Implementation in #19670.
- [x] Vector conjugation returns lazy wrapper (#20047)

## Possible future breaking changes

- [x] Matrix transposition and conjugation return lazy wrappers (#25364)
- [ ] ~~Return slices as views. A first attempt at this was at https://github.com/JuliaLang/julia/pull/9150. Still unclear whether the possible performance changes are consistent and large enough to be worth the breakage.~~ See https://github.com/JuliaLang/julia/issues/3701.
- [ ] Should reductions drop dimensions? #16606

**New functionality**

- [x] Allow expression of varargs of defined length (#11242). This allows us to take full advantage of #10525.
- [x] Ditch special lowering of Ac_mul_Bt, use dispatch on the lazy transpose wrappers instead. (#5332, #25217)
- [x] Dimensions indexed by multidimensional arrays add dimensions (full APL-style: the dimensionality of the result is the sum of the dimensionalities of the indices). (#15431)
- [x] ~~Allow any index type in non-scalar indexing (#12567).~~ ~~Tighten scalar indexing to indices `<: Integer` and widen non scalar indexing to `<: Union{Number, AbstractArray, Colon}` (https://github.com/JuliaLang/julia/pull/12567#issuecomment-170982983).~~ More systematic conversion of indices such that any index type can be converted into an `Int` or `AbstractArray`: #19730
- [ ] Easier creation of immutable arrays with tuples and #12113.

**Other speculative possibilities**

- [ ] A mutable fixed-size buffer type, which would allow for a Julia-native `Array` definition (#12447); this type could also be used for I/O buffers and string storage.
- [x] Base ~~`IndexSet`~~ `IntSet` on `BitArray` ~~or perhaps any `AbstractArray{Bool}`~~. (#20456)
- [x] Rework nonscalar indexing to prevent calling `find` on logical arrays and simply wrap it with an ~~`IndexSet`~~ `LogicalIndex` instead? (#19730)
- [ ] Negated indexing with complement `IndexSet` (https://github.com/JuliaLang/julia/issues/1032) or special `Not` type? (Perhaps in a package: https://github.com/mbauman/InvertedIndices.jl)
- [x] Deprecate the linearization of trailing dimensions when more than one index is provided (partial linear indexing). (#20079)
- [x] ~~Only allow indexing into N-dimensional arrays with 1 or N indices, deprecating "partial" indexing and trailing singleton dimensions (https://github.com/JuliaLang/julia/issues/5396 and #14770). Initial attempt at #20040.~~ Only allow linear indexing when there is exactly one index provided. Only allow omitting indices (using less than N indices in an N-dimensional array) when all omitted dimensions are of length 1. Only allow trailing indices (more than N indices) when all indices are 1 (singletons). (#21750)
- [ ] Find alternate syntax for typed arrays – indexing into a type (`T[...]`) is kind of a bad pun. This syntax is especially bad since some variations are parsed as indexing into a type while others are parsed as special forms (typed hvcat, typed comprehensions)
- [x] Change hashing to run-length encoding of the diff of arrays, which would allow integer ranges and arrays to hash as equal again. (https://github.com/JuliaLang/julia/issues/12226#issuecomment-122952826, #16401)
- [x] Move sparse arrays out of base into a standard package. (#25249)
- [x] Allow nontraditional indices (#16260)
- [x] ~~`@sub`~~ `@view` macro (#16564)

is a useful starting point. While no semantics is ever set in stone, very compelling arguments would be needed to change the current behavior.

DNF · April 1, 2020, 11:58am

I will leave the other questions aside, but I have to say that the absence of proper vectors in Matlab is a huge annoyance. It’s causing me endless trouble almost every time I use Matlab (which is at least 5 days a week.) You really should learn to appreciate it.

The absence of vectors in Matlab means that in all code you have to make a decision and try to keep track of whether vectors are row or column vectors. Matlab pretends to be column-major, and functions like sum, prod, max, plot, etc. etc. etc. all operate along columns. But almost all built-in Matlab functions return row-vectors(!!!), linspace, 1:n, mynewvariable(n) = 5, etc. etc. etc.

One of the very worst: Matlab’s for loops can iterate over row vectors, but not over column vectors

Every time I write code I have to either check if an input is row or column, or enforce one or the other with x(:) or x(:).', and then I suddenly have to interface with code that prefers the opposite of what I have chosen (using row-vectors instead of column.)

Working in python or julia where there are proper 1-dimensional vectors is always a huge relief. It’s one of the worst things about Matlab.

(Yesterday, I was working with a class where most of the fields had column vectors, but one of them, for some ungodly reason, was a row vector. I thought, “let me use squeeze in my code, so I don’t have to special-case this field, to fix this”. Didn’t work, because squeeze removes dimensions of length 1, except for row vectors, which remain unchanged … In other words, squeezing a 1x3x4 array gives a 3x4 array, but a 1x3 array remains 1x3.)

(Postscript: Sorry about the heated post @kw_martin , but this is one of the things about Matlab that actually makes me genuinely angry just thinking about it.)

Oscar_Smith · April 1, 2020, 2:46pm

The reason scalar indexing drops dimensions is a little more subtle than I explained initially. The idea is that A[i,:][j] == A[i,j] is an identity that feels like it should hold. Since A[i,j] is a scalar, A[i,:] needs to be a type which, when indexed by one index, gives a scalar. The fact that Julia has N-dimensional arrays makes this harder to get around, since matrices aren’t considered special; they are just Array{T,2}.
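The identity can be checked directly. This is only a sketch on a small example matrix; `identity_holds` is an illustrative name:

```julia
A = [1 2 3 4; 5 6 7 8; 9 10 11 12]

# A[i, :] is a Vector, so a single further index j picks out one element.
identity_holds = all(A[i, :][j] == A[i, j] for i in axes(A, 1), j in axes(A, 2))

row = A[2, :]   # Vector of length 4
elem = row[3]   # same element as A[2, 3]
```

Because the slice is one-dimensional, the second index needs no placeholder for a leftover singleton dimension.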

The reason A[[i],:] needs to have the same type as A[[i,j],:] is for type stability. Both of these are calls to getindex(::Matrix, ::Vector{Int}, ::Colon), so if you want type-stable code (which is critical for performance), both of these need to return the same type.
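A rough illustration of the type-stability point, assuming a hypothetical helper `rows` that is not part of Julia or of this thread. The length of the index vector is only known at run time, yet the result type is the same in every case:

```julia
# Hypothetical helper: select rows by a vector of indices.
rows(A, idx::Vector{Int}) = A[idx, :]

A = [1 2 3 4; 5 6 7 8; 9 10 11 12]

t_one  = typeof(rows(A, [2]))      # one index
t_two  = typeof(rows(A, [1, 2]))   # two indices
t_none = typeof(rows(A, Int[]))    # no indices: a 0×4 matrix

same_type = t_one == t_two == t_none == typeof(A)
```

If a length-1 index vector returned a `Vector` instead, the return type of `rows` would depend on a run-time value, which is what type-unstable means here.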

mbauman · April 1, 2020, 3:10pm

We could indeed use a bit more in the documentation on this rule — right now it’s very tersely described as:

X = A[I_1, I_2, ..., I_n]

[…]

X is an array with the same number of dimensions as the sum of the dimensionalities of all the indices.

[…]

If I_1 is changed to a two-dimensional matrix, then X becomes an n+1-dimensional array of shape (size(I_1, 1), size(I_1, 2), length(I_2), ..., length(I_n)). The matrix adds a dimension.

We sometimes call this behavior dim-sum (or APL, from which it was inspired) indexing. It’s a powerful technique that is fully general — that is, it doesn’t matter which index or how many indices there are. Matlab has some of these behaviors (e.g., indexing a 1-column by a matrix of indices produces a matrix the shape of the index, sometimes*, IIRC), but it only works when you’re using a single (linear) index.

* the sometimes exception is that indexing a column vector by a row vector preserves the column-ness of the original structure. I think. But it’s been a looong time since I’ve had a Matlab license.
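A small sketch of the dim-sum rule described above, on an example matrix chosen for illustration (`idx`, `X`, `Y`, and `Z` are not from the documentation):

```julia
A = collect(reshape(1:12, 3, 4))  # 3×4 matrix, filled column by column
idx = [1 2; 3 1]                  # 2×2 matrix of row indices

X = A[idx, 1:3]  # dimensionalities 2 + 1 = 3, so size (2, 2, 3)
Y = A[2, 1:3]    # dimensionalities 0 + 1 = 1, so size (3,)
Z = A[idx, 2]    # dimensionalities 2 + 0 = 2, so size (2, 2)

# Each entry of X is looked up through the index matrix.
lookup_ok = all(X[p, q, r] == A[idx[p, q], r]
                for p in 1:2, q in 1:2, r in 1:3)
```

The result always has as many dimensions as the index dimensionalities add up to, regardless of which position the matrix index occupies.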

