Comprehensive intermediate-level Array tutorial

Can folks here recommend a tutorial on Arrays for somebody from a non-beginner programming background?

I looked at a couple of tutorials on YouTube but they were teaching very basic stuff like simple creating and access. I also looked at data science tutorials on Julia Academy but they are (rightly so) geared towards getting their audience quickly started on data science without getting into how Arrays work. Finally, I have been reading the manual but am having hard time with it. Currently I am stuck in the Indexing section trying to figure out why I have to specify all N dimensions in most cases, but for some reason when I am using boolean indexes, sometimes I can get away with less than N index entries! This is just one of the several things I am puzzling through as I read the manual.

Any pointers to a comprehensive intermediate/advanced level tutorial on Arrays is greatly appreciated!

Example?

From the section on logical indexing -

julia> x = reshape(1:12, 2, 3, 2)
2Ă—3Ă—2 reshape(::UnitRange{Int64}, 2, 3, 2) with eltype Int64:
[:, :, 1] =
 1  3  5
 2  4  6

[:, :, 2] =
 7   9  11
 8  10  12

julia> x[:, [true false; false true; true false]]
2Ă—3 Matrix{Int64}:
 1  5   9
 2  6  10

In the above example x is 3-dimensional, but the index has only 2 elements, : and a mask array.

: specifies 1 dimension, the boolean array specifies 2, that’s 3 dimensions total.

the boolean array specifies 2

This is the part I am struggling with. For example if I try x[:, [1 2; 2 1; 1 2]] I get a bounds error. Is there a more detailed breakdown that I can read to better understand how that 2 \times 2 boolean array is specifying 2 dimensions?

1 Like

I don’t know of one. The manual does already say that boolean arrays specify multiple dimensions like CartesianIndex vectors, which isn’t the case for non-boolean integer arrays said to specify only 1. It’s not obvious what you’re trying with that particular indexing, so I don’t know how to edit it.

Fair point that the manual does address this and other issues that I have been blocked on. Usually upon the second or third reading of the relevant sections I am able to unblock myself. This particular case was taking me longer. And tbh - your directing my attention to the relevant section and re-reading the section on CartesianIndex is clearing some of my confusion, so thank you for that :slightly_smiling_face:

I am just finding the manual a bit hard/dense to read, which brings me back to my original ask - an easy to follow tutorial-style exposition that goes beyond just the surface level.

Unconventional, but maybe doggo_dot_jl’s videos?

FWIW, I think that’s what the manual is trying to provide, but obviously missing the mark. I know this is unhelpful, but your viewpoint is valuable, so it might be worth you making a PR to try to fill in the gaps now that you understand i.

1 Like

I’ve failed to find one so far. Tutorials usually cover only a bit of multidimensional array indexing because most of it is somewhat complicated and not used often, even in languages that promote a “vectorized” API (not in the SIMD sense). Obviously, you’re looking for something more than iterating all elements or strided indexing with ranges (start:step:stop). I could explain what I know, but again, I use most of it very rarely in practice, and it’s more or less already in the documentation in a different order. I won’t bother distinguishing getting and setting or introduce other indexing contexts like lazy views here, but the specified elements should be the same.

An N-dimensional array, more precisely an object of AbstractArray{T,N} where {T,N}, can be indexed by a sequence of objects in bracket syntax A[i1, i2, ..., iN]. Each dimension has an axis, a valid range of indices. While an axis is specified to be a AbstractUnitRange object containing non-boolean Integer objects (unit range means a step of 1), array indexing must also support Int objects (Int32 on 32-bit platforms, Int64 on 64-bit platforms). First, we address indexing an array for a single element, though I expect you would mostly use the top 2.

  • When you use exactly N (non-boolean) integer indices, one for each dimension, you perform straightforward Cartesian indexing. A CartesianIndex{M} object can represent M adjacent Int indices.
  • When you use 1 integer index or an equivalent CartesianIndex{1} object, you perform linear indexing on a “flattened” 1-dimensional version of the multidimensional array. In many cases including Array, a multidimensional array is actually based on such a vector and would need to process all other indices to a linear index under the hood. eachindex, firstindex, and lastindex may provide linear indices if it’s more efficient than the Cartesian indices. Linear indexing takes precedence over the other 4 points in overlapping cases e.g. N == 1 in the first point.
  • You can use 0 indices, like A[], if the array has length 1 to access its only element, even if it’s not a 0-dimensional array. This mirrors the indexing of 1-element containers in general.
  • You can use <N indices if the unspecified trailing dimensions all have length 1; treat this as Julia filling in the rest of the N indices for Cartesian indexing. If the number of indices is 1, linear indexing takes precedence over this.
  • You can use >N indices if the extra indices, usually 1, specify imaginary trailing dimensions of length 1. This is useful for column vector indexing.*

Indexing an array can result in another array (possibly of a different concrete type) with the same element type T. So far, we discussed the special case where the linear index or each dimension is specified with an integer, a 0-dimensional object. That actually replaced the linear index or each dimension with 0 dimensions, in other words removing all the dimensions until we have a scalar element. Similarly, an AbstractArray of integer indices would replace the linear index or a dimension with its dimensions. For a trivial example of preserving the N dimensions when specifying 1 element:

julia> A = collect(reshape(1:6, 2, 3))
2Ă—3 Matrix{Int64}:
 1  3  5
 2  4  6

julia> A[2, 1] # each integer index "erased" a dimension
2

julia> A[[2], [1]] # each 1-vector replaced a dimension
1Ă—1 Matrix{Int64}:
 2

In more interesting situations, these arrays would select multiple integer indices per dimension, possibly repeated:

julia> A[[2;2;;], 1:2:3] # 2x1-matrix replaces 2, 2-vector replaces 3
2Ă—1Ă—2 Array{Int64, 3}:
[:, :, 1] =
 2
 2

[:, :, 2] =
 6
 6

Using arrays of CartesianIndex{M} objects would replace M dimensions at once:

julia> A[ [CartesianIndex(2,1) CartesianIndex(2,3)] ] # 1x2-matrix replaces 2x3
1Ă—2 Matrix{Int64}:
 2  6

When you want to preserve a linear index or a dimension as-is, : can be used instead of the exact linear indices or axis. A[:] is actually the simplest way to make a guaranteed flattened copy of A; the more common vec(A) shares mutable memory instead.

Finally, I’ll address logical indexing with boolean arrays again. It actually has the same purpose as a vector of CartesianIndex objects, but instead of the boolean array’s elements specifying the indices, the boolean array’s own indices at the true elements specify the indices. This is useful for filtering array elements or dimensions based on an unforeseen number of true conditions.

A lone boolean vector works like a lone vector of CartesianIndex{1} objects, which performs linear indexing and replaces the linear index with 1 dimension of the number of trues. The boolean vector must have the same length as the linear index:

julia> A[rand(Bool, 6)] # lone boolean 6-vector replaces linear index
3-element Vector{Int64}:
 1
 3
 4

An M-dimensional boolean array works like a vector of CartesianIndex{M} objects, which performs Cartesian indexing and replaces M dimensions with 1 dimension of the number of trues. The boolean array’s dimensions must match the dimensions it’s specifying:

julia> A[ iseven.(A) ] # boolean 2x3-matrix for 3 even elements replaces 2x3
3-element Vector{Int64}:
 2
 4
 6

julia> iseven.(A) # not Array{Bool}, but a bitpacked equivalent
2Ă—3 BitMatrix:
 0  0  0
 1  1  1

julia> iseven.(A) isa AbstractArray{Bool}
true

julia> A[:, sum.(eachcol(A)) .> 6] # boolean 3-vector for 2 columns where sum is >6
2Ă—2 Matrix{Int64}:
 3  5
 4  6

julia> A[any.(iseven, eachrow(A)), # boolean 2-vector for 1 row with even elements
         sum.(eachcol(A)) .> 6]
1Ă—2 Matrix{Int64}:
 4  6

*In matrices, the leftmost dimension specifies which row (and has the length of each column). Vectors only have 1 dimension, so they’re also treated as column vectors in the linear algebra sense (Rx1 matrices). The next dimension specifies which column (and has the length of each row). The following higher dimensions can be treated as further stacking these matrices in multidimensional arrays, though these don’t generally mean stacked matrices. Julia arrays are usually column-major, meaning elements are stored contiguously along the leftmost dimension; this matters because contiguous iteration is much faster than random access on modern hardware. Other languages tend to prefer row-major arrays, starting from the rightmost dimension with further dimensions to the left, so that might need some dimension permutation tricks or some other headaches to work efficiently in multidimensional arrays between languages. Fortunately, most people only need to deal with matrices and tables in files.

4 Likes

(Not unhelpful at all) Now, that I have figured a lot of this out, especially with the detailed explanation that @Benny has provided below, I’ll gladly submit a PR for this in a bit.

:folded_hands: Wow! Thank you for taking the time to write such a detailed explanation. I truly appreciate it. With your permission, I’d like to take parts of this explanation and fold it in the PR that I am writing for updating the manual. Would that be ok?

You have my permission, though I’d note that forum advice generally doesn’t need it. I made verbal shortcuts and references to other languages that wouldn’t be warranted in the manual, so you’ll have to make significant edits to the parts you’re adapting, as you seem to be suggesting anyway. Should be doable, I more or less reorganized the manual’s sections to begin with, so the manual is a pretty good starting point already.