Vector{Vector} indices

This is a trivial issue, but it puzzles me, and makes me think I don’t understand Julia arrays.
With

julia> test = [[1,2],[3,4]]
2-element Vector{Vector{Int64}}:
 [1, 2]
 [3, 4]

then

julia> test[1]
2-element Vector{Int64}:
 1
 2

julia> test[1,:]
1-element Vector{Vector{Int64}}:
 [1, 2]

julia> test[1][1]
1

which all makes sense. But then I would expect test[1,1] to either return 1 or throw an error. However,

julia> test[1,1]
2-element Vector{Int64}:
 1
 2

i.e. the same as test[1]. Why is this?

Indexing ignores trailing 1s.

julia> [10,20][2,1,1,1,1]
20

I think I saw a conversation about removing this in 2.0 but I can’t find it atm.

This behaviour is described here: Multi-dimensional Arrays · The Julia Language

In your example you add extra indices and that works if and only if the extra indices are all 1.
So, essentially test[1,1] is the same as test[1].
The behaviour is there to allow a column to be used like a matrix, which can be useful to write generic algorithms.
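Here is a minimal sketch of the trailing-1 rule and the "vector as a one-column matrix" use it enables. `column_sums` is a hypothetical helper written only for illustration:

```julia
test = [[1, 2], [3, 4]]

# Extra trailing indices equal to 1 are accepted and ignored,
# so test[1, 1] returns the very same element as test[1]
@assert test[1, 1] === test[1]

v = [10, 20, 30]
@assert v[2, 1, 1] == 20

# A trailing index other than 1 is out of bounds
out_of_bounds = try
    v[2, 2]
    false
catch err
    err isa BoundsError
end
@assert out_of_bounds

# size(A, d) returns 1 for d > ndims(A), so a vector looks like an n×1 matrix
@assert size(v, 2) == 1

# Hypothetical generic helper: sums each column of a matrix,
# and treats a vector as a single column thanks to the trailing-1 rule
column_sums(A) = [sum(A[i, j] for i in 1:size(A, 1)) for j in 1:size(A, 2)]

@assert column_sums([1 2; 3 4]) == [4, 6]
@assert column_sums(v) == [60]
```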

OK, thanks. So is this bound up with the decision to follow Matlab and Perl in identifying rank 0 and 1 tensors, e.g.

julia> [[1 2] 3]
1×3 Matrix{Int64}:
 1  2  3

Not sure that was a wise design choice.

I think it would be more useful to interpret test[x,y] as test[x][y], but that’s probably just me.
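You can opt into the test[x, y] → test[x][y] reading without changing Base by using a small wrapper type. Here is a minimal sketch. The type name `Nested` is hypothetical and not part of Base or any package:

```julia
# Hypothetical wrapper type: forwards w[i, j] to data[i][j]
struct Nested{T}
    data::Vector{Vector{T}}
end

# Two indices: index the outer vector, then the inner one
Base.getindex(w::Nested, i::Integer, j::Integer) = w.data[i][j]
# One index: return the inner vector, as for a plain Vector{Vector}
Base.getindex(w::Nested, i::Integer) = w.data[i]

w = Nested([[1, 2], [3, 4]])
@assert w[2, 1] == 3
@assert w[1] == [1, 2]
```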

On a related note, one thing I’ve never gotten used to is the opaqueness of array type conversions. For example, there should be a simple, standard way to convert Vector{Vector} to Matrix, but the methods I’ve seen suggested seem ad hoc.

It’s not just you, I’ve seen this comment a few times around here.

julia> stack([[10,20], [30,40]])
2×2 Matrix{Int64}:
 10  30
 20  40

and TensorCast.jl has some good stuff too.
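For reference, plain Base functions also cover the common conversions on any Julia version. Note that `stack` itself needs Julia 1.9 or later. A sketch, with `vv` as an example input:

```julia
vv = [[10, 20], [30, 40]]

# Inner vectors become columns (the same layout stack(vv) produces)
cols = reduce(hcat, vv)
@assert cols == [10 30; 20 40]

# Inner vectors become rows: turn each into a 1×n row matrix, then concatenate vertically
rows = reduce(vcat, permutedims.(vv))
@assert rows == [10 20; 30 40]

# Flatten everything into a single vector
flat = reduce(vcat, vv)
@assert flat == [10, 20, 30, 40]
```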

Indexing, slicing and the like have been discussed in some depth already. Here are just a few remarks:

  • Interpreting test[i, j] as test[i][j] suggests that matrices (2-dimensional arrays) act like vectors of vectors. I guess this is true in numpy, where test[i] slices the i-th row of a matrix. In Julia the two types are considered different, though. test[i] also works on a matrix, but it uses linear indexing for higher-dimensional arrays:
julia> test = reshape(1:6, 2, 3)
2×3 reshape(::UnitRange{Int64}, 2, 3) with eltype Int64:
 1  3  5
 2  4  6

julia> test[2]
2
  • Similarly, reshaping a vector of vectors into a matrix has to be done explicitly, and it only works if all rows have the same length. Python confusingly overloads syntax here: np.array([[1, 2, 3], [4, 5, 6]]) returns a matrix, whereas np.array([[1, 2], [4, 5, 6]]) gives an object array of lists (recent NumPy versions raise an error unless dtype=object is passed). IMHO Julia has a nicer syntax here, distinguishing explicitly between a matrix [1 2 3; 4 5 6] and a vector of vectors [[1, 2], [4, 5, 6]].
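Here is a short sketch of both points, using only Base:

```julia
A = [1 2 3; 4 5 6]          # 2×3 Matrix{Int}
vv = [[1, 2], [4, 5, 6]]    # Vector{Vector{Int}}; ragged inner vectors are fine

@assert A isa Matrix{Int}
@assert vv isa Vector{Vector{Int}}

# A single index on a matrix is linear indexing in column-major order...
@assert A[2] == 4
@assert vec(A) == [1, 4, 2, 5, 3, 6]
# ...while a row has to be requested explicitly
@assert A[2, :] == [4, 5, 6]

# For a vector of vectors, a single index returns an inner vector
@assert vv[2] == [4, 5, 6]
```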

jar1: Where is stack?

julia> stack([[10,20], [30,40]])
ERROR: UndefVarError: stack not defined

TensorCast looks great, but the fact that a whole package of macros is required seems indicative of a deeper problem.
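Base.stack was added in Julia 1.9, which would explain the UndefVarError on an older version. Here is a minimal sketch that falls back to reduce(hcat, ...) when `stack` is missing:

```julia
vv = [[10, 20], [30, 40]]

# Base.stack exists only on Julia 1.9 and later; otherwise fall back to reduce(hcat, ...).
# The unevaluated branch of the ternary is never run, so an undefined `stack` is not an error.
M = isdefined(Base, :stack) ? stack(vv) : reduce(hcat, vv)

@assert M == [10 30; 20 40]
@assert M isa Matrix{Int}
```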

Like numpy? math.h?

(Personally, not being a heavy user of this kind of operation, I find most syntax options weird, and mostly a possible source of bugs. For instance, for me [[1 2] 2] should return a one-row Matrix containing a vector and an integer. But I understand that the usefulness of such a notation for stacking things matters more here.)

bertschi: Thanks for your response, but I’m not persuaded. A Matrix and a Vector{Vector} may be distinct types, but that doesn’t explain why the test[i,j] notation can’t be supported for both.

So reshape produces its own type? That’s disgusting.

Why can’t I easily recast a Vector{Vector} as a matrix?

julia> test = [[1,2],[3,4]]
2-element Vector{Vector{Int64}}:
 [1, 2]
 [3, 4]

julia> Matrix(test)
ERROR: MethodError: no method matching (Matrix{T} where T)(::Vector{Vector{Int64}})

I get that this should only work if all the inner vectors have the same length, but then why does this work

julia> reduce(hcat,test)
2×2 Matrix{Int64}:
 1  3
 2  4

where the same objection applies?

I think the numpy overloading problem is irrelevant here, because in Julia we could use
Matrix([[1, 2, 3], [4, 5, 6]])
to clarify.

I’m not familiar with numpy - I know Matlab and Mathematica pretty well. Could you explain what you mean?

Just that having large parts of the functionality in libraries is not necessarily a language problem.

Not just a library, a macro library. Functions are too weak to deal with the problem?

I think that depends on how much the library wants to introduce a new syntax. For that specific operation a function is perfectly fine.

As a side note, the thing that makes Julia special relative to some other high-level languages is that you can sometimes solve a problem by just writing your own function, which can be very efficient, and that may be quicker than even finding out whether there is something ready to use. In this case, for example, you can use:

julia> function stack(M)
           n, m = length(M[begin]), length(M)
           all(length(v) == n for v in M) || error("all vecs must be of same length")
           return [ M[j][i] for i in 1:n, j in 1:m ]
       end
stack (generic function with 1 method)

julia> stack([[1,2],[3,4]])
2×2 Matrix{Int64}:
 1  3
 2  4

You can do, without new implementations:

julia> reduce(hcat, [[1,2],[3,4]])
2×2 Matrix{Int64}:
 1  3
 2  4

(And this is good: the function above and this option are equally performant, just by adding an @inbounds to the M[j][i] in the stack function):

julia> using BenchmarkTools

julia> M = [ rand(10) for _ in 1:10 ];

julia> @btime reduce($hcat, $M);
  104.482 ns (1 allocation: 896 bytes)

julia> function stack(M)
           n, m = length(M[begin]), length(M)
           all(length(v) == n for v in M) || error("all vecs must be of same length")
           return [ @inbounds(M[j][i]) for i in 1:n, j in 1:m ]
       end
stack (generic function with 1 method)

julia> @btime stack($M);
  103.343 ns (1 allocation: 896 bytes)

edit: As for why you can’t do Matrix([[1,2],[3,4]]), my guess is that it is because this is not a trivial operation: it is not simply a reinterpretation of the data. The input is a vector of vectors, that is, a vector of pointers to the inner vectors. The output has a packed memory layout, so it is a completely different object that has to be created elsewhere in memory. Performance-wise, one should try to avoid this type of conversion. On the other hand, it is very common in workflows in other languages, and perhaps useful in data wrangling in general (not to mention the confusion caused by the fact that vector-of-vectors notation may just be a matrix in numpy). Having to use reduce(hcat, v) is not bad from a didactic point of view, since it clearly indicates what one is doing.
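Here is a small check of the memory-layout point, using only Base. The vector of vectors holds references to its inner vectors, while reduce(hcat, ...) builds a separate contiguous matrix:

```julia
vv = [[1, 2], [3, 4]]

# Build a matrix first; this copies the data into one contiguous column-major block
M = reduce(hcat, vv)

# vv stores references: mutating an inner vector through another binding changes vv
inner = vv[1]
inner[2] = 20
@assert vv[1] == [1, 20]

# M is an independent copy, so it still holds the old values
@assert M == [1 3; 2 4]
```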

I think it is highly ambiguous what this should mean. My first instinct on seeing Matrix(vec) would be that it should turn a length-N vector into an Nx1 matrix, not that it should dig into the underlying elements and extract them into the matrix.

Perhaps you hold an implicit assumption that the interpretation of your example is obvious because matrices are obviously 2D arrays of scalars, but in Julia you can very well have arrays of dictionaries of tuples of user types, etc. So there is nothing obvious about how to extract the inner data when calling Matrix on a vector.

This is unlike reduce(hcat, ...) which is quite explicit.
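The two readings can be made concrete with explicit Base calls. Here is a sketch, with `vv` as an example input:

```julia
vv = [[1, 2], [3, 4]]

# Reading 1: treat vv as a column and keep its elements as they are,
# giving a 2×1 matrix whose entries are vectors
col = reshape(vv, :, 1)
@assert size(col) == (2, 1)
@assert eltype(col) == Vector{Int}

# Reading 2: unpack the inner vectors into a matrix of scalars
M = reduce(hcat, vv)
@assert size(M) == (2, 2)
@assert eltype(M) == Int
```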

Julia does not follow Matlab in that design choice:

julia> 42 == [42]
false

julia> typeof(42)
Int64

julia> typeof([42])
Vector{Int64} (alias for Array{Int64, 1})

The behavior that you observed is the behavior of hcat. A space in an array literal represents a horizontal concatenation, so the following are equivalent:

julia> [[1 2] 3]
1×3 Matrix{Int64}:
 1  2  3

julia> hcat([1 2], 3)
1×3 Matrix{Int64}:
 1  2  3
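The same lowering applies to the other concatenation forms. Here is a quick check. The hvcat call is what [1 2; 3 4] lowers to, with (2, 2) giving the number of entries per row:

```julia
# Space-separated entries lower to hcat
@assert [[1 2] 3] == hcat([1 2], 3)

# Semicolon-separated entries lower to vcat
@assert [[1, 2]; 3] == vcat([1, 2], 3)

# Mixing spaces and semicolons lowers to hvcat
@assert [1 2; 3 4] == hvcat((2, 2), 1, 2, 3, 4)

# Commas only collect the given elements into a vector, without concatenating
@assert length([[1, 2], 3]) == 2
```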

That’s a good point. What about this?
Matrix{Int}(test)
Does that disambiguate well enough?

Also, Julia currently does neither:

julia> Matrix([1,2,3])
ERROR: MethodError: no method matching (Matrix{T} where T)(::Vector{Int64})

Why can’t a multiple dispatch language do a sensible thing on Vector{Int}?
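In the meantime, the explicit Base spellings for turning a Vector{Int} into a matrix are short. Here is a sketch:

```julia
v = [1, 2, 3]

# n×1 column matrix; reshape of an Array shares the same memory
c = reshape(v, :, 1)
@assert c isa Matrix{Int}
@assert size(c) == (3, 1)

# 1×n row matrix
r = permutedims(v)
@assert r == [1 2 3]

# hcat of a single vector also gives an n×1 matrix
@assert hcat(v) == c
```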

So

[[1 2] 3]

is just syntactic sugar for

hcat([1 2], 3)

?
What other syntactic sugar might I be invoking when using []?

The rules have gotten a bit more complicated than the last time I looked, but you can find them here:

https://docs.julialang.org/en/v1/manual/arrays/#man-array-concatenation

The syntax sugar is pretty handy. You can write a matrix like this:

julia> [1 2
        3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

…because a newline character represents vertical concatenation. (So does a single semicolon.)
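Here is a quick check that the newline form, the semicolon form, and an explicit vcat of row matrices all produce the same 2×2 matrix:

```julia
A = [1 2
     3 4]
B = [1 2; 3 4]
C = vcat([1 2], [3 4])   # each row is a 1×2 matrix, stacked vertically

@assert A == B == C
@assert size(A) == (2, 2)
```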

Without looking too hard at it, I think it’s not clear enough how it should determine the output shape in the general case.
Maybe it should just unravel everything into an Nx1 matrix?

Yes, I think this is just too ambiguous. I just mentioned it as an alternative that would make as much sense, while still not being obviously ‘the right way’.

I was not trying to persuade anyone, just to explain that Julia has chosen differently here. All of them (numpy, Matlab and Julia) have their quirks and have often made different decisions regarding the syntax and semantics of multi-dimensional arrays. Overall, I feel that Julia is rather consistent, yet often different from numpy or Matlab.

Indeed, the objection applies and it raises an error in that case:

julia> reduce(hcat, [[1, 2, 3], [4, 5]])
ERROR: ArgumentError: number of rows of each array must match (got [3, 2])

That reshape has its own type is merely an implementation detail; its type acts like an AbstractArray (here an AbstractMatrix) in all respects. That the types Vector{Vector} and Matrix are different is not an implementation detail, but a conscious decision in the design of the language, like it or not.
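To illustrate, the wrapper returned by reshape on a range is lazy, but it is an AbstractMatrix and can be materialized with collect. Here is a sketch:

```julia
r = reshape(1:6, 2, 3)

# The lazy wrapper is not a Matrix, but it is an AbstractMatrix and indexes like one
@assert r isa AbstractMatrix{Int}
@assert !(r isa Matrix)
@assert r[2, 3] == 6

# collect materializes it into a plain Matrix with the same contents
m = collect(r)
@assert m isa Matrix{Int}
@assert m == r
```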

This could be defined, but as others have noted it might be ambiguous. Also keep in mind that Julia is column-major. This has several implications:

  1. Slicing rows is inefficient as they are not represented in memory consecutively, which might offer another reason why test[i] does not extract the ith row, but indexes linearly.

  2. reduce(hcat, [[1, 2, 3], [4, 5, 6]]) gives a 3 x 2 matrix, i.e., it collects the inner vectors into the columns. Is that what you would want for Matrix([[1, 2, 3], [4, 5, 6]])?
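Here is a sketch of the two layouts, using only Base. reduce(hcat, ...) puts the inner vectors in the columns, which are contiguous in memory, and permutedims gives the numpy-style row layout:

```julia
vv = [[1, 2, 3], [4, 5, 6]]

# Each inner vector becomes a column: 3×2
C = reduce(hcat, vv)
@assert size(C) == (3, 2)
@assert C[:, 1] == [1, 2, 3]

# numpy-style layout with the inner vectors as rows: 2×3
R = permutedims(C)
@assert size(R) == (2, 3)
@assert R[1, :] == [1, 2, 3]

# Columns are contiguous in memory, so iterating over them is the cache-friendly direction
@assert [sum(c) for c in eachcol(C)] == [6, 15]
```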