Julia type displaying change between versions

This is the opposite of my point. An array in Julia is not a vector of vectors. These are different concepts; they differ in memory layout, implementation, which operations they support, and also from a linear algebra perspective. They are similar, and therefore a Vector is a subtype of Array, but they are not identical.

And what if they don’t? After all, each element in a list can be totally distinct in length. Is it really fair to say, at a type level, that an object that has direct correspondence with math and tensors is the same as all lists of lists? If you do this, everywhere you use a Matrix you must check it is valid before e.g. accessing indices. No thanks.
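To make that cost concrete, here is a minimal Python sketch of what a function has to do at the point of use if a "matrix" is only a list of lists. The helper names `is_rectangular` and `column` are hypothetical and exist only for this illustration.

```python
def is_rectangular(rows):
    """Return True if every inner list has the same length (hypothetical helper)."""
    return len({len(row) for row in rows}) <= 1


def column(rows, j):
    """Extract column j, but only after validating the shape at the point of use."""
    if not is_rectangular(rows):
        raise ValueError("ragged input: rows have different lengths")
    return [row[j] for row in rows]


regular = [[1, 2], [3, 4]]
ragged = [[1, 2], [1, 2, 3]]

print(column(regular, 1))  # [2, 4]

try:
    column(ragged, 1)
except ValueError as err:
    print("rejected:", err)
```

Every function that indexes by column would need the same guard, which is the overhead a dedicated rectangular type avoids.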

The only point I agree with you on in this thread is that numpy isn’t easy to use in this regard:

In [57]: x = np.array([[1,2], [1,2]])

In [58]: y = np.array([[1,2], [1,2,3]])

In [59]: x.ndim
Out[59]: 2

In [60]: y.ndim
Out[60]: 1

In [61]: x[0,0]
Out[61]: 1

In [62]: y[0,0]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-62-74d735007e1b> in <module>()
----> 1 y[0,0]

IndexError: too many indices for array

It is confusing to me that the same constructor produces two completely different objects in these two cases. x is a Matrix in the Julia sense, while y is a 1-d np.ndarray containing two regular Python lists. In my opinion (coming from Julia), whether I want a list of lists in the first case, or a matrix in the second (in which case an error should be thrown), is something that should be explicit, and after construction I should be certain what I have created.
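As a rough sketch of what "explicit" could look like, here is a toy Python class that either gives you a rectangular object or raises. The `Matrix` class and its `from_rows` constructor are made up for this example (they are not NumPy or Julia API), and the flat row-major storage is just one possible layout.

```python
class Matrix:
    """Toy rectangular 2-d container (illustrative only, not a NumPy or Julia type)."""

    def __init__(self, nrows, ncols, data):
        self.shape = (nrows, ncols)
        self._data = data  # flat, row-major storage: element (i, j) lives at i * ncols + j

    @classmethod
    def from_rows(cls, rows):
        """Build a matrix from equally long rows; ragged input is an error, not a fallback."""
        rows = [list(row) for row in rows]
        ncols = len(rows[0]) if rows else 0
        for i, row in enumerate(rows):
            if len(row) != ncols:
                raise ValueError(f"row {i} has length {len(row)}, expected {ncols}")
        flat = [value for row in rows for value in row]
        return cls(len(rows), ncols, flat)

    def __getitem__(self, index):
        i, j = index
        nrows, ncols = self.shape
        if not (0 <= i < nrows and 0 <= j < ncols):
            raise IndexError(f"index {index} out of bounds for shape {self.shape}")
        return self._data[i * ncols + j]


x = Matrix.from_rows([[1, 2], [1, 2]])
print(x.shape, x[0, 0])  # (2, 2) 1

try:
    Matrix.from_rows([[1, 2], [1, 2, 3]])
except ValueError as err:
    print("not a matrix:", err)
```

The shape is checked once, at construction, so code that later receives a `Matrix` does not have to ask what kind of object it got.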

It’s worth noting that your original post in this thread was based on a misunderstanding of what changed between Julia versions. It seems that you believed that in past versions matrices were stored as a list of lists, because the printout said Array{Float64, 2} and the word array shows up in other languages as well. As I hope is clear by now, this was never the case. Julia has always distinguished between these concepts, and that distinction is fundamental to its usefulness in mathematical programming. The only thing that changed is that the word Matrix now shows up in the printout, to help users quickly understand what the object is (and always was).

I’m not sure how much exposure you have to the language so far, but I think that greater immersion in the language, including reading documentation and previous discussions, here and on github, will demonstrate that the core decisions behind julia are all perfectly sound, to say the least.

15 Likes

Numpy’s y makes perfect sense.

But I admit it can be annoying to debug (I have spent more time on it than I want to admit), especially if you accumulate sub-arrays in a list, only to find out later, when you are about to convert it into an array, that some elements don’t share a common dimension.

But then, in real life outside of universities, not everything is laid out perfectly, especially when dealing with real data.

I find this response ridiculous, to be honest.
This has nothing to do with universities or things being perfectly laid out. No one has ever had to debug this in Julia, because it simply can’t happen, end of story.

5 Likes

On the comment about numpy (when you pass it a ragged list of lists): IIRC, that behaviour is going to be deprecated and will raise an error in the future.

7 Likes

Part of the confusion seems to originate from Numpy:

>>> from numpy import *
>>> a = matrix('1 2; 3 4')
>>> a
matrix([[1, 2],
        [3, 4]])

In Numpy a matrix is a column vector of row vectors; at least, it is printed like this. Let’s try a little “linear algebra”, multiplying a matrix and a vector:

>>> b=array([5, 6])
>>> b.shape = (2,1) # Turn it to a column vector (rather vector of vectors?)
>>> b
array([[5],
       [6]])
>>> a*b  # Matrix times column vector
matrix([[17],
        [39]])
>>> c=array([7, 8]) # row vector times matrix?
>>> c*a
matrix([[31, 46]])

A bit strange, I would say, but perhaps it “kind of works”?

In Julia:

julia> a=[1 2;
          3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

julia> b=[5, 6]
2-element Vector{Int64}:
 5
 6

julia> a*b  # Matrix times column vector:
2-element Vector{Int64}:
 17
 39

julia> c=[7 8]
1×2 Matrix{Int64}:
 7  8

julia> c*a
1×2 Matrix{Int64}:
 31  46

Relatively smooth “linear algebra” I would say?

3 Likes

I am not sure but I would probably use:

c=[7, 8]'

because differentiating between ‘,’ and no ‘,’ can become quite error-prone in real code.

But in any case, the above numpy code is a perfect example of why numpy is a rat’s nest.

I think this is a really nice example of the consistency that Julia provides here. One could also do

julia> c = [7,8]'
1×2 adjoint(::Vector{Int64}) with eltype Int64:
 7  8

to be more explicit that c is a row vector (you then get the same type back after the multiplication).

I like that Julia is stricter here in distinguishing matrices (Array{T,2}), in other words 2D rectangular data, from vectors of vectors (where the individual vectors could have different lengths).

Oh Schneeschaufel just wrote the same idea :slight_smile: but concerning numpy I am also quite biased, since I prefer the exactness Julia uses here.

Anything actionable here? Seems like not. Printing of some types changed as part of a general overhaul to make printing of complex parametric types simpler and cleaner. Actual type information didn’t change, just printing of aliased parametric types. This can break some doctests, which is annoying, but doesn’t affect the behavior of code that isn’t reflecting on and relying on specific printing of types.

There also seems to be a long discussion about whether matrices are vectors of vectors. They aren’t, of course, but there’s some discussion about whether it’s a good idea to try to treat them as if they are to some extent. Numpy is forced to do this somewhat because it needs to go back and forth between plain Python lists and real multidimensional arrays. This correspondence is required since Numpy uses Python lists as the literal syntax for its arrays and Python doesn’t have multidimensional lists, so the only option for using lists to represent multidimensional arrays is to use a list of lists. Of course, Numpy doesn’t go all the way, which we can see from this example, since x and y have very different APIs. If Numpy were truly committed to treating matrices and vectors of vectors the same way, then y[0,0] would be a valid way to index into a vector of vectors and x[0][0] would be a valid way to index into a matrix. I’ll note that I also have my usual strong distaste for the fact that what the code means depends on a dynamic runtime property: whether all the vectors in your vector of vectors happen to have the same length or not. This is a classic footgun where code goes :boom: in some special case you probably didn’t expect. How do you force Numpy to actually give you a vector of vectors when that’s what you want, even if they all happen to have the same length?
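As a rough illustration of the indexing point, here is a sketch in plain Python of what "fully committed" could look like for nested data: a list of lists on which `v[i, j]` and `v[i][j]` mean the same thing, whether or not the inner lists have equal length. `NestedList` is a hypothetical class written for this example, not something Numpy provides.

```python
class NestedList(list):
    """Hypothetical list of lists that accepts both v[i][j] and v[i, j]."""

    def __getitem__(self, key):
        if isinstance(key, tuple):
            item = self
            for k in key:  # v[i, j] is treated as v[i][j]
                item = item[k]
            return item
        return super().__getitem__(key)


x = NestedList([[1, 2], [1, 2]])
y = NestedList([[1, 2], [1, 2, 3]])

print(x[0, 0], x[0][0])  # 1 1
print(y[0, 0], y[1][2])  # 1 3
```

Here the meaning of the indexing syntax does not change depending on whether the inner lists happen to line up, which is the property the example with x and y lacks.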

Julia is in a quite different situation since the standard array type is multidimensional and the language has direct syntax for n-dimensional array literals. So there’s no reason to use nested arrays as a bad proxy for real multidimensional arrays. It’s interesting that a limitation of Python has somehow become interpreted from one point of view as a virtue. It’s all about what you’re used to, I suppose.

25 Likes

One needs to draw a line in the sand and decide what things are part of your API promise for semantic versioning and what are not.

The ColPrac contributors guide formalizes this for many julia projects.
While not officially for julia itself, the list of what is and is not part of the API promise was informed by discussions with core julia developers of what they considered breaking (as well as the equivalent document for the TensorFlow project).

Everything on this list can, in theory, break users’ code.
See XKCD#1172.
However, we consider changes to these things to be non-breaking from the perspective of package versioning.

  • Changes to the string representation: The output of print/string or show/repr on a type may change at any time.
    Users should not depend on the exact text, but rather on the meaning of the text.
    Changing the string representation often breaks downstream packages’ tests, because it is hard to write test cases that depend only on meaning.
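To illustrate the difference between depending on the text and depending on the meaning, here is a small sketch in Python (used only as an analogy; the same idea applies to Julia doctests). The variable names are made up for the example.

```python
from fractions import Fraction

value = Fraction(3, 6)

# Brittle: depends on the exact text produced by repr, which is not part of the API promise.
brittle_ok = repr(value) == "Fraction(1, 2)"

# Sturdier: depends on what the value means, however it happens to be displayed.
robust_ok = value == Fraction(1, 2) and (value.numerator, value.denominator) == (1, 2)

print(brittle_ok, robust_ok)
```

Both checks pass today, but only the second would keep passing if the printed form of the type were changed.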
15 Likes

4 posts were split to a new topic: Numpy’s list of lists vs. Julia’s linear indexing