Trouble with sparse matrix transpose in 0.7 (and now 1.0 !?)


#1

In 0.7, it seems that the transpose of a sparse matrix has not only been made lazy, but now returns a view of the original matrix rather than a new matrix. This is going to cause a lot of confusion, and I think the measures needed to correct it will make code much uglier.

In 0.7, if we create a sparse matrix A and then type At = A', we no longer get a new matrix.
In fact, if we later change entries of A, this results in changes in the entries of At. This is rather counter-intuitive. Those of us who are used to working with sparse matrices often construct the transpose so that we can have quick access to the rows of the matrix (whereas the original provided quick access to the columns). We have lost that, too.
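A minimal sketch of the aliasing described above (Julia 0.7/1.0 behavior; the small matrix is just for illustration):

```julia
using SparseArrays, LinearAlgebra

A  = sparse([1.0 2.0; 3.0 4.0])
At = A'                 # lazy: an Adjoint wrapper around A, not a new matrix

A[1, 2] = 99.0          # mutate the original...
At[2, 1]                # ...and the "transpose" sees the change: 99.0

typeof(At)              # Adjoint{Float64,SparseMatrixCSC{Float64,Int64}}
```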

While there are situations in which this lazy behavior is useful, I strongly feel that it should not be the default. Those who want this should be able to type At = lazy_transpose(A).

This lazy behavior also feels inconsistent with the rest of Julia’s behavior. When I type A + I, Julia adds one to the diagonals of A. That is useful, and it makes it much easier to write fast code. Lazy evaluation makes it difficult for programmers to optimize code, because they can’t predict where the work is going to be done.

Steve Vavasis has pointed out other problems that this is causing: https://github.com/JuliaLang/julia/issues/28432


#2

Another problem is that this convention creates inconsistencies in when Julia returns references and when it returns copies. For example, it was decided that A[:,1] and A[1,:] should return new arrays, not views into A. Users can obtain views when they want them, but views are not the default. It is inconsistent to make them the default for transposes.


#3

I would say that in most cases you want (well, actually, I want) the lazy transpose, because it is used in cases like x'*A*x, so it makes sense that it is the default. For linear algebra it would be a real hassle to use something long-winded, like lazy_adjoint. Your use-case seems less “mathematical”, and should be more tolerant to verbose syntax.

If you need a copy, you can write

At = collect(A')  # or Matrix(A'); both give a dense Array — use copy(A') to keep it sparse
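For example (a sketch; note that collect and Matrix produce a dense Array, so for a sparse result copy(A') is the variant to reach for):

```julia
using SparseArrays

A  = sparse([1.0 2.0; 3.0 4.0])
At = copy(A')           # an independent SparseMatrixCSC
Ad = collect(A')        # a dense Array{Float64,2}

A[1, 2] = 99.0
At[2, 1]                # still 2.0: the copy does not alias A
```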

As for slices as copies vs views, that is a long discussion, and part of the decision was related to performance, not just semantics.

It is possible that if views were fast back in the early days, slices would have been views by default, but changing that now would be so apocalyptically breaking that I think that ship has sailed.


#4

There is at least one open issue with regard to sparse matrix transpose.


There are several other issues open with regard to specialized linear algebra methods.


#5

But that’s just a problem that should be fixed, not a sufficient argument for changing the default behaviour, surely?


#6

With regard to whether sparse transpose should be a view or a copy: the behavior in 0.6 was internally inconsistent. For example, reshape in 0.6 gives a view, and yet from its type it appears to be a new matrix:

julia> A = randn(2,2)
2×2 Array{Float64,2}:
 -0.133056  -1.35982
  1.68023   -1.08701
julia> B = reshape(A,4);
julia> typeof(B)
Array{Float64,1}
julia> B[1]=9
9
julia> A
2×2 Array{Float64,2}:
 9.0      -1.35982
 1.68023  -1.08701

In 0.6, transpose copied while reshape provided a view. I think it’s better in 0.7/1.0 that at least everything is internally consistent. I personally would prefer that everything copies because of my long history using Matlab, but I suppose that ship has already sailed.


#7

No matter what behavior is chosen for sparse matrices and transposes, adjoints, and friends, some operations will be more efficient than others. In other words, it is not possible to make EVERYBODY happy. I think the real questions are: are the definitions of the methods internally consistent (it appears they are), and are the most common operations as efficient as they can be (some would argue that is the case)?


#8

Backtracking a bit on my previous post: it seems that these operations are still not totally consistent internally. All of them appear to provide a view, but you cannot always tell from the type that something is a view rather than a copy. So I would say that more clean-up work needs to be done. Here are some examples from 1.0.0-rc1 in which the types assigned to the objects appear to be inconsistent.

julia> x = zeros(1);
julia> y = reshape(x,1,1);
julia> typeof(y)
Array{Float64,2}
julia> y[1]=2;
julia> x
1-element Array{Float64,1}:
 2.0
julia> z = x';
julia> typeof(z)
LinearAlgebra.Adjoint{Float64,Array{Float64,1}}
julia> z[1] = 3;
julia> x
1-element Array{Float64,1}:
 3.0
julia> w = reshape(x',1,1)
1×1 reshape(::LinearAlgebra.Adjoint{Float64,Array{Float64,1}}, 1, 1) with eltype Float64:
 3.0
julia> w[1] = 4
4
julia> x
1-element Array{Float64,1}:
 4.0
julia> y
1×1 Array{Float64,2}:
 4.0
julia> z
1×1 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
 4.0

#9

Under my proposal, those who want to compute the quadratic form by typing x'*A*x still can. There is no need for them to type lazy_transpose. They only suffer in that an extra vector is allocated. But the cost is small: Julia already has to compute A*x and allocate memory for the result, so the extra cost of allocating x' is small. That is, even in 0.6 the time to compute x'*A*x is comparable to the time to compute A*x for sparse matrices A.

Those who are worried about the performance could use reshape, as in 0.6. But, the performance gain is small. Those who are really worried about the performance will write their own code to evaluate the quadratic form, and avoid all the allocations.
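The reshape trick looks roughly like this (a sketch; note that reshape also shares memory with x, but the result is typed as an ordinary matrix):

```julia
using SparseArrays

A = sprandn(1000, 1000, 0.01)
x = randn(1000)

q1 = x' * A * x                     # lazy adjoint: no extra vector allocated
q2 = (reshape(x, 1, :) * A * x)[1]  # 0.6-style: a 1×n "row matrix" times A*x

q1 ≈ q2                             # true
```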


#10

I disagree; notational convenience should favour math-like operations. I also suspect this is the more common use-case. In your use-case you can achieve the desired behaviour using collect, with zero performance loss.

In general, it is simpler to take a lazy operation and make it eager, than the other way around.

As for my example, it is trivial to come up with one where copying behaviour would have significant cost.


#11

I am not suggesting abandoning any standard mathematical notation. I am suggesting keeping things the way they were in version 0.6. This will make it much easier for less sophisticated users of Julia to avoid bugs. Most users would be shocked to type y = x'; y[1] += 1, and then find that x has changed.

I am merely suggesting that lazy evaluation and the use of a view should be an option, but not the default. While they will sometimes provide benefit to people who are trying to optimize their code, they create many unnecessary pitfalls for less sophisticated users. This is especially true if it is an inconsistent practice in the language.

At the very least, we need to acknowledge that this change in the behavior of the transpose will break a lot of people’s code. It certainly breaks a lot of my code for sparse numerical linear algebra and graph theory.


#12

In practice, you are. For a matrix-vector product with 1000x1000 matrix (A' * b), there is a 50x difference in performance for eager vs lazy transpose.

I think it is better to educate novice users rather than compromising the needs of advanced users.
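A rough way to check this kind of claim yourself (a sketch using BenchmarkTools; the exact ratio depends on matrix size, density, and hardware — the 50x figure above is the poster's, not reproduced here):

```julia
using SparseArrays, BenchmarkTools

A = sprandn(1000, 1000, 0.01)
b = randn(1000)

@btime $A' * $b        # lazy: dispatches to a kernel that walks A's CSC columns
@btime copy($A') * $b  # eager: materializes the transpose first, then multiplies
```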


#13

I agree that sparse matrix transposes are annoyingly slow. I also agree that anyone who often writes A' * x would like a lazy matrix transpose. I also know that I have occasionally written the ugly (x' * A)' to deal with this.

But, I’m less convinced that anyone who worries about performance would find this a stumbling block. Someone who worries about performance should know that applying an operation like a transpose takes time. The people I’ve encountered who do not understand this have much worse problems, like constructing sparse matrices by adding entries to them in a loop.

However, even reasonably good programmers will be thrown by bizarre errors created by unexpectedly encountering a view of a matrix. If Julia were filled with such behavior, they would learn about it from the documentation. But, it would take a very careful read to discover this issue. This really should be listed as a breaking change. It is not intuitive behavior.


#14

It wouldn’t be a stumbling block (for those people), but it would be an annoyance and would make for ugly code that should be beautiful and simple, because this sort of notation is very often part of large, complex expressions.

It is reasonable to expect that most such operations on arrays return views (reshape, transpose, vec, Symmetric, etc.). The fact that slices are not views is the real surprise, and it is one that I have seen catch many new users.
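For instance, all of these wrappers alias the same underlying data (Julia 1.0; illustrative values):

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]

r = reshape(A, 4)   # shares memory with A
t = transpose(A)    # Transpose wrapper around A
v = vec(A)          # equivalent to reshape(A, length(A))
s = Symmetric(A)    # wrapper that reads A's upper triangle

A[1, 1] = 10.0
(r[1], t[1, 1], v[1], s[1, 1])   # all reflect the change: (10.0, 10.0, 10.0, 10.0)
```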

Oh, it absolutely is. You will find that lazy data structures are extremely common, and it’s something every new user has to get used to.