Proposal: drop relation SparseMatrixCSC <: AbstractArray

@Stephen_Vavasis, the point of ArrayIteration.jl is to be able to write a single generic multiplication algorithm and have it perform well for Dense*Dense, Dense*Sparse, Sparse*Dense, and Sparse*Sparse. The problem you describe is not limited to SparseMatrixCSC: we have other sparse matrix types too (UniformScaling, Diagonal, Bidiagonal, Tridiagonal, and many others in packages), and so supporting efficient implementations of every possible combination is mildly insane—using ordinary dispatch you’d have to implement N^2 methods for N types. It gets even worse when you consider various types of views (view, reshape, PermutedDimsArray, etc) of each of the different sparse types. The only hope is to turn an O(N^2) problem into an O(N) problem by defining iterators for each type and then writing algorithms that leverage these iterators.
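
For concreteness, here is a rough sketch of the iterator idea; eachstored and matvec are hypothetical names, not ArrayIteration.jl’s actual API. Each array type defines one iterator over its stored entries, and one generic kernel consumes it, so adding a new type means writing one iterator rather than N new pairwise methods.

using SparseArrays   # Julia ≥ 0.7; on 0.6 these names live in Base

# Generic fallback: visit every entry (the dense behaviour).
eachstored(A::AbstractMatrix) = ((i, j, A[i, j]) for i in axes(A, 1), j in axes(A, 2))

# Sparse method: visit only the stored entries, in column-major order.
function eachstored(A::SparseMatrixCSC)
    rows, vals = rowvals(A), nonzeros(A)
    return ((rows[k], j, vals[k]) for j in 1:size(A, 2) for k in nzrange(A, j))
end

# One generic kernel that is efficient for both cases.
function matvec(A::AbstractMatrix, x::AbstractVector)
    y = zeros(promote_type(eltype(A), eltype(x)), size(A, 1))
    for (i, j, v) in eachstored(A)
        y[i] += v * x[j]
    end
    return y
end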

The only downside is that ArrayIteration isn’t finished. Now that 0.7 has better inlining, the things that were blocking previously (heap allocation of immutables that contain references to arrays) may not be a problem anymore, so it may just take someone with a bit of spare time (currently, not me :cry:) to finish it off.

I also think it’s far more important to have things work than it is to catch performance bugs, so I too disagree with the proposal here. @Stephen_Vavasis, in the short term is there any chance that you could catch your issues locally by doing the following?

julia> eval(Base, quote
       function getindex(A::SparseMatrixCSC{T}, i0::Integer, i1::Integer) where T
           error("using the generic fallback, fix the caller")
       end
       end)

Now any method that uses S[i, j] for a SparseMatrixCSC S will throw an error. You can define this locally until you’re confident that you’ve cornered every last performance problem.

9 Likes

I think one possible solution to that is a good instrumentation profiler. I am excited about Cassette.jl (https://github.com/JuliaLabs/Cassette.jl), which should make implementing such a profiler not too difficult.

I am also quite excited about Cassette, but I don’t think it will help here. Based on my experience with IProfile (which is admittedly very crude), that one minute of setup time could become 10 minutes. The sampling profiler is much easier to make “invisible” in terms of performance cost.

Instrumentation profilers are very popular when profiling for example games and they can commonly profile in real time while still allowing the game to run at 60 fps (see Telemetry Performance Visualization System). Why do you assume that similar things could not be done for Julia code?

It would be great if this can be pulled off. I don’t know how those profilers work, but to me the bottom line is that if you add instrumentation—e.g., push!ing data to an array—even to fast, cheap functions like v[i] where v::Vector{T}, then there’s no hope of avoiding a performance hit. If those fast, cheap, inlineable functions dominated your runtime even without instrumentation, you could easily see a 10x hit with the instrumented version.
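
To make the concern concrete, here is a minimal sketch of naive call-site instrumentation (hypothetical names, not any existing profiler’s API); for an operation this cheap, the recording easily costs more than the work.

# Naive instrumentation: every call also push!es a record to a global log.
# For something as cheap as v[i], the push! and time_ns() calls dominate,
# which is where a ~10x slowdown on indexing-heavy code would come from.
const CALL_LOG = Tuple{Symbol,UInt64}[]

@inline function logged_getindex(v::Vector, i::Int)
    push!(CALL_LOG, (:getindex, time_ns()))   # instrumentation overhead
    return v[i]                               # the actual work: one bounds check + load
end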

But if we can impose some kind of cutoff (e.g., avoid adding instrumentation to functions whose estimated runtime is less than 60 CPU cycles), then it might be more viable. EDIT: perhaps an easy option would be to add instrumentation only to :invoke expressions in CodeInfo objects.

4 Likes

I think the problem here is not that SparseMatrixCSC <: AbstractArray, it’s that AbstractArray is used far too much. If you consider

  • Dense matrices
  • Sparse matrices
  • Shared memory matrices
  • Distributed memory matrices
  • Non-stored matrices (for example, Ranges)

they have very little in common. Even getindex/setindex! is not defined in all cases! For Ranges, setindex! isn’t defined at all; for distributed memory matrices, getindex may not be defined for the non-local part of the matrix; and for shared memory matrices, setindex! might need locks, making it (potentially) very inefficient. The point I’m making is that this issue of fallbacks isn’t really specific to sparse matrices; it’s a problem with anything that isn’t a dense, single-process matrix. I think a lot of the AbstractArray interface should really be the DenseMatrix interface, and when there is commonality between DenseMatrix and one of the other matrix categories, they should present the same interface.

edit: thanks to @ChrisRackauckas for pointing out Ranges have getindex but not setindex!

1 Like

julia> (1:5)[2]
2

I assume you meant setindex!?

I do think there’s something to be said here, but it really needs traits. Being dense is one trait of an array. There are a lot of algorithms which, at a very coarse level, “work with AbstractArrays”, and the purpose of <:AbstractArray is more than just fallbacks. So <:AbstractArray still makes sense, but can be refined in multiple (orthogonal) ways via a set of traits. And there have already been some proposed traits for things like “can setindex!” (which is different from being mutable, since an immutable AbstractArray type can hold a mutable array! See wrappers like GPUArrays). We just need an enlarged vocabulary to handle this appropriately.
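
As a rough sketch of what such a trait could look like (the names below are hypothetical, not an existing Base API, using the usual Holy-traits pattern):

# An orthogonal "can you setindex!?" trait layered on top of <:AbstractArray.
abstract type SetindexStyle end
struct CanSetindex    <: SetindexStyle end
struct CannotSetindex <: SetindexStyle end

SetindexStyle(::Type{<:AbstractArray}) = CanSetindex()      # optimistic default
SetindexStyle(::Type{<:AbstractRange}) = CannotSetindex()   # ranges are read-only (AbstractRange is the ≥0.7 name)

# Algorithms branch on the trait instead of on concrete types:
fill_ones!(A::AbstractArray) = fill_ones!(SetindexStyle(typeof(A)), A)
fill_ones!(::CanSetindex, A)    = (A .= one(eltype(A)); A)
fill_ones!(::CannotSetindex, A) = error("$(typeof(A)) does not support setindex!")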

5 Likes

+1 to the idea of using traits for this, but I think a prerequisite (and something we can do now) is to define a formal API for arrays: that is, what are the methods one should expect to be able to call on all subtypes of AbstractArray (or, going down one level, DenseArray / SparseArray)? That will give developers a sense of what they should make performant. I have implemented my own “Array-like” datatypes but still am not sure I’ve implemented all the methods that “arrayness” requires.
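
For what it’s worth, the manual’s interfaces chapter does list a minimal core (size plus scalar getindex, with setindex!, IndexStyle, similar, etc. as optional extensions); what it doesn’t say is which methods a subtype is expected to make fast. A minimal sketch of a conforming wrapper (MyVec is just an illustrative name):

struct MyVec{T} <: AbstractVector{T}
    data::Vector{T}
end

# Required by the documented AbstractArray interface:
Base.size(v::MyVec) = size(v.data)
Base.getindex(v::MyVec, i::Int) = v.data[i]

# Optional extensions:
Base.setindex!(v::MyVec, x, i::Int) = (v.data[i] = x; v)
Base.IndexStyle(::Type{<:MyVec}) = IndexLinear()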

I go back to my own experience with this: there are some graph types that, no matter what you do, will never have good performance on some functions. In these cases we opted to keep the methods available, but warn the user that what they want to do is not performant. I think this is the best we can do.

I think this is the right approach as well. GPUArrays.jl is doing this by allowing indexing into a GPUArray. Looping through a GPUArray via linear indexing is a horrible idea, but allowing it means you can still do things like get the first value. In OrdinaryDiffEq.jl (the ODE solvers for DifferentialEquations.jl) we need this exactly once, for the calculation of the initial dt, and the rest of the code all uses broadcast (with the appropriate solver algorithms). Thus in these cases using DifferentialEquations.jl with a GPUArray actually works, and it’s fine that first is used, since it happens exactly once in what is normally a very long computation, whereas every other interaction with the GPUArray is broadcasted. If this fallback didn’t exist, we would still have some incompatibilities with GPUArrays, even though the fallback doesn’t really have an effect on our actual runtime in any real-world case.

I think this is a good example of how “allowing users to shoot themselves in the foot” is actually allowing users to do anything that’s remotely possible, letting them decide what manner is appropriate (and, over time, optimizing more and more ways of working). While I wouldn’t use these sparse matrix fallbacks in a tight loop, in a lot of cases they will work in a pinch and will get an implementation off the ground and working. Then in the future it can be specialized and optimized more, but maybe the “bad call” only happens once, in which case spending time to optimize it is most likely just a waste of time anyway.

The one thing I think can be made better is the communication that something is a bad fallback. I do wish we had a way of marking “only do this if necessary”. A warning is too much if it can’t be turned off. Requiring someone to profile it is a little too silent. There must be something in the middle.

5 Likes

We use warn(..., once=true). It’s not (yet?) silenceable.

What about something like performant_types(func) to give the set of type tuples that func is performant for? This could then be analyzed recursively to figure out something like is_performant(foo, Array, SparseMatrixCSC). Analysis could be enforced by running in something like an analysis mode, whose macro can be removed once the user is happy (or OK with being unhappy) with the performance of the functions. And of course the word performant here is about as trustworthy as the person claiming it.
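
A rough sketch of what that could look like; everything below (the registry, register_performant!, performant_types, is_performant) is hypothetical and only illustrates the proposal.

using SparseArrays   # only for the example query at the end

# Hypothetical registry: packages declare which argument-type tuples a
# function is expected to be fast for; a checker then queries it.
const PERFORMANT = Dict{Function,Vector{Tuple}}()

register_performant!(f, sig::Tuple) = push!(get!(PERFORMANT, f, Tuple[]), sig)
performant_types(f) = get(PERFORMANT, f, Tuple[])

is_performant(f, argtypes...) =
    any(sig -> length(sig) == length(argtypes) &&
               all(a <: s for (a, s) in zip(argtypes, sig)),
        performant_types(f))

# Example: declare that dense*dense is fast, then ask about dense*sparse.
register_performant!(*, (Matrix{Float64}, Matrix{Float64}))
is_performant(*, Matrix{Float64}, Matrix{Float64})               # true
is_performant(*, Matrix{Float64}, SparseMatrixCSC{Float64,Int})  # false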

1 Like

To Mohamed82008 and ChrisRackauckas:

I don’t think that a solution involving warnings or a new function performant addresses my original concern. The fallback of sparse-matrix methods to dense-matrix loops is because of bugs in 0.6.0, not because the Julia developers didn’t get around to writing the methods I need. (The methods I need are present but aren’t getting called. The reason for the bugs is that the dispatch in abstractarray.jl and related base files is very complicated, so it’s easy to see how something could slip through the cracks.) It is unreasonable to expect the Julia developers to implement warnings for bugs.

However, there is a solution akin to warnings that would address my concerns. Consider, for example, repmat that appears in abstractarraymath.jl:

function repmat(a::AbstractVecOrMat, m::Int, n::Int=1)
    o, p = size(a,1), size(a,2)
    b = similar(a, o*m, p*n)
    for j=1:n
        d = (j-1)*p+1
        R = d:d+p-1
        for i=1:m
            c = (i-1)*o+1
            b[c:c+o-1, R] = a
        end
    end
    return b
end

I suggest that b in this function could be initialized to Array{eltype(a),2}(o*m,p*n) rather than using similar. In this way, if a sparse matrix routine accidentally falls back to the AbstractArray implementation, then a power-user will know right away because a dense matrix is returned. This approach has the benefit that no functionality is lost for the beginning user who just wants everything to work.
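
Concretely, the change being proposed would look something like this (a sketch only, keeping the 0.6-era constructor syntax of the quoted function; repmat_dense is just an illustrative name):

function repmat_dense(a::AbstractVecOrMat, m::Int, n::Int=1)
    o, p = size(a,1), size(a,2)
    b = Array{eltype(a),2}(o*m, p*n)   # was: similar(a, o*m, p*n)
    for j = 1:n
        d = (j-1)*p + 1
        R = d:d+p-1
        for i = 1:m
            c = (i-1)*o + 1
            b[c:c+o-1, R] = a
        end
    end
    return b
end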

I must have missed this (sorry): what methods specifically are converting to dense arrays? repmat doesn’t at any point, as far as I can see: similar() on a sparse matrix returns a sparse matrix.

If you’re saying that the indexing being performed in repmat is not performant in sparse matrices, then a new method that dispatches on a sparse matrix that utilizes sparse-performant indexing would solve this problem, no?

The first bug occurred in a pre-release nightly of 0.6.0 and is fixed in the 0.6.0 release; it is described here:

https://github.com/JuliaLang/julia/issues/19503

The second bug is currently present in 0.6.0-release but has already been fixed in a PR. I don’t know whether the PR will go into 0.6.1 or 0.7. It is described here:

https://github.com/JuliaLang/julia/issues/23296

I don’t know whether repmat is performant or not for sparse matrices; I didn’t check. The point of my previous post is that the widespread use of similar to construct return arguments in abstractarray.jl and abstractarraymath.jl creates very fertile ground for bugs of the kind I am complaining about: bugs in the dispatch that cause functions to mistakenly and silently fall back to a dense routine. My latest proposal, namely replacing all uses of similar with dense-array constructors, would simultaneously keep functionality for beginning users while providing a strong clue to power-users that a non-performant operation is present in their code.

I see your point. So if the problem is with the loops, I guess making more extensive use of ArrayIteration.jl in Base may cut it, which, if I understand correctly, would separate the semantics of iterating over an array (or parts of an array) from the detailed implementation, which may be optimal for some array types and terrible for others. This “sparsity-aware programming” is something that probably needs to be more prevalent in Base and other AbstractArray-based functions. I am not sure if that addresses your issue here or not. I suppose there could be other temporary measures as well, like the one you mentioned, which just explicitly changes to a dense matrix before diving into a loop that assumes a dense matrix anyway. That would be like screaming: “I am not performant for sparse matrices.”

Couldn’t you then redefine the type to subtype AbstractArray, as is done now (or, vice versa, drop the subtype in your own code)? You would need to make sure to only use it when testing.

Counterpoint: those who share your point of view would do it from the start in their own code(?), but that seems bad if you use such a package, and it would ruin it for everybody.

[I’m assuming the type relationship is global and can’t be changed locally…]

> I guess making more extensive use of ArrayIteration.jl in Base may cut it,

Being able to iterate over only the stored values in a sparse matrix may help in some cases, but I don’t think it’s a general solution. For example, I don’t think it’s possible to implement any kind of supernodal algorithm (for example, the supernodal Cholesky algorithm in SuiteSparse) using this approach.

One issue unique to sparse arrays (I mean SparseMatrixCSC, not e.g. Diagonal, UniformScaling, etc.) is that they are not fully IEEE compliant. There’s a GitHub issue on it, about -0.0 being a special case for them:

Can’t you just override the methods you don’t want to use to throw errors in your testing code?