Proposal: drop relation SparseMatrixCSC <: AbstractArray

Please forgive me for taking a (perhaps excessively) philosophical view on this, but my understanding of the design philosophy of Julia was that abstract types should be isomorphic to abstract mathematical definitions wherever possible. In that viewpoint, matrices are objects that transform under rank-2 tensor representations whether they are dense, sparse, shared, distributed or generative. I am not insensitive to the issues involved, but in my view this structure should only be abandoned after much careful and deliberate consideration, and only if it proves absolutely necessary in a significant number of cases. Even then, it would be a crying shame. One of the major advantages that Julia has over its competitors is a sound parity between concrete implementations and the abstract entities they are intended to represent. This feature makes it vastly easier to reason about code, as in this case one can reason about code in the same way one can reason about the underlying math in many cases. Additionally, in the short-run it would require a non-trivial redesign of a good many packages.

Furthermore, even once we do have traits demoting matrices to traits would seem like a significant loss. From a purely mathematical perspective it is difficult to justify why Float64 <: Real but !(SparseMatrixCSC <: AbstractArray).

6 Likes

I have abandoned my original proposal of dropping the subtype relation SparseMatrixCSC <: AbstractArray primarily because sbromberger and others pointed out that my proposal would break too much existing code.

My modified proposal, described in an earlier post in this thread, is that functions in Base that pertain AbstractArray (mostly in abstractarray.jl and abstractarraymath.jl) that currently use similar to construct return arguments should instead construct dense matrices (using the Array{T,2} constructor). This would cause minimal code breakage while mostly addressing my concern.

Indeed, returning dense arrays may actually be more “philosophically” correct than using similar. For example, if one creates a tridiagonal matrix type and then wishes to use the repmat function on a tridiagonal matrix t, it is clear mathematically that repmat(t,2,2) will not be a tridiagonal matrix. So the fact that the fall-back AbstractArray version of repmat uses similar to construct its return argument seems mathematically dubious.

2 Likes

@JaredCrean2 but I believe this is an algorithm for sparse matrices only anyways, so I doubt people would use it for dense matrices, which makes it a different issue. And generally speaking, it shouldn’t be too hard to implement any algorithm with some fixed array iteration semantics which can be added to ArrayIteration.jl, if they are not already there.

My gut feeling is that this is one of those cases where there are a lot of little annoying issues that crop up for a while, but which come to an end at some point and fade into distant memory. There are a large but finite number of array operations exposed by Base which need efficient sparse implementations. Once we’ve got all of those, this will cease to be an issue.

There may, however, be some merit to having a AbstractNonSparseArray <: AbstractArray type and writing fallbacks that touch every element only to that abstract type (and having SparseMatrixCSC not inherit from it). That would allow writing generic code for AbstractArray objects that include sparse and dense array while still ensuring that sparse arrays don’t get impractically slow fallbacks. The hardest part of this change would be going through all the methods that dispatch on AbstractArray and deciding if they’re really non-sparse-specific or not.

2 Likes

what about distributed and GPU arrays? they cant really accept normal iterative code either, right?

Are there abstractions that will apply to all of these?

Isn’t that what DenseArray is?

Maybe, although it’s unclear to me that the only alternatives are dense and sparse. If that’s really the case, then we should just restrict generic array algorithms that touch every element to the DenseArray type, problem solved.

1 Like

No idea – maybe some GPU array folks can chime in. @MikeInnes, @maleadt, @sdanisch?

1 Like

distributed and DB/ out of core arrays aren’t necessarily sparse, nor should they be subtypes of dense, right?

The only thing I know of is the stencil form implemented by e.g. ArrayMeta.jl and Tokamak. I’m pretty excited about that as I think it can apply not just to CPU and GPU arrays but also to distributed computing, possibly even things like JuliaDB to some extent. On the flip side, that’s very much a research project that isn’t going to give an immediate result – the space is pretty unexplored.

Yes, distributed arrays can also benefit from “customized” iteration—in that case you might want to iterate in some sort of “cache efficient” order, where in this case cache=node. That’s all TiledIteration does, and it’s used by ArrayMeta. I’m not saying that’s all you need, but it’s a big step in the right direction.

I agree with @ChrisRackauckas that this is a good use-case for traits, after all there is already the Base.IndexStyle trait, so expanding this to better describe array internals and dispatch on that would probably allow for a nice design with high performance and a simple type hierarchy.

Agreed that this would be a good case for traits. That’s what makes Scala an interesting language. But beware that traits also bring a lot of extra complexity and you should be careful to not factor data structures too finely. Just because you can doesn’t mean you should :slight_smile: See the Scala collections and their complexity. That’s why in Scala 2.13 some of the collections are being redesigned (for the third time).

I don’t have a lot to add, but I think people are picking up on the same set of things I picked up on in writing StaticArrays:

  • What operations are performant depends on the implementation (concrete type). However, much of the generic code written in Base is most suitable for large-ish, mutable, non-distributed, dense arrays.
  • The existence of some AbstractArray interfaces like setindex! can’t currently be discovered through the type system. (!!)
  • similar as it exists doesn’t always do quite what you want. It also never works for immutable arrays that need have values populated upon construction.
  • Some properties like DenseArray are better expressed as traits (though I agree we should limit the quantity of these to avoid over-complication).

However, I wouldn’t look at this and decide to throw the baby out with the bath water (by in any way reducing the genericism currently available). I find the genericism of the system is actually quite impressive! We can iterate on available traits (and the type tree), iteration protocols and array construction (e.g. how to improve similar or alternative approaches) - and arrive at something that generalizes better. It will take time, it won’t be all finished by v1.0, but I believe that it can and will improve.

8 Likes