Proliferation of matrix and vector types -- standard library going off the rails?

I spent most of the morning chasing down a performance regression in my code; it turns out that the following statement

y = spzeros(50000,50000)' * reshape(zeros(2,25000)',50000)

requires 16 seconds in 0.7.0-beta2 versus a tiny fraction of a second in 0.6.

A fundamental problem causing this and other severe regressions is the proliferation of matrix and vector types in stdlib versus the paucity of people spending time on all the glue methods. I already complained about this a year ago during the 0.5-to-0.6 transition and received responses that the problem would get better as more methods were written. But as far as I can tell (a quick glance at open issues confirms this), the problem is getting worse rather than better because the creation of new methods is outpaced by the explosion in the number of matrix/vector types. Ironically, the new matrix/vector types are being added to improve performance.

I don’t have solid suggestions for how to fix this problem. Maybe someone with more authority than me can impose a moratorium on adding new matrix and vector types to stdlib? And maybe someone with more expertise than me can document how to avoid performance traps?

P.S. In case you are wondering, the type of reshape(zeros(1,1)',1) in 0.7.0-beta2 is:

Base.ReshapedArray{Float64,1,LinearAlgebra.Adjoint{Float64,Array{Float64,2}},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}

I noticed that you are hardly more than one copy away from doing things the v0.6 way:

y = spzeros(50000,50000)' * copy(reshape(zeros(2,25000)',50000))

Of course, the mental overhead of thinking about this is annoying so I hope a more convenient solution can be found.
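
To make the effect of the extra copy concrete, here is a minimal sketch, assuming a Julia version where `SparseArrays` and `LinearAlgebra` are separate standard libraries (0.7 or later). The size `n` and the variable names are mine, chosen small so it runs quickly; it only shows that `copy` turns the lazy wrapper into a plain `Vector{Float64}` and that both products agree.

```julia
using SparseArrays, LinearAlgebra

n = 500                                # illustrative size, much smaller than the original 25000
A = spzeros(2n, 2n)'                   # lazy adjoint of a sparse matrix
x_lazy = reshape(zeros(2, n)', 2n)     # lazy reshape of a lazy adjoint
x_dense = copy(x_lazy)                 # materialized as a plain Vector{Float64}

println(typeof(x_lazy))                # a Base.ReshapedArray wrapping an Adjoint
println(typeof(x_dense))               # Vector{Float64}

# Both products hold the same values; only the method that computes them differs.
y_lazy = A * x_lazy
y_dense = A * x_dense
println(y_lazy == y_dense)
```

Wrapping each product in `@elapsed` at the original sizes is a quick way to see whether a given Julia version still shows the gap.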

I tried the following in 0.7.0-beta2 with two functions, my original and mohamed82008’s variant:

q(y) = spzeros(2*y,2*y) * reshape(zeros(2,y)',2*y)
r(y) = spzeros(2*y,2*y) * copy(reshape(zeros(2,y)',2*y))

I ran @code_warntype. Indeed, the output shows that in the first case, generic matrix-vector multiplication is invoked, whereas in the second case, sparse matrix-vector multiplication is invoked.
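
A lighter-weight way to look at the same thing, without reading the full @code_warntype output, might be to ask `which` method dispatch selects for each pair of argument types. This is only a sketch with small illustrative sizes and my own variable names. The methods it prints depend on the Julia version, and on some versions both calls may enter the same `*` method and diverge only deeper inside `mul!`, so treat the output as a starting point rather than a verdict.

```julia
using SparseArrays, LinearAlgebra, InteractiveUtils

A = spzeros(4, 4)
x_lazy = reshape(zeros(2, 2)', 4)   # same wrapper type as in q(y)
x_dense = copy(x_lazy)              # same argument type as in r(y)

# `which(f, types)` returns the Method that dispatch picks for these argument types.
m_lazy = which(*, Tuple{typeof(A), typeof(x_lazy)})
m_dense = which(*, Tuple{typeof(A), typeof(x_dense)})

println(m_lazy)
println(m_dense)
```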

This suggests that it is theoretically possible to write a tool to catch many of these performance traps. The tool would work by running @code_warntype on each user function as it is compiled. What would be needed to make such a tool possible?

  1. Someone would have to make a list of dense matrix-vector operations (like generic matvec mul) that in principle should be avoided for sparse or structured matrix types.

  2. It would have to be possible to hook into the compiler and allow user code to run (e.g., a diagnosis function that I write called find_performance_trap that invokes @code_warntype) each time a user’s function is compiled.
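
As a rough illustration of what item 1 could look like, here is a sketch of a check on argument types only. It is not the compiler hook from item 2: it has to be called by hand, and the name `is_performance_trap` and the single rule it encodes (sparse matrix times a vector that is not a dense array) are my own assumptions, not an existing API.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper: by default, no operand combination is flagged.
is_performance_trap(A, x) = false

# Assumed rule for this sketch: a sparse matrix multiplied by a vector that is
# not stored as a dense array is likely to miss the specialized sparse method.
is_performance_trap(A::AbstractSparseMatrix, x::AbstractVector) = !(x isa DenseVector)

A = spzeros(4, 4)
x_lazy = reshape(zeros(2, 2)', 4)

println(is_performance_trap(A, x_lazy))         # lazy wrapper: flagged
println(is_performance_trap(A, copy(x_lazy)))   # plain Vector: not flagged
```

A real list would need many more rules, for example for adjoints of sparse matrices and for the structured matrix types, which is exactly the maintenance burden item 1 describes.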