Why does `mul!(u, A, v)` allocate when `A` is sparse and `u, v` are views?

I’m wondering if there is a way to avoid allocations in the following scenario:

julia> using BenchmarkTools, LinearAlgebra, SparseArrays

julia> A = sprand(2, 3, 0.5);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  21.139 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  33.731 ns (1 allocation: 48 bytes)

If it helps, the size of the allocation does not increase with the size of the arrays.
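For example, scaling everything up changes nothing about the allocation (the specific sizes below are arbitrary, chosen just to illustrate the check; the result is still a single 48-byte allocation):

julia> A = sprand(200, 300, 0.1);

julia> u, v = view(rand(200, 300), :, 1), view(rand(200, 300), 1, :);

julia> @btime mul!($u, $A, $v);   # still 1 allocation: 48 bytes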


What version are you using? I ran your code on both 1.8.5 and 1.9.3 and I get zero allocations in both cases.*

On a different note: is the operation you are doing really “safe”? In the originally posted snippet both views came from the same matrix B, so by updating u (the first column of B) you also change the values of v (the first row of B). In principle the result could then depend on the order in which mul! traverses the data internally.

I get that it’s just a toy example, and I’m not sure whether this is related to the allocations you are seeing, but I thought it was worth pointing out.
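As a minimal illustration of the aliasing (assuming the original views both came from a single matrix B, as described above):

julia> B = rand(2, 3);

julia> u, v = view(B, :, 1), view(B, 1, :);

julia> u[1] = 0.0;   # writes B[1, 1] ...

julia> v[1]          # ... which v also sees
0.0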

*EDIT: Specifically, I ran it on these systems:

Julia Versions
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 8 × Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, icelake-client)
  Threads: 1 on 8 virtual cores
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD Ryzen Threadripper PRO 3975WX 32-Cores
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 64 virtual cores

I’m on 1.10.0-rc1, and I confirm that the allocations are not there on 1.9.3. Could this be a regression then?

Good catch, that was just a quick example, but of course we want u and v to be independent. I updated the code above; the allocation remains.

Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

Thanks for the update. Very interesting – I also get the single allocation with 48 bytes on Julia 1.10.

I’m definitely out of my depth when it comes to the internal changes between 1.9 and 1.10 (and I don’t have all the changes in my mind right now), but it looks like a regression to me :thinking:

julia> versioninfo()
Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, icelake-client)
  Threads: 1 on 8 virtual cores

Would you open an issue on the Julia repo or on the SparseArrays.jl stdlib?

To me (an uninformed observer) it looks more like a problem in SparseArrays, so I would open an issue there.

I just ran the same code with a dense array A = rand(2, 3) and get zero allocations on both 1.9 and 1.10.

Interestingly, the timings in the sparse case also look significantly different for me: even the plain-vector mul! call, which doesn’t allocate, seems to have gotten slower in 1.10. I am running both versions on the same machine:

# 1.9
julia> A = sprand(2, 3, 0.5);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  12.518 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  19.710 ns (0 allocations: 0 bytes)
# 1.10
julia> using BenchmarkTools, LinearAlgebra, SparseArrays

julia> A = sprand(2, 3, 0.5);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  18.192 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  27.388 ns (1 allocation: 48 bytes)

For dense arrays, the numbers look more consistent (they even got a bit faster in 1.10):

# Julia 1.9
julia> A = rand(2, 3);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  34.493 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  43.833 ns (0 allocations: 0 bytes)
# Julia 1.10
julia> A = rand(2, 3);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  33.168 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  39.085 ns (0 allocations: 0 bytes)

Good catch, the performance hit is significant indeed!


As a quick fly-by shot in the dark: 48 bytes smells like a type instability somewhere. Both a SparseMatrixCSC and a 2-index SubArray occupy five 8-byte words inline, and adding the 8-byte type tag of a heap-allocated box gives exactly 48 bytes.
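A rough sanity check of those inline sizes (a sketch on 1.10; the extra 8 bytes come from the type tag once the object is heap-boxed by the type instability):

julia> using SparseArrays

julia> sizeof(SparseMatrixCSC{Float64, Int})    # m, n and three array references
40

julia> sizeof(typeof(view(rand(2, 3), 1, :)))   # parent, 2-element index tuple, offset1, stride1
40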


If you go down a few steps in the call chain you have

julia> @btime SparseArrays.spdensemul!($u, 'N', 'N', $A, $v, LinearAlgebra.MulAddMul(true, false));
  15.212 ns (1 allocation: 48 bytes)

and then

julia> @btime SparseArrays._spmatmul!($u, $A, $v, true, false);
  9.602 ns (0 allocations: 0 bytes)

A few things get shaved off between those two calls, but the culprit seems to be

julia> @btime LinearAlgebra.wrap($v, 'N');
  5.026 ns (1 allocation: 48 bytes)

That function is inherently type unstable, but presumably the instability is meant to be optimized away by constant propagation of the character argument. Apparently that didn’t quite succeed here. On master it’s annotated with

Base.@constprop :aggressive function wrap(A::AbstractVecOrMat, tA::AbstractChar)

which looks like it’s addressing this problem.
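For reference, one way to see the instability that constant propagation has to hide (a sketch; the exact inferred types depend on the Julia version) is to ask inference for the return type when the character argument is not a compile-time constant:

julia> using LinearAlgebra

julia> v = view(rand(2, 3), 1, :);

julia> Base.return_types(LinearAlgebra.wrap, (typeof(v), Char))  # not a single concrete type, so the result gets boxed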

Update: The annotation was introduced in Aggressive constprop in LinearAlgebra.wrap by jishnub · Pull Request #51582 · JuliaLang/julia · GitHub and seems to be included in 1.10.0-rc1, so maybe not effective enough.


I think spdensemul! and generic_matvecmul! also need to be annotated with @constprop :aggressive. This may happen in Aggressive constprop in matvecmul and matmatmul by jishnub · Pull Request #51961 · JuliaLang/julia · GitHub

Good that I now have a motivating example to push that PR forward :slight_smile:
