Why does `mul!(u, A, v)` allocate when `A` is sparse and `u, v` are views?

I’m wondering if there is a way to avoid allocations in the following scenario:

julia> using BenchmarkTools, LinearAlgebra, SparseArrays

julia> A = sprand(2, 3, 0.5);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  21.139 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  33.731 ns (1 allocation: 48 bytes)

If it helps, the size of the allocation does not increase with the size of the arrays.
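For example, scaling everything up changes nothing about the allocation (the specific sizes below are arbitrary, chosen just to illustrate the check; the result is still a single 48-byte allocation):

julia> A = sprand(200, 300, 0.1);

julia> u, v = view(rand(200, 300), :, 1), view(rand(200, 300), 1, :);

julia> @btime mul!($u, $A, $v);   # still 1 allocation: 48 bytes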


What version are you using? I ran your code on both 1.8.5 and 1.9.3 and I get zero allocations in both cases.*

On a different note: is the operation you are doing really “safe”? In the originally posted snippet both views came from the same matrix B, so by updating u (the first column of B) you also change the values of v (the first row of B). In principle the result could then depend on the order in which mul! traverses the data internally.

I get that it’s just a toy example, and I’m not sure whether this is related to the allocations you are seeing, but I thought it was worth pointing out.
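As a minimal illustration of the aliasing (assuming the original views both came from a single matrix B, as described above):

julia> B = rand(2, 3);

julia> u, v = view(B, :, 1), view(B, 1, :);

julia> u[1] = 0.0;   # writes B[1, 1] ...

julia> v[1]          # ... which v also sees
0.0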

*EDIT: Specifically, I ran it on these systems:

Julia Versions
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 8 × Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, icelake-client)
  Threads: 1 on 8 virtual cores
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD Ryzen Threadripper PRO 3975WX 32-Cores
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 64 virtual cores

I’m on 1.10.0-rc1, and I confirm that the allocations are not there on 1.9.3. Could this be a regression then?

Good catch, that was just a quick example, but of course we want u and v to be independent. I updated the code above; the allocation remains.

Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

Thanks for the update. Very interesting – I also get the single allocation with 48 bytes on Julia 1.10.

I’m definitely out of my depth when it comes to the internal changes between 1.9 and 1.10 (and I don’t have all the changes in my mind right now), but it looks like a regression to me :thinking:

julia> versioninfo()
Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, icelake-client)
  Threads: 1 on 8 virtual cores

Would you open an issue on the Julia repo or on the SparseArrays.jl stdlib?

To me (an uninformed observer) it looks more like a problem in SparseArrays, so I would open an issue there.

I just ran the same code with a dense array A = rand(2, 3) and get zero allocations on both 1.9 and 1.10.

Interestingly, the timings in the sparse case also look significantly different for me: even the plain-vector mul! call, which doesn’t allocate, seems to have gotten slower in 1.10. I am running both versions on the same machine:

# 1.9
julia> A = sprand(2, 3, 0.5);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  12.518 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  19.710 ns (0 allocations: 0 bytes)
# 1.10
julia> using BenchmarkTools, LinearAlgebra, SparseArrays

julia> A = sprand(2, 3, 0.5);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  18.192 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  27.388 ns (1 allocation: 48 bytes)

For dense arrays, the numbers look more consistent (they even got a bit faster in 1.10):

# Julia 1.9
julia> A = rand(2, 3);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  34.493 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  43.833 ns (0 allocations: 0 bytes)
# Julia 1.10
julia> A = rand(2, 3);

julia> u, v = rand(2), rand(3);

julia> @btime mul!($u, $A, $v);
  33.168 ns (0 allocations: 0 bytes)

julia> u, v = view(rand(2, 3), :, 1), view(rand(2, 3), 1, :);

julia> @btime mul!($u, $A, $v);
  39.085 ns (0 allocations: 0 bytes)

Good catch, the performance hit is significant indeed!


As a quick fly-by shot in the dark: 48 bytes smells like a type instability somewhere. Both a SparseMatrixCSC and a 2-index SubArray occupy five 8-byte words inline, and adding the 8-byte type tag of a heap-allocated box gives exactly 48 bytes.
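A rough sanity check of those inline sizes (a sketch on 1.10; the extra 8 bytes come from the type tag once the object is heap-boxed by the type instability):

julia> using SparseArrays

julia> sizeof(SparseMatrixCSC{Float64, Int})    # m, n and three array references
40

julia> sizeof(typeof(view(rand(2, 3), 1, :)))   # parent, 2-element index tuple, offset1, stride1
40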


If you go down a few steps in the call chain you have

julia> @btime SparseArrays.spdensemul!($u, 'N', 'N', $A, $v, LinearAlgebra.MulAddMul(true, false));
  15.212 ns (1 allocation: 48 bytes)

and then

julia> @btime SparseArrays._spmatmul!($u, $A, $v, true, false);
  9.602 ns (0 allocations: 0 bytes)

A few things get shaved off between those two calls, but the culprit seems to be

julia> @btime LinearAlgebra.wrap($v, 'N');
  5.026 ns (1 allocation: 48 bytes)

That function is inherently type unstable, but presumably the instability is meant to be optimized away by constant propagation of the character argument. Apparently that didn’t quite succeed here. On master it’s annotated with

Base.@constprop :aggressive function wrap(A::AbstractVecOrMat, tA::AbstractChar)

which looks like it’s addressing this problem.
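For reference, one way to see the instability that constant propagation has to hide (a sketch; the exact inferred types depend on the Julia version) is to ask inference for the return type when the character argument is not a compile-time constant:

julia> using LinearAlgebra

julia> v = view(rand(2, 3), 1, :);

julia> Base.return_types(LinearAlgebra.wrap, (typeof(v), Char))  # not a single concrete type, so the result gets boxed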

Update: The annotation was introduced in Aggressive constprop in LinearAlgebra.wrap by jishnub · Pull Request #51582 · JuliaLang/julia · GitHub and seems to be included in 1.10.0-rc1, so maybe not effective enough.


I think spdensemul! and generic_matvecmul! also need to be annotated with @constprop :aggressive. This may happen in Aggressive constprop in matvecmul and matmatmul by jishnub · Pull Request #51961 · JuliaLang/julia · GitHub

Good that I now have a motivating example to push that PR forward :slight_smile:
