# Working with `LinearAlgebra.mul!`

Reposting a Reddit post here to solicit comments from the Julia community. I may have chanced upon a bug. According to Reddit users `No-Distribution4263` and `Foura5`, both `foo!` and `bar!` should result in an error. But in the example below, `foo!` works while `bar!` does not.

Also, in my use case `m` and `n` will be large, so `Z` will be a large array. What is the best way of mutating the values in `Z` without allocating another array?

Thanks very much.

``````
using LinearAlgebra
using Random
using Statistics

Sigma = Symmetric(
    [1.00 0.25 0.50;
     0.25 1.00 0.15;
     0.50 0.15 1.00]
)
mu = [1.50; 2.00; 2.50]
seed = 42
m = 10
n = 3

# foo!
C = cholesky(Sigma)
Z = randn(Xoshiro(seed), (m, n))

function foo!(F::Cholesky,
              Z::Matrix{Float64},
              mu::Vector{Float64})
    Z .= mul!(Z, Z, F.U) .+ mu'
    return Z
end

julia> foo!(C, Z, mu)
10×3 Matrix{Float64}:
  1.13664    1.11003   1.09999
  1.75174    2.87654   1.38161
  1.18501    2.34136   3.47988
  1.18875    1.5393    4.29595
  2.31631    2.70479   2.74265
  1.97674    3.52045   2.50674
  0.640445   1.69499   2.03278
  0.0307118  0.400502  1.82319
 -0.614335   1.86276   0.617501
  1.54378    1.91743   3.35531

# bar!
E = eigen(Sigma)
Z = randn(Xoshiro(seed), (m, n))

function bar!(F::Eigen,
              Z::Matrix{Float64},
              mu::Vector{Float64})
    mul!(F.vectors, Diagonal(sqrt.(F.values)), F.vectors')
    Z .= mul!(Z, Z, F.vectors) .+ mu'
    return Z
end

julia> bar!(E, Z, mu)
ERROR: ArgumentError: output matrix must not be aliased with input matrix
Stacktrace:
@ LinearAlgebra ~/.julia/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:647
[2] mul!
@ ~/.julia/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:169 [inlined]
[3] mul!
@ ~/.julia/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:275 [inlined]
[4] bar!(F::Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}, Z::Matrix{Float64}, mu::Vector{Float64})
@ Main ./REPL[15]:5
[5] top-level scope
@ REPL[16]:1
``````

As I understand it, the "no aliasing" requirement is a BLAS limitation, so matrix multiplication operations that call BLAS internally will not work if two of the arguments are aliased. For special types such as `Triangular` and `Diagonal`, the matrix multiplication is implemented in Julia, so aliasing may not be an issue there. This isn't a bug as such; rather, it is undocumented behavior.

I think the aliasing is a fundamental issue, unrelated to BLAS, because you have to re-use the same elements from the first matrix over and over, but you have already modified them.
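A hypothetical naive in-place multiply (not the stdlib implementation, just a sketch) makes the problem concrete: once an entry of the output is written, later iterations read the overwritten value instead of the original one.

``````
# Hypothetical naive multiply: C[i,j] = sum_k A[i,k] * B[k,j]
function naivemul!(C, A, B)
    for j in axes(B, 2), i in axes(A, 1)
        s = 0.0
        for k in axes(A, 2)
            s += A[i, k] * B[k, j]  # if C === A, earlier writes corrupt these reads
        end
        C[i, j] = s
    end
    return C
end

A = [1.0 2.0; 3.0 4.0]
correct = A * A           # [7.0 10.0; 15.0 22.0]
naivemul!(A, A, A)        # aliased call: output overwrites input mid-loop
A == correct              # false, e.g. A[2,1] is 33.0 instead of 15.0
``````

After writing `A[1,1] = 7.0`, the `i = 2` iteration reads that new value where the old `1.0` was needed, which is exactly the re-use problem described above.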

``````
julia> using LinearAlgebra

julia> A = rand(5,5);

julia> B = Symmetric(rand(5,5));

julia> A * B
5×5 Matrix{Float64}:
1.72551   1.52777   1.30981  1.39166   1.4305
1.35803   1.30555   1.06533  1.11431   0.987554
0.547074  0.255427  0.51841  0.640794  0.668308
1.48027   1.03397   1.24259  0.880207  1.06837
1.69365   1.23249   1.3713   1.22077   1.3595

julia> mul!(A, A, B)
5×5 Matrix{Float64}:
0.683694   0.657493   0.499925   0.133962   1.31347
0.300644   0.289123   0.219835   0.0589079  0.57758
0.0795354  0.0764875  0.0581573  0.0155841  0.152799
0.312221   0.300256   0.2283     0.0611762  0.599821
0.40747    0.391855   0.297947   0.0798392  0.782807
``````

Fair enough. My point is that there are instances where it will work correctly, but the behavior should not be relied upon in general. The fact that it works in some cases (or at least doesn't throw) isn't a bug as such, since the behavior with aliased arguments is undocumented: the docs specifically ask you not to do this.

I don’t understand. In what instances could it conceivably work?

Unless I’m making some big mistake, silently returning the wrong result is basically the worst kind of bug.

For example:

``````
julia> D = Diagonal(rand(3))
3×3 Diagonal{Float64, Vector{Float64}}:
 0.408947   ⋅         ⋅
  ⋅        0.365718   ⋅
  ⋅         ⋅        0.294737

julia> D * D
3×3 Diagonal{Float64, Vector{Float64}}:
 0.167238   ⋅        ⋅
  ⋅        0.13375   ⋅
  ⋅         ⋅       0.0868701

julia> mul!(D, D, D)
3×3 Diagonal{Float64, Vector{Float64}}:
 0.167238   ⋅        ⋅
  ⋅        0.13375   ⋅
  ⋅         ⋅       0.0868701
``````

On nightly

``````
julia> U = UpperTriangular(rand(3,3))
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 0.19602  0.522613  0.790079
  ⋅       0.531755  0.349441
  ⋅        ⋅        0.638941

julia> D = Diagonal(rand(3))
3×3 Diagonal{Float64, Vector{Float64}}:
 0.462221   ⋅         ⋅
  ⋅        0.687949   ⋅
  ⋅         ⋅        0.511165

julia> U * D
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 0.0906045  0.359531  0.403861
  ⋅         0.36582   0.178622
  ⋅          ⋅        0.326604

julia> mul!(U, U, D)
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 0.0906045  0.359531  0.403861
  ⋅         0.36582   0.178622
  ⋅          ⋅        0.326604
``````

In fact `rmul!` and `lmul!` for `Diagonal`s are defined using the aliased `mul!`, as it’s convenient.
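As a sketch of why the `Diagonal` case tolerates aliasing (this is the idea, not the actual stdlib code): right-multiplying by `Diagonal(d)` just scales column `j` by `d[j]`, so each entry is read exactly once before it is overwritten.

``````
using LinearAlgebra

A = rand(3, 3)
d = [2.0, 3.0, 4.0]
expected = A * Diagonal(d)

# In-place right-scaling: A[i,j] *= d[j]. No entry is needed again
# after it has been written, so using A as both input and output is safe.
for j in axes(A, 2), i in axes(A, 1)
    A[i, j] *= d[j]
end
A ≈ expected  # true
``````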

OK, but I’m still missing how this is not ‘necessarily’ a bug:

``````
julia> A = rand(3,3);

julia> B = Symmetric(rand(3,3));

julia> A * B
3×3 Matrix{Float64}:
0.453808  0.287673  0.510041
1.23402   0.715486  0.974913
0.738591  0.461621  1.21864

julia> mul!(A, A, B)
3×3 Matrix{Float64}:
0.207874  0.154886  0.312377
0.67797   0.505152  1.0188
0.166603  0.124135  0.250358
``````

I would say this is a terrible bug. What am I missing?


I agree that examples such as this one, where we call BLAS internally, should always check for aliasing. This is certainly a bug.


How is this related to BLAS? The examples you bring up where aliasing does not give wrong results work only because the structure of the two matrices happens to be fortunate. If you moved the implementation of `mul!(::Matrix, ::Matrix, ::Symmetric)` into Julia itself, the result would still be wrong.

A Julia implementation may allocate temporary arrays and use whatever algorithm is necessary to produce correct results, e.g.

``````
julia> function mulsym!(A, B::Symmetric)
           temp = zeros(size(A, 2))
           tempout = zero(temp)
           for row in eachrow(A)
               temp .= row
               mul!(tempout, B, temp)  # row * B == B * row for symmetric B
               row .= tempout
           end
           return A
       end

julia> A = rand(3,3);

julia> B = Symmetric(rand(3,3));

julia> A * B
3×3 Matrix{Float64}:
0.716792  0.407207  0.537676
0.468471  0.720006  0.70961
0.29495   0.582762  0.240157

julia> mulsym!(A, B)
3×3 Matrix{Float64}:
0.716792  0.407207  0.537676
0.468471  0.720006  0.70961
0.29495   0.582762  0.240157
``````

We don’t have that freedom with direct BLAS calls.

> The examples you bring up where aliasing does not give wrong results happen because the structure of the two matrices just happen to be fortunate

I wouldn't say it works because the structure is fortunate; rather, the algorithms used take advantage of the structure to produce correct results even when the multiplication is in place.

If you are allowed to allocate, then of course you can get the correct result. For example

``````
mul!(A, B, C) = (A .= B * C)
``````

won’t have a problem with aliasing. My point was that it’s not BLAS per se which is the problem, but algorithmic challenges.

In the general case, as well as in the specific case shown, in-place multiplication without allocation doesn’t work.
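One workaround (a sketch, with a hypothetical helper name) is to preallocate a single scratch matrix once and reuse it on every call, so that each in-place multiply costs one unaliased `mul!` plus a copy rather than a fresh allocation:

``````
using LinearAlgebra

# Hypothetical helper: compute Z * A into a preallocated buffer,
# then copy the result back into Z. The buffer is allocated once by
# the caller and reused across calls.
function mul_via_buffer!(Z, A, buf)
    mul!(buf, Z, A)   # unaliased product into the scratch matrix
    copyto!(Z, buf)   # overwrite Z with the result
    return Z
end

Z = rand(4, 3)
A = rand(3, 3)
buf = similar(Z)
expected = Z * A
mul_via_buffer!(Z, A, buf)
Z ≈ expected  # true
``````

This doesn't make the multiplication allocation-free, but for large matrices a single long-lived buffer amortizes to zero allocations per call.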

> I would say this is a terrible bug. What am I missing?

I’m not certain why this is a bug, when the docstring states

`Note that Y must not be aliased with either A or B`

The result, if `Y` is aliased, is undocumented. It may throw, silently return an incorrect result, or behave in unpredictable ways that might change between Julia versions. In general, I do agree that it would be better to throw in cases where the correct result is not obtained. The docstring may also be reworded to something similar to that for `transpose!`:

`unexpected results will happen if src and dest have overlapping memory`

> My point was that it’s not BLAS per se which is the problem, but algorithmic challenges.

I agree, and I think BLAS has been a distraction here. I was trying to make the point that using BLAS restricts us to algorithms that assume there is no aliasing. This doesn’t imply that a Julia implementation won’t have similar limitations if it doesn’t handle aliasing explicitly. The problem, as you rightly point out, is algorithmic.

> If you are allowed to allocate, then of course you can get the correct result

Note that `mul!` isn’t always non-allocating, e.g.

``````
julia> using BenchmarkTools

julia> A = rand(ComplexF64, 10, 10); B = similar(A, Float64); C = similar(A, Float64);

julia> @btime mul!($A, $B, $C);
  1.560 μs (2 allocations: 10.12 KiB)
``````

I think the general idea is to keep allocations to a minimum, and they certainly should not scale as `N^2` for an `N×N` matrix. The implementation I presented above is naive, and there might be smarter ways to avoid allocations.
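Coming back to the original question about mutating a large `Z` without allocating: for the Cholesky case, `rmul!` performs the same in-place triangular multiply that `foo!` relied on, but through the documented in-place API rather than an aliased `mul!` call. A sketch using the setup from the original post:

``````
using LinearAlgebra
using Random

Sigma = Symmetric([1.00 0.25 0.50;
                   0.25 1.00 0.15;
                   0.50 0.15 1.00])
mu = [1.50, 2.00, 2.50]
C = cholesky(Sigma)
Z = randn(Xoshiro(42), 10, 3)
expected = Z * C.U .+ mu'

# rmul! overwrites Z with Z * C.U in place (an in-place triangular
# multiply), so no second matrix is allocated; the mean shift is a
# broadcast into the same storage.
rmul!(Z, C.U)
Z .+= mu'
Z ≈ expected  # true
``````

The `Eigen` case has no such structured in-place path, so there a preallocated scratch matrix (as discussed above) seems to be the practical option.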

``````
A .= d .* A  # equivalent to Diagonal(d) * A
``````