Working with `LinearAlgebra.mul!`

Reposting a reddit post here to solicit comments from the Julia community. I may have chanced upon a bug? According to reddit users No-Distribution4263 and Foura5 both foo! and bar! should result in an error. But the case below, foo! works while bar! does.

Also, n my use case m and n will be large thus Z will be a large array. What is the best way of mutating the values in Z without allocating another array?

Thanks very much.

using LinearAlgebra
using Random
using Statistics

Sigma = Symmetric(
  [1.00 0.25 0.50;
   0.25 1.00 0.15;
   0.50 0.15 1.00]
)
mu = [1.50; 2.00; 2.50]
seed = 42
m = 10
n = 3

# foo!
C = cholesky(Sigma)
Z = randn(Xoshiro(seed), (m, n))

function foo!(F::Cholesky,
              Z::Matrix{Float64},
              mu::Vector{Float64})
    Z .= mul!(Z, Z, F.U) .+ mu'
    return Z
end

julia> foo!(C, Z, mu)
10×3 Matrix{Float64}:
  1.13664    1.11003   1.09999
  1.75174    2.87654   1.38161
  1.18501    2.34136   3.47988
  1.18875    1.5393    4.29595
  2.31631    2.70479   2.74265
  1.97674    3.52045   2.50674
  0.640445   1.69499   2.03278
  0.0307118  0.400502  1.82319
 -0.614335   1.86276   0.617501
  1.54378    1.91743   3.35531

# bar!
E = eigen(Sigma)
Z = randn(Xoshiro(seed), (m, n))

function bar!(F::Eigen,
              Z::Matrix{Float64},
              mu::Vector{Float64})
    mul!(F.vectors, Diagonal(Base.sqrt.(F.values)), F.vectors')
    Z .=  mul!(Z, Z, F.vectors) .+ mu'
    return Z
end

julia> bar!(E, Z, mu)
ERROR: ArgumentError: output matrix must not be aliased with input matrix
Stacktrace:
 [1] gemm_wrapper!(C::Matrix{Float64}, tA::Char, tB::Char, A::Matrix{Float64}, B::Matrix{Float64}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra ~/.julia/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:647
 [2] mul!
   @ ~/.julia/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:169 [inlined]
 [3] mul!
   @ ~/.julia/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:275 [inlined]
 [4] bar!(F::Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}, Z::Matrix{Float64}, mu::Vector{Float64})
   @ Main ./REPL[15]:5
 [5] top-level scope
   @ REPL[16]:1

As I understand, the “no aliasing” is a BLAS limitation, so matrix multiplication operations that call BLAS internally will not work if two of the input arguments are aliased. For special types — such as Triangular and Diagonal — the matrix multiplication is implemented in Julia, so aliasing may not be an issue there. This isn’t a bug as such, rather it is undocumented behavior.

I think the aliasing is a fundamental issue, unrelated to BLAS, because you have to re-use the same elements from the first matrix over and over, but you have already modified them.

Also, you get the wrong answer, which is bad:

julia> using LinearAlgebra

julia> A = rand(5,5);

julia> B = Symmetric(rand(5,5));

julia> A * B
5×5 Matrix{Float64}:
 1.72551   1.52777   1.30981  1.39166   1.4305
 1.35803   1.30555   1.06533  1.11431   0.987554
 0.547074  0.255427  0.51841  0.640794  0.668308
 1.48027   1.03397   1.24259  0.880207  1.06837
 1.69365   1.23249   1.3713   1.22077   1.3595

julia> mul!(A, A, B)
5×5 Matrix{Float64}:
 0.683694   0.657493   0.499925   0.133962   1.31347
 0.300644   0.289123   0.219835   0.0589079  0.57758
 0.0795354  0.0764875  0.0581573  0.0155841  0.152799
 0.312221   0.300256   0.2283     0.0611762  0.599821
 0.40747    0.391855   0.297947   0.0798392  0.782807

Github issue here: Aliased output matrix in LinearAlgebra.mul! does not throw error for matrix wrapper types · Issue #45179 · JuliaLang/julia · GitHub

Fair enough. My point is that there are instances where it will work correctly, but the behavior in general should not be relied upon. The fact that it works in some cases (or at least doesn’t throw) isn’t a bug as such, as the behavior is undocumented. The docs specifically ask you to not do this.

I don’t understand. In what instances could it conceivably work?

Unless I’m making some big mistake, silently returning the wrong result is basically the worst kind of bug.

For example:

julia> D = Diagonal(rand(3))
3×3 Diagonal{Float64, Vector{Float64}}:
 0.408947   ⋅         ⋅ 
  ⋅        0.365718   ⋅ 
  ⋅         ⋅        0.294737

julia> D * D
3×3 Diagonal{Float64, Vector{Float64}}:
 0.167238   ⋅        ⋅ 
  ⋅        0.13375   ⋅ 
  ⋅         ⋅       0.0868701

julia> mul!(D, D, D)
3×3 Diagonal{Float64, Vector{Float64}}:
 0.167238   ⋅        ⋅ 
  ⋅        0.13375   ⋅ 
  ⋅         ⋅       0.0868701

On nightly

julia> U = UpperTriangular(rand(3,3))
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 0.19602  0.522613  0.790079
  ⋅       0.531755  0.349441
  ⋅        ⋅        0.638941

julia> D = Diagonal(rand(3))
3×3 Diagonal{Float64, Vector{Float64}}:
 0.462221   ⋅         ⋅ 
  ⋅        0.687949   ⋅ 
  ⋅         ⋅        0.511165

julia> U * D
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 0.0906045  0.359531  0.403861
  ⋅         0.36582   0.178622
  ⋅          ⋅        0.326604

julia> mul!(U, U, D)
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 0.0906045  0.359531  0.403861
  ⋅         0.36582   0.178622
  ⋅          ⋅        0.326604

In fact rmul! and lmul! for Diagonals are defined using the aliased mul!, as it’s convenient.

OK, but I’m still missing how this is not ‘necessarily’ a bug:

julia> A = rand(3,3);

julia> B = Symmetric(rand(3,3));

julia> A * B
3×3 Matrix{Float64}:
 0.453808  0.287673  0.510041
 1.23402   0.715486  0.974913
 0.738591  0.461621  1.21864

julia> mul!(A, A, B)
3×3 Matrix{Float64}:
 0.207874  0.154886  0.312377
 0.67797   0.505152  1.0188
 0.166603  0.124135  0.250358

I would say this is a terrible bug. What am I missing?

1 Like

I agree that examples such as this one, where we call BLAS internally, should always check for aliasing. This is certainly a bug.

1 Like

How is this related to BLAS? The examples you bring up where aliasing does not give wrong results happen because the structure of the two matrices just happen to be fortunate. If you moved the implementation of mul!(::Matrix, ::Matrix, ::Symmetric) into Julia itself, the result would still be wrong.

A julia implementation may allocate temporary arrays and use any algorithm necessary to produce correct results. Eg.

julia> function mulsym!(A, B::Symmetric)
           temp = zeros(size(A, 2))
           tempout = zero(temp)
           for row in eachrow(A)
               temp .= row
               mul!(tempout, B, temp)
               row .= tempout
           end
           return A
       end


julia> A = rand(3,3);

julia> B = Symmetric(rand(3,3));

julia> A * B
3×3 Matrix{Float64}:
 0.716792  0.407207  0.537676
 0.468471  0.720006  0.70961
 0.29495   0.582762  0.240157

julia> mulsym!(A, B)
3×3 Matrix{Float64}:
 0.716792  0.407207  0.537676
 0.468471  0.720006  0.70961
 0.29495   0.582762  0.240157

We don’t have that freedom with direct BLAS calls.

The examples you bring up where aliasing does not give wrong results happen because the structure of the two matrices just happen to be fortunate

I won’t say that it works because the structure is fortunate, rather the algorithms that have been used take advantage of the structure to produce correct results, even if the multiplication is inplace.

If you are allowed to allocate, then of course you can get the correct result. For example

mul!(A, B, C) = (A .= B * C) 

won’t have a problem with aliasing. My my point was that it’s not BLAS per se which is the problem, but algorithmic challenges.

In the general case, as well as in the specific case shown, in-place multiplication without allocation doesn’t work.

I would say this is a terrible bug. What am I missing?

I’m not certain why this is a bug, when the docstring states

Note that Y must not be aliased with either A or B

The result, if Y is aliased, is undocumented. It may error, or it may throw, or it may behave in unpredictable ways that might change between Julia versions. In general, I do agree that it’ll be better to throw in such cases where the correct result is not obtained. The docstring may also be reworded to something similar to that for transpose!:

unexpected results will happen if src and dest have overlapping memory

My my point was that it’s not BLAS per se which is the problem, but algorithmic challenges.

I agree, I think BLAS has been a distraction here, I was trying to make the point that using BLAS restricts us to their algorithms that assume that there’s no aliasing. This doesn’t imply that a julia implementation won’t have similar limitations if it doesn’t handle aliasing differently. The problem, as you rightly point out, is with algorithmic challenges.

If you are allowed to allocate, then of course you can get the correct result

Note that mul! isn’t always non-allocating, eg.

julia> A = rand(ComplexF64, 10, 10); B = similar(A, Float64); C = similar(A, Float64);

julia> @btime mul!($A, $B, $C);
  1.560 μs (2 allocations: 10.12 KiB)

I think the general idea is to keep allocations to a minimum, and to certainly not scale as N^2 for an NxN matrix. The implementation that I had presented above is naive, and there might be smarter ways to avoid allocations.

You can already multiply by a diagonal matrix in-place using broadcasting:

A .= d .* A  # equivalent to Diagonal(d) * A
A .= A .* transpose(d) # equivalent to A * Diagonal(d)