# Uniform scaling inplace addition with matrix

I want to do an inplace add of a multiple of the identity to a matrix. I can get the efficiency I want with a view of the diagonal:

``````using LinearAlgebra
function f1!(A,s)
D = view(A, diagind(A, 0))
D .+= s;
end
``````
``````julia> s=1e-5;
julia> A=randn(1000,1000); x=diag(A);
julia> f1!(A,s);
julia> norm(diag(A) - (x.+s))
0.0
julia> @btime f1!(A,s)
4.627 μs (3 allocations: 144 bytes)
``````

For various reasons, I would prefer to use the `UniformScaling`. Is it possible to achieve the same efficiency with `UniformScaling`?

For reference:

``````function f2!(A,s)
A[:,:] += s*I;
end
``````
``````julia> @btime f2!(A,s);
5.750 ms (8 allocations: 15.26 MiB)
``````

As far as I understand, there are two problems with `f2!`:

• Extra memory allocation (associated parse time conversion of `+=`)
• The `+(A::Matrix,J::UniformScaling)` is a for-loop referenced below, rather than a blas level-1 `axpy`-call which is what we get in `f1!`.

I don’t know of a way to do this with `I`, but a loop works and is efficient (faster than your `f1!` on my machine):

``````function f3!(A,s)
m = min(size(A)...)
for i = 1:m; A[i,i] += s; end
return A
end
``````

`f1!` calls the `broadcast` machinery under the hood, not BLAS. It’s possible but a bit tricky to use a BLAS `axpy` function for this operation:

``````function f4!(A::Matrix{T}, s) where {T}    m = min(size(A)...)
m = min(size(A)...)
incA = stride(A,1) + stride(A,2)
sr = Ref{T}(s)
GC.@preserve sr LinearAlgebra.BLAS.axpy!(m, one(T), Base.unsafe_convert(Ptr{T}, sr), 0, A, incA)
return A
end
``````

but on my machine it’s only 5% faster than my looping implementation `f3!` for your 1000×1000 benchmark. And even this small performance difference goes away if I use `@inbounds` in my `f3!` loop.

3 Likes

Thanks! Learning a lot. I thought `f1!` would be effectively be the same as `f4!`. So, the BLAS `incx=0`-trick for `vec .+ scalar` is never used in julia?

Not as far as I know. (There’s little or no benefit to BLAS over a simple `for` loop for `axpy` anyway, especially for `vector .+ scalar`; compilers are good at `axpy`-like loops and there’s no possibility of fancy blocking hand optimizations like there is for BLAS-3 / matrix multiply / `gemm`, while loops are far more versatile in supporting more types etcetera.)

(As far as I know, BLAS `axpy` is not even used for `vector + vector` in Julia. There’s no point. Besides, if you care about performance you’re much better off combining multiple vector operations into a single loop, or using “dot fusion”, than breaking your calculation up into a sequence of `axpy` and other elementary operations.)

In general, performance optimization in Julia doesn’t rely on “mining” the standard library in the hope of finding a “vectorized / built-in” function that does exactly what you want (unlike e.g. Matlab or Python). Properly written user code and loops are fast.

4 Likes

Great. Thanks. That’s helpful in other places in my code.

I’ll leave this post open for a while since an efficient version involving `I` would be helpful.

You can use `diagind` in order to do this inplace on the diagonal. It’s how we do it in OrdinaryDiffEq to make it GPU-compatible:

4 Likes

Thanks. Does that compile to the same as the use of `diagind` in `f1!` ?

Edit: Ah. Now I understand your point. It does make it “work” for `I`.

Wouldn’t an extension of `mul!` be a natural place to put this functionality?

``````import LinearAlgebra.mul!
function mul!(X::StridedMatrix{T},a::Bool,B::UniformScaling{T},alpha::Bool,beta::Bool) where {T}
if (a  & alpha & beta)
D = view(X, diagind(X)) # Or more efficient version
D .+= B.λ
else
easytoimplement()
end
return X
end
function f6!(A::StridedMatrix,s)
mul!(A,true,s*I,true,true);
end
``````
``````julia> A=randn(1000,1000);
julia> @btime f6!(A,3.0);
3.732 μs (2 allocations: 80 bytes)
``````

That’s probably the right way to add it.

Okay. I will add a PR eventually. It doesn’t really involve a multiplication, so `mul!` is not an obvious for users who don’t know how five-argument `mul!` can be used.

Yes, `mul!()` is a strange function for an addition method. I think I would look for this functionality in `axpby!()`:

`axpby!()` in manual:

I agree. `axpby!` does have a heritage from blas (even the manual is referring to `BLAS.axpby!`) and this feature is far from blas. `mul!` is julia specific. I wonder if `axpby!` is really meant to be used as `add!` analogous to `mul!`.

1 Like

The elegant solution would be for this to work:

``````A .+= (s*I)
``````

But it doesn’t.

There is something blocking this, but I’m not sure what: Broadcasting UniformScaling Operations · Issue #23197 · JuliaLang/julia · GitHub

1 Like