I don’t know of a way to do this with I, but a loop works and is efficient (faster than your f1! on my machine):
function f3!(A, s)
    m = min(size(A)...)
    for i = 1:m; A[i,i] += s; end
    return A
end
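For example, you can sanity-check it against the out-of-place A + s*I and time it on the 1000×1000 case discussed here (a minimal sketch using BenchmarkTools; timings will of course vary by machine):

using LinearAlgebra, BenchmarkTools

A = rand(1000, 1000)
f3!(copy(A), 0.5) ≈ A + 0.5I              # sanity check against the out-of-place version

@btime f3!(B, 0.5) setup=(B = copy($A));  # time the in-place diagonal shift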
f1! calls the broadcast machinery under the hood, not BLAS. It’s possible but a bit tricky to use a BLAS axpy function for this operation:
using LinearAlgebra

function f4!(A::Matrix{T}, s) where {T}
    m = min(size(A)...)
    incA = stride(A,1) + stride(A,2)  # stride between consecutive diagonal entries
    sr = Ref{T}(s)
    # treat sr as a stride-0 "vector": axpy! adds s to each of the m diagonal entries of A
    GC.@preserve sr LinearAlgebra.BLAS.axpy!(m, one(T), Base.unsafe_convert(Ptr{T}, sr), 0, A, incA)
    return A
end
but on my machine it’s only 5% faster than my looping implementation f3! for your 1000×1000 benchmark. And even this small performance difference goes away if I use @inbounds in my f3! loop.
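(For concreteness, the @inbounds variant I mean is just the same loop with bounds checks elided; call it f3b! here:)

function f3b!(A, s)
    m = min(size(A)...)
    @inbounds for i = 1:m
        A[i,i] += s
    end
    return A
end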
Not as far as I know. (There’s little or no benefit to BLAS over a simple for loop for axpy anyway, especially for vector .+ scalar; compilers are good at axpy-like loops, and there’s no opportunity for fancy hand optimizations like blocking the way there is for BLAS-3 / matrix multiplication / gemm, while plain loops are far more versatile in supporting more types, etcetera.)
(As far as I know, BLAS axpy is not even used for vector + vector in Julia. There’s no point. Besides, if you care about performance you’re much better off combining multiple vector operations into a single loop, or using “dot fusion”, than breaking your calculation up into a sequence of axpy and other elementary operations.)
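(To illustrate the fusion point with a sketch: rather than updating y with two separate axpy-style passes, a dotted expression fuses everything into a single loop over the data.)

a, b = 2.0, 3.0
x, y, z = rand(100), rand(100), rand(100)

# two axpy-style passes over y:
#   y .+= a .* x
#   y .+= b .* z
# one fused pass, equivalent to writing the combined loop by hand:
@. y += a*x + b*z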
In general, performance optimization in Julia doesn’t rely on “mining” the standard library in the hope of finding a “vectorized / built-in” function that does exactly what you want (unlike e.g. Matlab or Python). Properly written user code and loops are fast.
Wouldn’t an extension of mul! be a natural place to put this functionality?
using LinearAlgebra
import LinearAlgebra.mul!

function mul!(X::StridedMatrix{T}, a::Bool, B::UniformScaling{T}, alpha::Bool, beta::Bool) where {T}
    if a & alpha & beta
        D = view(X, diagind(X))  # or a more efficient version
        D .+= B.λ
    else
        easytoimplement()  # placeholder for the remaining scaling cases
    end
    return X
end

function f6!(A::StridedMatrix, s)
    mul!(A, true, s*I, true, true)
end
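Usage would then look something like this (a sketch assuming the mul! method above is defined; f6!(A, s) shifts the diagonal of A by s in place):

using LinearAlgebra

A = rand(4, 4)
B = copy(A)
f6!(B, 2.0)
B ≈ A + 2.0I   # true: only the diagonal was shifted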
Okay. I will add a PR eventually. It doesn’t really involve a multiplication, so mul! is not an obvious choice for users who don’t know how five-argument mul! can be used.
I agree. axpby! has a heritage from BLAS (the manual even refers to BLAS.axpby!), and this feature is far removed from BLAS, whereas mul! is Julia-specific. I wonder whether axpby! is really meant to be used as an add! analogous to mul!.