I want to multiply each slice of an array by a different number. I want to do it for arbitrary array and slice dimensions, inplace and efficient if possible. How to do this? Here is an example:
Multipy each row of a matrix by a different number:
function multiply_slices!(A, v; dims)
# only one special case implemented
@assert dims == 2
@assert ndims(A) == 2
@assert ndims(v) == 1
@assert axes(A, 1) == axes(v,1)
@inbounds @simd for j in axes(A, 2)
for i in axes(A, 1)
A[i, j] *= v[i]
end
end
A
end
nrows = 2
ncols = 3
A = collect(reshape(1:(nrows*ncols), (nrows, ncols)))
v = 1:nrows
multiply_slices!(A, v)
using LazyArrays: @~
function multiply_slices!(A, v; dims)
select1 = ntuple(i -> i in dims ? axes(A, i) : Ref(:), ndims(A))
select2 = ntuple(i -> i in dims ? axes(v, i) : Ref(:), ndims(v))
RefA = Ref(A)
Refv = Ref(v)
slices = @~ tuple.(view.(RefA, select1...), view.(Refv, select2...))
for (x, y) in slices
x .*= y
end
return A
end
Thanks! This does look good. It is not as fast as the hand coded variant, but fast enough for my use case. It can be made a little faster by adding a function barrier.
using LazyArrays: @~
function multiply_slices_hand!(A, v; dims)
# only one special case implemented
@assert dims == 2
@assert ndims(A) == 2
@assert ndims(v) == 1
@assert axes(A, 1) == axes(v,1)
@inbounds @simd for j in axes(A, 2)
for i in axes(A, 1)
A[i, j] *= v[i]
end
end
A
end
function multiply_slices_lazy!(A, v; dims)
select1 = ntuple(i -> i in dims ? axes(A, i) : Ref(:), ndims(A))
select2 = ntuple(i -> i in dims ? axes(v, i) : Ref(:), ndims(v))
RefA = Ref(A)
Refv = Ref(v)
# @show select1
# @show select2
slices = @~ tuple.(view.(RefA, select1...), view.(Refv, select2...))
for (x, y) in slices
x .*= y
end
return A
end
function multiply_slices_barrier!(A, v; dims)
select1 = ntuple(i -> i in dims ? axes(A, i) : Ref(:), ndims(A))
select2 = ntuple(i -> i in dims ? axes(v, i) : Ref(:), ndims(v))
RefA = Ref(A)
Refv = Ref(v)
# @show select1
# @show select2
slices = @~ tuple.(view.(RefA, select1...), view.(Refv, select2...))
kernel!(slices)
return A
end
@noinline function kernel!(slices)
for (x, y) in slices
x .*= y
end
end
using BenchmarkTools
nrows = 100
ncols = 2000
A = randn(nrows, ncols)
v = randn(nrows)
@btime multiply_slices_hand!(A, v, dims=2)
@btime multiply_slices_lazy!(A, v, dims=2)
@btime multiply_slices_barrier!(A, v, dims=2)
1.423 ms (1 allocation: 16 bytes)
3.376 ms (18013 allocations: 594.00 KiB)
2.095 ms (4013 allocations: 187.75 KiB)
Thanks! I can see that my above timings are far too long and that your timings are much more reasonable. But I don’t understand what slowness from underflow/overflow is. Can you explain?
My guess was that the slowness was due to denormals (Wikipedia) since you are keep multiplying by some numbers (and many of them are smaller than one).
using BenchmarkTools
n = 2^15
@btime x .*= 0.5 setup = (x = ones(n))
@btime x .*= 0.5 setup = (x = nextfloat.(zeros(n)))