I’m interested in applying an arbitrary function to each column of a matrix and having the output be a matrix. The output is guaranteed to be the same size as the input, if that helps the implementation.
using BenchmarkTools
a = rand(3,3)
foo(x) = reverse(x)
@btime mapslices(foo, a, dims=1)
2.046 μs (50 allocations: 2.27 KiB)
However, the time to actually perform the operation is quite small, so there seems to be a lot of room for improvement.
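For comparison, here is a minimal sketch of one common alternative for plain Arrays: apply the function to each column from eachcol and concatenate the results with reduce(hcat, ...). The names out_hcat and out_ref are only for illustration, and this assumes foo returns a vector of the same length for every column. No timing is claimed here; it would need to be measured with @btime on the same input.

```julia
a = rand(3, 3)
foo(x) = reverse(x)

# Apply foo to every column view, then glue the resulting vectors
# back together side by side into a matrix.
out_hcat = reduce(hcat, [foo(c) for c in eachcol(a)])

# Reference result from mapslices, for a correctness check.
out_ref = mapslices(foo, a, dims=1)
```

On Julia 1.9 and later, stack(foo, eachcol(a)) should express the same thing in a single call.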
Thanks for the explanation and preview of coming improvements.
Digging in a little further, I think I’m most interested in what is driving the performance in the SVector case. If I profile the code, I see that the allocation of the SizedArray through the collect call in map is taking the vast majority of the time vs the actual execution of foo.
Do folks know what is going on here and why there is an allocation in this case?
Just a quick update on this in case anyone runs into the same thing. To get this working well with StaticArrays I had to define a specific mapcolumns function that tells the compiler what the output type of any function foo is (as well as use the hidden sacollect method).
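using StaticArrays
# Map fn over the columns of a static matrix and reassemble the results with hcat.
# OutType is the asserted return type of fn, which lets the compiler infer the output.
function mapcolumns(OutType::Type{<:SVector}, fn::Function, m::SMatrix{N,M,T}) where {N,M,T}
    # sacollect gathers the M columns into an SVector so that map stays on static arrays
    return reduce(hcat, map((x -> fn(x)::OutType), StaticArrays.sacollect(SVector{M}, eachcol(m))))
end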
# default assumes the function outputs the same type as input
mapcolumns(fn::Function, m::SMatrix{N,M,T}) where {N,M,T} = mapcolumns(SVector{N,T}, fn, m)
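A possible variation, sketched here under assumptions rather than taken from the thread: index each column as m[:, j] (which yields an SVector for an SMatrix), build the results with ntuple over Val(M), and splat them into hcat. The name mapcolumns_ntuple is hypothetical, and whether this avoids the allocation in practice would need to be checked with @btime.

```julia
using StaticArrays

# Hypothetical helper (not from the thread): compute each output column from
# m[:, j], then concatenate the columns horizontally into a static matrix.
function mapcolumns_ntuple(fn, m::SMatrix{N,M}) where {N,M}
    cols = ntuple(j -> fn(m[:, j]), Val(M))  # tuple holding one result per column
    return hcat(cols...)
end

m = SMatrix{2,3}(1, 2, 3, 4, 5, 6)   # column-major: columns are [1,2], [3,4], [5,6]
out = mapcolumns_ntuple(reverse, m)
```

Unlike mapcolumns above, this sketch does not assert the return type of fn, so it relies on fn being inferable for SVector inputs.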
using BenchmarkTools, StaticArrays
foo(x) = reverse(x)
a = rand(SMatrix{3,30})