For small mutable arrays, my (perhaps incorrect) understanding was that MArrays from StaticArrays.jl would perform better than regular arrays. However, in the example I am working on (where I use an MArray for both the input and the pre-allocated output), I am seeing much worse performance. Consider the following code.
using StaticArrays
using BenchmarkTools

@inline function LogisticCdf!(y::AbstractArray{T}, x::AbstractArray{T}, μ::T, σ::T) where {T <: Real}
    @inbounds for iₓ in eachindex(x)
        z = exp(-abs(x[iₓ] - μ) / σ)
        y[iₓ] = (x[iₓ] ≥ μ ? one(T) : z) / (one(T) + z)
    end
end

function test()
    x = collect(LinRange(-4.0, 4.0, 50))
    xm = MVector{50, Float64}(x)
    μ = 2.0; σ = 2.2
    y = similar(x)
    ym = similar(xm)
    @btime LogisticCdf!($y, $x, $μ, $σ)
    @btime LogisticCdf!($ym, $xm, $μ, $σ)
end

test()
So I’m finding the MArray version about twice as slow. This makes me think that I am using MArrays either incorrectly or inappropriately. Any guidance would be appreciated. Thanks.
I don’t have a lot of time to dig into this right now, so my response will be limited.
I can’t explain why MArray is considerably slower in this case.
However, I wouldn’t generally expect an MArray to be much (if at all) faster than a normal Array, except when functions can take specific advantage of the size being encoded in the type (primarily for loop unrolling). For example, linear-algebra operations on MArrays use methods specialized for the exact size and usually beat BLAS (used by Array) by a wide margin at small sizes. But your function doesn’t allow this information to be used in a helpful way.
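To make "the size being encoded in the type" concrete, here is a minimal sketch of my own (the matrix values and variable names are just for illustration, not from the original question). It only shows that the size of a static array is available from its type and that the specialized multiply gives the same answer as the ordinary one. It does not measure anything, so wrap the two products in @btime yourself if you want timings.

```julia
using StaticArrays

# Ordinary Matrix: the size is a runtime property of the value.
A = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 10.0]

# Static matrix: the 3×3 size is a type parameter.
As = SMatrix{3, 3}(A)

size(typeof(As))   # the size can be read from the type alone

P  = A * A         # generic LinearAlgebra multiply, size only known at run time
Ps = As * As       # StaticArrays method specialized for the 3×3 size

Ps ≈ P             # both paths compute the same product
```

An elementwise loop like LogisticCdf! does the same work per element regardless of whether the length is known at compile time, which is why I would not expect the same kind of win there.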
I would expect a performance benefit from using an SArray here instead, especially if you count the similar(_) preallocation for the other two. Making the array static usually puts it on the stack, and that can carry further benefits even over an MArray. But you’d need to write a different LogisticCdf function to support that, probably based on calling map over an SVector version of x. Something like
function LogisticCdf(x::AbstractArray{T}, μ::T, σ::T) where {T <: Real}
    return map(x) do xi
        z = exp(-abs(xi - μ) / σ)
        return (xi ≥ μ ? one(T) : z) / (one(T) + z)
    end
end
If you are ever in a situation where you can use SArray instead of MArray without excessive gymnastics, I’d recommend it.
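As a rough usage sketch (untimed, and assuming the same 50-point grid and parameters as in the question), the map-based version can be called on an SVector like this. The function is repeated so the snippet runs on its own; the names xs and ys are just placeholders.

```julia
using StaticArrays

# Non-mutating version: map over the input and return a new array of the same kind.
function LogisticCdf(x::AbstractArray{T}, μ::T, σ::T) where {T <: Real}
    return map(x) do xi
        z = exp(-abs(xi - μ) / σ)
        (xi ≥ μ ? one(T) : z) / (one(T) + z)
    end
end

# Same grid as in the question, stored as an immutable static vector.
xs = SVector{50, Float64}(collect(LinRange(-4.0, 4.0, 50)))

# No output buffer is needed; the result is a fresh SVector.
ys = LogisticCdf(xs, 2.0, 2.2)

ys isa SVector{50, Float64}   # map over an SVector returns an SVector
```

To compare against the mutating versions, something like @btime LogisticCdf($xs, 2.0, 2.2) should work, though I have not run that benchmark here.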
EDIT: I realized later that I should have just written the scalar version of LogisticCdf and then broadcast it over the desired array. So more like