StaticArrays are not mutable, so this cannot work on them since .= lowers to broadcast! which mutates the values instead of returning them. Besides, SArrays shouldn’t be broadcasted anyways. You’ll notice it’s faster to let them utilize their unrolled fast arithmetic rather than using the anonymous function of broadcast. You need to code for these two cases very differently.
I disagree. There is no reason why you shouldn’t use broadcast and SArrays. The unrolled cases come into play for linear algebra operations, but there is no relation with elementwise operations . If broadcast is indeed slow for SArrays we need to fix it (and there is no apriori reason why it shouldn’t be fast, since you could unroll the entire broadcast operation).
The question whether or not .= should work for SArrays is debatable. Since the SArray is immutable asking for it to be overwritten makes little sense, but I would be in favour of making .= mean = for SArrays (and other immutable containers). Being able to use broadcast in generic code is important.
Right now the lowering of .= prevents SArray from defining that.
Sure, there’s no a priori reason why it shouldn’t be fast, but right now it isn’t so it should be avoided. If that gets changed then that would help a lot. I wasn’t talking about a hypothetical future, I was talking about writing code right now (this has a Usage tag and not a Development tag. I agree that the noted issues maybe could be solved in the future, but I don’t see a reason to believe it will change soon at all).
StaticArrays’s broadcast and broadcast! are fully unrolled (as in, the generated function creates an expression for each element of the destination without any runtime for-loops). +, -, etc. are all defined in terms of broadcast and map, e.g.
@inline +(a::Number, b::StaticArray) = broadcast(+, a, b)
function f(x,y,z)
x .+ y .+ z
end
function g(x,y,z)
x + y + z
end
using StaticArrays, BenchmarkTools
x = @SVector [2.0,3.0,4.0]
y = @SVector [2.0,3.0,4.0]
z = @SVector [2.0,3.0,4.0]
@btime f(x,y,z) # 33.372 ÎĽs (21 allocations: 768 bytes)
@btime g(x,y,z) # 19.085 ns (1 allocation: 32 bytes)
@btime f($x,$y,$z) # 3.220 ns (0 allocations: 0 bytes)
@btime g($x,$y,$z) # 3.220 ns (0 allocations: 0 bytes)
So I don’t know, but removing broadcast was a big improvement in the larger package for some reason . (But we can discuss this more in a separate thread)
Not sure if I’m a fan of that. I think it could be pretty confusing. Also, to support both mutable and non-mutable arrays, you’d have to do something like
function f(x::AbstractArray)
y = similar(x) # needed for Array, what should `similar` do for StaticArray?
y .= x.^2
y
end
similar(::StaticArray) could just call zeros, but I’d be worried that the compiler wouldn’t be able to optimize this away, especially if the broadcast!ed function is not inlined. So you’re going to have suboptimal performance for StaticArrays, and I think quite a few StaticArrays users want to squeeze every bit of performance out of the algorithm.
The results for non-interpolated x, y, and z can probably be explained by having to perform runtime dispatch for every element (with f) vs. only once (with g). So I can see how it can make a sizeable difference in type-unstable code.
If x, y, and zall have the same dimensions are all StaticArrays, + will actually call map (which is also specialized and fully unrolled in StaticArrays). So another difference could be that there’s no “loop fusion” (quotes because there is no actual loop) with x + y + z, while x .+ y .+ z is fused. Maybe that results in operations being performed in a different order, which could have some influence. Some of those differences might go away as the compiler performs optimization, but I guess there could be differences.
Edit: also, the fact that several unfused maps are called vs. one fused broadcast could certainly have been a big difference in light of inference limits, as in your https://github.com/JuliaLang/julia/issues/22255.