Slowdown with StaticArrays and broadcasting

I’ve been trying the master branch of StaticArrays on Julia 1.0.1. My code works fine with normal Arrays, but with SArrays there’s a noticeable slowdown.

Are there still some open issues with StaticArrays on 1.0? Particularly regarding broadcasting?

Here’s my attempt at a minimal example.

using StaticArrays
using BenchmarkTools

function foo(x, dt, f1, f2, f3, f4, f5, f6, f7)
    b1 = 0.09646076681806523
    b2 = 0.01
    b3 = 0.4798896504144996
    b4 = 1.379008574103742
    b5 = -3.290069515436081
    b6 = 2.324710524099774
    b7 = 0

    xn = @. x + dt * (b1*f1 + b2*f2 + b3*f3 + b4*f4 + b5*f5 + b6*f6 + b7*f7)

    xn
end

x = @SVector(rand(3))
f1 = @SVector(rand(3))
f2 = @SVector(rand(3))
f3 = @SVector(rand(3))
f4 = @SVector(rand(3))
f5 = @SVector(rand(3))
f6 = @SVector(rand(3))
f7 = @SVector(rand(3))
@btime foo($x, 1e-5, $f1, $f2, $f3, $f4, $f5, $f6, $f7)

Without broadcasting (removing the @.), I get:

  15.837 ns (0 allocations: 0 bytes)

With broadcasting, I get:

  14.477 μs (427 allocations: 21.33 KiB)

It seems some inference limit is being reached. With four terms,

    xn = x .+ dt * (b1*f1 .+ b2*f2 .+ b3*f3 .+ b4*f4)

works fine, but with five terms,

    xn = x .+ dt * (b1*f1 .+ b2*f2 .+ b3*f3 .+ b4*f4 .+ b5*f5)

allocates.
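One way to test the inference-limit guess directly is to ask the compiler what return type it infers for the four-term and five-term versions. This is only a sketch: `four_terms`, `five_terms`, and `infers_concretely` are hypothetical names, the coefficients are placeholders, and `Base.return_types` is an unexported reflection function, so the result may differ between Julia and StaticArrays versions.

```julia
using StaticArrays

# Hypothetical helpers mirroring the two expressions above (placeholder coefficients).
four_terms(x, dt, f1, f2, f3, f4) =
    x .+ dt * (0.1*f1 .+ 0.2*f2 .+ 0.3*f3 .+ 0.4*f4)
five_terms(x, dt, f1, f2, f3, f4, f5) =
    x .+ dt * (0.1*f1 .+ 0.2*f2 .+ 0.3*f3 .+ 0.4*f4 .+ 0.5*f5)

# Returns true when inference finds a single concrete return type
# for a call to `f` with arguments of these types.
function infers_concretely(f, args...)
    rts = Base.return_types(f, Tuple{map(typeof, args)...})
    return length(rts) == 1 && isconcretetype(first(rts))
end

v = @SVector rand(3)
println("4 terms infer concretely: ", infers_concretely(four_terms, v, 1e-5, v, v, v, v))
println("5 terms infer concretely: ", infers_concretely(five_terms, v, 1e-5, v, v, v, v, v))
```

If the five-term call reports `false` while the four-term call reports `true`, that would be consistent with inference giving up on the larger expression. `@code_warntype` on the same calls shows where the type information is lost.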

Is there anything I can do about it? Or is there something StaticArrays can do about it?

Just remove the @.?

Sure, but I was hoping that this particular bit of code could be used for any AbstractVector… So in the general case I think I need to use broadcasting.
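A possible workaround that keeps the code generic is to split the long broadcast into smaller fused pieces, so that no single expression is as large as the one that allocates. This is a sketch only: `foo_split` is a hypothetical name, the split point is arbitrary, and I have not confirmed that it avoids the allocations with SVectors on this StaticArrays version, so it needs benchmarking. With ordinary Vectors it creates two extra temporaries.

```julia
# Hypothetical variant of `foo` that splits the sum into two smaller broadcasts.
function foo_split(x, dt, f1, f2, f3, f4, f5, f6, f7)
    b1 = 0.09646076681806523
    b2 = 0.01
    b3 = 0.4798896504144996
    b4 = 1.379008574103742
    b5 = -3.290069515436081
    b6 = 2.324710524099774
    b7 = 0.0  # written as a Float64 here; the original uses the integer 0

    # Each statement fuses only a few terms instead of all seven.
    s1 = @. b1*f1 + b2*f2 + b3*f3 + b4*f4
    s2 = @. b5*f5 + b6*f6 + b7*f7
    return @. x + dt * (s1 + s2)
end

# Generic usage with plain Vectors; an SVector should be accepted in the same way.
x = rand(3)
fs = [rand(3) for _ in 1:7]
xn = foo_split(x, 1e-5, fs...)
println(xn)
```

Since it only uses broadcasting, the function should accept any AbstractVector whose elements support `+` and `*`.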

In addition to the difference in run time, there also seems to be something going on with StaticArrays and broadcasting that causes long compilation times. It is somewhat noticeable in the example above, and much more so in my real function, which has about 10 broadcasted equations of that size.
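To put a number on the compilation cost, one rough approach is to time the first call in a fresh session (which includes compilation) against a second call with the same argument types. A minimal sketch, where `update` is a hypothetical stand-in for one of the broadcasted equations and plain Vectors are used only to keep it self-contained:

```julia
# Hypothetical stand-in for one of the broadcasted update equations.
update(x, dt, f1, f2, f3) = @. x + dt * (0.5*f1 + 0.25*f2 + 0.25*f3)

x = rand(3)
f = rand(3)

t_first  = @elapsed update(x, 1e-5, f, f, f)  # first call: includes compilation
t_second = @elapsed update(x, 1e-5, f, f, f)  # second call: already compiled

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

Repeating this with SVector arguments in a new session would show how much of the first-call time is specific to StaticArrays.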