Broadcasting Heisen-allocations

I have a weird code snippet, reduced from a real codebase, that mysteriously allocates where, as far as I can tell, it shouldn't:


using BenchmarkTools
using StaticArrays

test(z::SVector{N}) where {N} = problematic(test(Val(N)), first(z))

test(::Val{N}) where {N} = SVector(ntuple(identity, Val(N)))

function problematic(d::SVector{N}, α) where {N}
    αp = ntuple(Returns(α), Val(N))
    d´ = d .* SVector(αp)
    return d´
end

v = SA[1.0, 0, 0, 0]; vs = SVector(v,v,v,v);
@btime test.($vs);

The result on my computer (Apple Silicon), all the way from Julia v1.7 to master, is the following:

julia> @btime test.($vs);
  342.905 ns (5 allocations: 720 bytes)

If I run the whole snippet again, I get the same 5 allocations and slow runtime. However, if I then re-run only the part from function problematic... to the end (without changing anything), I get

julia> function problematic(d::SVector{N}, α) where {N}
           αp = ntuple(Returns(α), Val(N))
           d´ = d .* SVector(αp)
           return d´
       end
problematic (generic function with 1 method)

julia> v = SA[1.0, 0, 0, 0]; vs = SVector(v,v,v,v);

julia> @btime test.($vs);
  2.041 ns (0 allocations: 0 bytes)

So merely repeating the definition of problematic appears to unblock something inside the guts of the compiler, which is then able to work through the code without introducing allocations. From this point on, even if I rerun the whole snippet, it stays fast and allocation-free.

[By the way, the line that actually causes the problem seems to be d´ = d .* SVector(αp) in the problematic function. If I write d´ = SVector(Tuple(d) .* αp) instead, things are allocation-free from the beginning. EDIT: this might mean that the bug ultimately originates in StaticArrays.]
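Written out in full, the tuple-based variant from the note above would look roughly like this. The name problematic_tuple is just a placeholder of mine, and I have only checked this form on the reduced example, so take it as a sketch rather than a general fix:

```julia
using StaticArrays

# Same computation as `problematic`, but the broadcast runs over plain
# tuples and the result is wrapped in an SVector only at the end.
# `problematic_tuple` is a hypothetical name, not part of the original code.
function problematic_tuple(d::SVector{N}, α) where {N}
    αp = ntuple(Returns(α), Val(N))   # N copies of α as a tuple (Julia ≥ 1.7)
    return SVector(Tuple(d) .* αp)    # tuple .* tuple, then back to SVector
end
```

It returns the same values as d .* SVector(αp); the only difference is that the broadcast never sees a StaticArray.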

This has driven me crazy for quite a bit, because with Revise, sometimes touching a function in some absolutely trivial way that should not change its behavior was suddenly changing my allocations and performance. I suspect there is some kind of compiler bug hiding here? Any clue as to why this is happening?

A further reduced MWE

using BenchmarkTools
using StaticArrays

test(f::SVector) = f .* f'

v = SA[1, 0]; vs = SVector(v,v);
@btime test.($vs);
julia> @btime test.($vs);
  288.321 ns (8 allocations: 416 bytes)
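To check the allocation count without BenchmarkTools, something along these lines should work. The helpers broadcast_test and count_allocs are names I made up for this sketch, and the number printed may differ by Julia and StaticArrays version:

```julia
using StaticArrays

test(f::SVector) = f .* f'            # same definition as in the MWE

# Hypothetical helpers, not part of the original snippet. Wrapping the
# call in functions keeps global-variable overhead out of the measurement.
broadcast_test(vs) = test.(vs)
count_allocs(vs) = @allocated broadcast_test(vs)

v = SA[1, 0]; vs = SVector(v, v)
count_allocs(vs)                      # first call includes compilation
println(count_allocs(vs), " bytes allocated")
```

Running this before and after re-evaluating a definition gives a quick way to see which state the session is in.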

Filed as issue #1178 in JuliaArrays/StaticArrays.jl on GitHub: "Allocations in broadcast of broadcast".