Vectorization when broadcasting vector of structs

Recently on Slack I was discussing the performance of generic structs and whether operations on them should vectorize. While exploring, I found some interesting results.

using BenchmarkTools

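struct TestType{𝒯}
    # Field layout reconstructed from the constructor calls and from f below;
    # the name of the first field (p) is an assumption.
    p::𝒯
    x::Float64
    y::Float64
end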

f(t::TestType) = t.x + t.y
fbroadcast(vec) = f.(vec)

a = TestType(1.0,1.0,1.0)
floats = fill(a,1000)

c = TestType(tanh,1.0,1.0)
funcs = fill(c,1000)

temp = [a,c]
temp2 = fill(temp,500)
floatfunc = vcat(temp2...)

This is the code I used for testing, and the results were:

julia> @btime fbroadcast($floats);
  930.933 ns (1 allocation: 7.94 KiB)

julia> @btime fbroadcast($funcs);
  628.160 ns (1 allocation: 7.94 KiB)

julia> @btime fbroadcast($floatfunc);
  40.858 μs (2001 allocations: 39.19 KiB)

Why is the version with only functions stored in the structs significantly faster (about 628 ns versus 931 ns)?
I understand why the mixed-type one is slow, but I don’t understand the difference between storing a function and storing a float.
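
One difference that can be checked directly is the memory layout of the elements. The sketch below is only a sanity check, not an explanation of the timings. It assumes the struct has a generic first field followed by two Float64 fields; LayoutDemo and its field name p are hypothetical stand-ins for the struct in this post.

```julia
# Hypothetical stand-in for the struct in the post; the field layout is an assumption.
struct LayoutDemo{T}
    p::T          # generic slot: a Float64 in one case, a function in the other
    x::Float64
    y::Float64
end

float_elem = LayoutDemo(1.0, 1.0, 1.0)
func_elem  = LayoutDemo(tanh, 1.0, 1.0)

# A named function such as tanh is a singleton, so its type carries no data.
@show sizeof(typeof(tanh))         # 0 bytes
@show sizeof(typeof(float_elem))   # 24 bytes: three Float64 values per element
@show sizeof(typeof(func_elem))    # 16 bytes: only x and y occupy memory

# The mixed vector has a non-concrete element type, so elements are stored boxed.
mixed = vcat(fill([float_elem, func_elem], 500)...)
@show eltype(mixed)
@show isconcretetype(eltype(mixed))
```

Under that assumed layout, the float-only vector is read with a stride of three Float64 values per element and the function-only vector with a stride of two. Whether that difference accounts for the timing gap is not something this check establishes.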
Checking the LLVM IR, both versions contain SIMD operations. However, the version with only functions has some <8 x double> instructions, such as shufflevector <8 x double> %wide.vec28, <8 x double> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>, while the one with floats doesn’t.
My computer doesn’t have AVX-512, so seeing 8-wide vector instructions is surprising to me.
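
For anyone who wants to reproduce the comparison, here is a minimal sketch of one way to list the vector widths that show up in the IR. IRDemo, g, gbroadcast, llvm_ir and vector_types are hypothetical names, and the struct layout is again an assumption. Note that vector types in LLVM IR are not tied to the hardware register width: the backend can split a wide vector across several narrower registers, so the native code may be the more telling thing to look at.

```julia
using InteractiveUtils  # provides code_llvm; loaded automatically in the REPL

# Hypothetical stand-ins mirroring the setup in the post; the field layout is an assumption.
struct IRDemo{T}
    p::T
    x::Float64
    y::Float64
end
g(t::IRDemo) = t.x + t.y
gbroadcast(vec) = g.(vec)

# Capture the optimized LLVM IR of gbroadcast for the type of `vec` as a String.
llvm_ir(vec) = sprint(io -> code_llvm(io, gbroadcast, (typeof(vec),); debuginfo=:none))

floats_ir = llvm_ir(fill(IRDemo(1.0, 1.0, 1.0), 1000))
funcs_ir  = llvm_ir(fill(IRDemo(tanh, 1.0, 1.0), 1000))

# Collect the distinct double-precision vector types in the IR, e.g. "<4 x double>".
# The list can be empty if the broadcast loop is not inlined into gbroadcast
# on a given Julia version, or if the loop was not vectorized.
vector_types(ir) = sort(unique(String[m.match for m in eachmatch(r"<\d+ x double>", ir)]))

println("floats: ", vector_types(floats_ir))
println("funcs:  ", vector_types(funcs_ir))
```

The output depends on the Julia version and the CPU, so no expected result is shown here. Running @code_native on the same calls shows which registers are actually used.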