Vectorization when broadcasting vector of structs

gbaraldi · March 17, 2021, 8:02pm

Recently on Slack I was discussing the performance of generic structs and whether or not they should be vectorized, and while exploring I found some interesting results.

using BenchmarkTools

struct TestType{𝒯}
    z::𝒯
    x::Float64
    y::Float64
end

f(t::TestType) = t.x + t.y
fbroadcast(vec) = f.(vec)

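# 1000 structs whose first field is a Float64
a = TestType(1.0, 1.0, 1.0)
floats = fill(a, 1000)

# 1000 structs whose first field is a function (tanh)
c = TestType(tanh, 1.0, 1.0)
funcs = fill(c, 1000)

# 1000 structs alternating between the two types above
temp = [a, c]
temp2 = fill(temp, 500)
floatfunc = vcat(temp2...)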

This is what I used to test, and the results were:

julia> @btime fbroadcast($floats);
  930.933 ns (1 allocation: 7.94 KiB)

julia> @btime fbroadcast($funcs);
  628.160 ns (1 allocation: 7.94 KiB)

julia> @btime fbroadcast($floatfunc);
  40.858 μs (2001 allocations: 39.19 KiB)

Why is the version that stores only functions in the structs significantly faster?
I understand why the mixed-type one is slow, but I don’t understand the difference between storing a function and storing a float.
Checking the LLVM IR, both versions contain SIMD operations. However, the function-only version has some 8-wide instructions, such as shufflevector <8 x double> %wide.vec28, <8 x double> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>, while the float version doesn’t.
My computer doesn’t have AVX-512, so seeing 8-wide instructions is surprising to me.
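
One way to probe the difference is to compare the memory layout of the element types. The sketch below repeats the struct definition from above and only queries sizes and concreteness; the variable names size_float, size_func and mixed_concrete are introduced here for illustration. It is a starting point for investigation, not a confirmed explanation of the timings.

```julia
struct TestType{𝒯}
    z::𝒯
    x::Float64
    y::Float64
end

a = TestType(1.0, 1.0, 1.0)
c = TestType(tanh, 1.0, 1.0)

# typeof(tanh) is a singleton type, so the z field takes up no storage.
size_tanh  = sizeof(typeof(tanh))   # bytes used by the function "value"
size_float = sizeof(typeof(a))      # three Float64 fields
size_func  = sizeof(typeof(c))      # only x and y occupy memory

# The mixed vector cannot have a concrete element type.
floatfunc = vcat(fill([a, c], 500)...)
mixed_concrete = isconcretetype(eltype(floatfunc))

println((size_tanh, size_float, size_func, mixed_concrete))
```

If the function-only struct is 16 bytes, the x and y fields of consecutive elements sit interleaved in memory as plain doubles, which would fit the stride-2 shuffle indices (0, 2, 4, 6) in the IR: one wide load of 8 doubles is split into four x values and four y values. The 24-byte struct has a stride of 3 doubles, which is less convenient to de-interleave. Note also that <8 x double> is a type in LLVM IR, not a register width; on a machine without AVX-512 the backend can lower it to several narrower registers.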
