Why do these simple pieces of code differ so much in performance?

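struct TestStruct
    # Only field `a` is known from the calls below; `b` and both field types are assumptions.
    a::Int
    b::Float64
end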

get_field1(x::Vector{TestStruct}, y::Symbol) = getfield.(x, y)
get_field2(x::Vector{TestStruct}, y::Symbol) = [getfield(x_, y) for x_ in x]
get_field3(x::Vector{TestStruct}, y::Val{T}) where T = getfield.(x, T)
get_field4(x::Vector{TestStruct}, y::Val{T}) where T = [getfield(x_, T) for x_ in x]

test_struct_vec = [TestStruct(1, 1) for i = 1 : 1_000_000];

@time :get_fields1 get_field1(test_struct_vec, :a);
@time :get_fields2 get_field2(test_struct_vec, :a);
@time :get_fields3 get_field3(test_struct_vec, Val(:a));
@time :get_fields4 get_field4(test_struct_vec, Val(:a));

get_fields1: 0.015035 seconds (1.00 M allocations: 38.147 MiB)
get_fields2: 0.017935 seconds (1.00 M allocations: 38.147 MiB)
get_fields3: 0.030592 seconds (1.00 M allocations: 38.147 MiB)
get_fields4: 0.001574 seconds (2 allocations: 7.629 MiB)

As can be seen above, get_field4 is much faster than the others. What is the cause of this difference?


For the first two to be fast, the symbol needs to be constant-propagated, so that the compiler compiles a specialized variant where the symbol is known to be :a, just as you force it to when you explicitly use Val. That doesn't happen if the symbol is not a compile-time constant at the call site, which is the case here because you are timing a top-level expression. If you wrap the call with a plain symbol in an outer function, the compiler knows the symbol is constant inside that function, which should let constant propagation happen and match the speed of variant 4.
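
A minimal sketch of the wrapping idea, for trying it out yourself. The name wrapped_get_field2 is hypothetical, and the struct's second field and both field types are assumptions, since only field a appears in the question. Whether constant propagation actually kicks in can depend on the Julia version and the compiler's heuristics, so compare the two timings rather than taking the result for granted.

```julia
# Assumed struct layout: only field `a` is known from the question.
struct TestStruct
    a::Int
    b::Float64
end

# Variant 2 from the question: the field name arrives as a plain Symbol.
get_field2(x::Vector{TestStruct}, y::Symbol) = [getfield(x_, y) for x_ in x]

# Hypothetical wrapper: inside this function `:a` is a literal constant,
# so the compiler has a chance to propagate it into get_field2.
wrapped_get_field2(x::Vector{TestStruct}) = get_field2(x, :a)

test_struct_vec = [TestStruct(1, 1) for i = 1:1_000_000]

# Call each once first so compilation time is not included in the measurement.
wrapped_get_field2(test_struct_vec)
get_field2(test_struct_vec, :a)

@time :wrapped wrapped_get_field2(test_struct_vec);
@time :toplevel get_field2(test_struct_vec, :a);
```

If the wrapped call reports only a couple of allocations while the top-level call reports about one per element, constant propagation happened in the wrapper.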

I'm not immediately sure why 3 is slower; I'm just used to broadcasting occasionally having problems with type inference or constant propagation.