Why is using broadcasting with a constructor slower outside of a function?


I’ve noticed that broadcasting in a script is slower when it is not done inside a function. This seems to be mostly compilation time, and running the same thing again is faster, but is there a way to avoid this?

In the code below I have a struct Foo and can construct a vector of them by calling Foo.(inputs), or call the same but within function process().

The latter is faster, even though it gets called first. Why does Foo.(inputs) take so long to compile and run the first time it is called directly, if the same expression was already used three lines above inside a function?

# Generate some inputs
letter = ["A", "B"];
inputs = String[];
for ii = 1 : 10000
    push!(inputs, "$(letter[mod(ii-1, 2)+1]) $((mod(ii, 2) == 0 ? -1 : 1) * mod(div(ii, 2), 10))")
end

struct Foo
    # Field names and types are placeholders: the post only shows two integer arguments passed to new.
    a::Int
    b::Int
    function Foo(str::String)::Foo
        if startswith(str, "A")
            new(0, 0)
        else
            new(1, parse(Int, str[3:end]))
        end
    end
end

process(inputs::Vector{String})::Vector{Foo} = Foo.(inputs);

print("process(inputs): ")
@time foos1 = process(inputs);

print("Foo.(inputs):   ")
@time foos2 = Foo.(inputs);

print("Foo.(inputs) 2: ")
@time foos3 = Foo.(inputs);


process(inputs):  0.000336 seconds (5.00 k allocations: 273.484 KiB)
Foo.(inputs):     0.015073 seconds (5.67 k allocations: 304.172 KiB, 97.58% compilation time)
Foo.(inputs) 2:   0.000314 seconds (5.00 k allocations: 273.531 KiB)


This is rather interesting, and not what I would have expected. It appears to be a combination of inlining and type annotation, because either of

process(inputs::Vector{String})::Vector{Foo} = @noinline Foo.(inputs);
process(inputs) = Foo.(inputs);

results in (Julia 1.8.3) compilation overhead similar to that of calling Foo.(inputs) directly, which is what I would have expected anyway.
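
To see what actually has to be compiled, it may help to remember that a dot call such as Foo.(inputs) is syntax for Base.materialize(Base.broadcasted(Foo, inputs)). Here is a minimal sketch of that equivalence, using a hypothetical stand-in type Bar (not from the original post) so the snippet is self-contained:

```julia
# Bar is a hypothetical stand-in for Foo, kept small on purpose.
struct Bar
    n::Int
    Bar(str::String) = new(length(str))
end

inputs = ["A 0", "B 1", "A -1"]

# The dot call is sugar for these two steps.
lazy = Base.broadcasted(Bar, inputs)   # lazy Broadcasted object, nothing computed yet
bars = Base.materialize(lazy)          # allocates and fills the result vector

# Both spellings produce the same result.
@assert bars == Bar.(inputs)
@assert bars isa Vector{Bar}

# Print the lowered form of the top-level expression to see the same calls.
println(Meta.@lower Bar.(inputs))
```

If process is inlined all the way down, those two calls may only ever be compiled as part of process itself, so the top-level Foo.(inputs) would still need its own compiled copies. That is my reading of the timings above, not something I have verified in the compiler.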