@printf strange performance behavior

Hello everybody!

I recently encountered some strange behavior related to printing numbers as ASCII in Julia.
I tested both Julia 1.6 and 1.7; the timings below are from 1.6.

using Printf
using BenchmarkTools

function combined_print(arr)
    buf = IOBuffer()
    print(buf, arr[1], " ", arr[2], " ", arr[3], " ", arr[4], " ", arr[5], " ", arr[6], " ", arr[7], " ", arr[8], " ", arr[9], " ", arr[10], "\n")
end

function separated_print(arr)
    buf = IOBuffer()
    print(buf, arr[1], " ")
    print(buf, arr[2], " ")
    print(buf, arr[3], " ")
    print(buf, arr[4], " ")
    print(buf, arr[5], " ")
    print(buf, arr[6], " ")
    print(buf, arr[7], " ")
    print(buf, arr[8], " ")
    print(buf, arr[9], " ")
    print(buf, arr[10], "\n")
end

function combined_printf(arr)
    buf = IOBuffer()
    @printf buf "%17.16lf %17.16lf %17.16lf %17.16lf %17.16lf %17.16lf %17.16lf %17.16lf %17.16lf %17.16lf\n" arr[1] arr[2] arr[3] arr[4] arr[5] arr[6] arr[7] arr[8] arr[9] arr[10]
end

function separated_printf(arr)
    buf = IOBuffer()
    @printf buf "%17.16lf " arr[1]
    @printf buf "%17.16lf " arr[2]
    @printf buf "%17.16lf " arr[3]
    @printf buf "%17.16lf " arr[4]
    @printf buf "%17.16lf " arr[5]
    @printf buf "%17.16lf " arr[6]
    @printf buf "%17.16lf " arr[7]
    @printf buf "%17.16lf " arr[8]
    @printf buf "%17.16lf " arr[9]
    @printf buf "%17.16lf\n" arr[10]
end

@btime combined_print(a) setup=(a=rand(10))
  1.913 μs (36 allocations: 5.09 KiB)

@btime separated_print(a) setup=(a=rand(10))
  1.994 μs (36 allocations: 5.25 KiB)

@btime combined_printf(a) setup=(a=rand(10))
  1.958 μs (29 allocations: 4.91 KiB)

@btime separated_printf(a) setup=(a=rand(10))
  1.688 μs (26 allocations: 4.94 KiB)

To summarize: @printf is generally faster than print, which surprises me. But the part I really don’t understand is why printing a single number per @printf statement, as in separated_printf, is faster than doing it all in a single @printf statement, as in combined_printf.

Thanks for any ideas as to what is happening here.

EDIT: I corrected a mistake in my benchmark: the functions using print printed all 17 decimal places, whereas @printf "%lf" did not. This eliminates the general difference between @printf and print, but separated_printf remains faster than both combined_printf and the *_print functions.
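
Since the mistake above came from the variants not printing the same text, a quick sanity check may be worth running before comparing timings. The following is only a minimal sketch: combined_string and separated_string are hypothetical helpers (not part of the benchmark above), cut down to three numbers, that return what they wrote so the two outputs can be compared directly.

```julia
using Printf

# Hypothetical helper: one @printf call for all numbers, returns the written text.
function combined_string(arr)
    buf = IOBuffer()
    @printf buf "%17.16lf %17.16lf %17.16lf\n" arr[1] arr[2] arr[3]
    return String(take!(buf))
end

# Hypothetical helper: one @printf call per number, returns the written text.
function separated_string(arr)
    buf = IOBuffer()
    @printf buf "%17.16lf " arr[1]
    @printf buf "%17.16lf " arr[2]
    @printf buf "%17.16lf\n" arr[3]
    return String(take!(buf))
end

a = rand(3)
# If this is false, the two variants do different work and their timings are not comparable.
same_output = combined_string(a) == separated_string(a)
println(same_output)
```

If the strings match, any remaining timing difference comes from how the calls are made rather than from what is printed.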

If I had to guess, you are seeing some sort of overhead from the vararg function that isn’t there in the non-vararg one.
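
One way to probe that guess, as a rough sketch rather than a definitive test: call Printf.format directly with a Printf.Format object (both are in the Printf standard library as of Julia 1.6), once with four arguments and once with a single argument in a loop, and compare the allocated bytes. The names combined_format, separated_format and measure are made up for illustration, and the example uses four numbers instead of ten to keep it short.

```julia
using Printf

# One call that formats four numbers (the vararg path with four arguments).
function combined_format(fmt, arr)
    buf = IOBuffer()
    Printf.format(buf, fmt, arr[1], arr[2], arr[3], arr[4])
    return String(take!(buf))
end

# Four calls that format one number each.
function separated_format(fmt, arr)
    buf = IOBuffer()
    for i in 1:4
        Printf.format(buf, fmt, arr[i])
    end
    return String(take!(buf))
end

# Function barrier, so @allocated does not count dispatch on global variables.
measure(f, fmt, arr) = @allocated f(fmt, arr)

one_fmt  = Printf.Format("%17.16lf ")
four_fmt = Printf.Format("%17.16lf "^4)
a = rand(4)

# Warm up so that compilation is not included in the measurement.
measure(combined_format, four_fmt, a)
measure(separated_format, one_fmt, a)

bytes_combined  = measure(combined_format, four_fmt, a)
bytes_separated = measure(separated_format, one_fmt, a)
println("combined: ", bytes_combined, " bytes, separated: ", bytes_separated, " bytes")
```

This only counts allocations; for timings, the same two functions can be passed to @btime as in the original post. If the combined version allocates noticeably more per number, that would be consistent with the vararg explanation, though it would not prove it.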