Huge difference between passing a type vs. using a hardcoded type (in benchmarks)?

I do not understand why there is a 30x difference between these two implementations. I guess I am making some dumb (benchmarking) mistake, because the produced LLVM code seems to be identical.

Note that I am using a large buffer to avoid hitting EOF inside the benchmark loop. I could not get it to work with @btime foo(x) setup=(x=IOBuffer(rand(UInt8, 10000))), as it always throws an EOFError, no matter how large the buffer is.

julia> function foo(io)
           out = Vector{Int32}()
           sizehint!(out, 100)
           for i in 1:100
               push!(out, read(io, Int32))
           end
           out
       end
foo (generic function with 2 methods)

julia> function foo(io, T::Type)
           out = Vector{T}()
           sizehint!(out, 100)
           for i in 1:100
               push!(out, read(io, T))
           end
           out
       end
foo (generic function with 2 methods)

julia> data = IOBuffer(rand(UInt8, 1000000000));

julia> @btime foo($data);
  848.355 ns (2 allocations: 528 bytes)

julia> data = IOBuffer(rand(UInt8, 1000000000));

julia> @btime foo($data, $Int32);
  24.893 μs (102 allocations: 2.08 KiB)

I looked at the @code_llvm output of both calls and they seem to be identical, so is this some kind of benchmarking issue?

simple.llvm is coming from @code_llvm foo(data)
parametric.llvm is coming from @code_llvm foo(data, Int32)

$ diff simple.llvm parametric.llvm
1,2c1,2
< ;  @ REPL[45]:2 within `foo'
< define nonnull %jl_value_t addrspace(10)* @japi1_foo_18517(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
---
> ;  @ REPL[44]:2 within `foo'
> define nonnull %jl_value_t addrspace(10)* @japi1_foo_18685(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
26c26
< ;  @ REPL[45]:3 within `foo'
---
> ;  @ REPL[44]:3 within `foo'
30c30
< ;  @ REPL[45]:5 within `foo'
---
> ;  @ REPL[44]:5 within `foo'
170c170
< ;  @ REPL[45]:7 within `foo'
---
> ;  @ REPL[44]:7 within `foo'
174c174
< ;  @ REPL[45]:5 within `foo'
---
> ;  @ REPL[44]:5 within `foo'

Any ideas? The performance regression is real in my code; I clearly see a huge difference in allocations and elapsed time when I pass in the types.

Explicitly forcing specialization on T fixes the issue for me:

julia> function foo(io, ::Type{T}) where {T}
           out = Vector{T}()
           sizehint!(out, 100)
           for i in 1:100
               push!(out, read(io, T))
           end
           out
       end
foo (generic function with 2 methods)

And, by the way, you want to set evals = 1 in your benchmark code to ensure that the function is only run once per setup call:

julia> @btime foo(x) setup=(x = IOBuffer(rand(UInt8, 1000))) evals=1;
  485.000 ns (2 allocations: 528 bytes)

julia> @btime foo(x, T) setup=(x = IOBuffer(rand(UInt8, 1000)); T=Int32) evals=1;
  486.000 ns (2 allocations: 528 bytes)

Otherwise the function will be evaluated many times per setup call, which is why you’re hitting EOF (ref: BenchmarkTools setup isn't run between each iteration? - #6 by rdeits).


Ah, thanks, that was quick! It’s fairly interesting that this solves the issue. Do you know the reason? To me it still doesn’t make much sense :wink:

But I realise now I have a few places where I was “lazy” and did foo(T::Type) instead of foo(::Type{T}) where {T}…

I don’t know all the details, but I know that there are particular cases where the compiler is allowed to skip specializing on the concrete type of an argument (it still generates correct code, but potentially slower code). This is intended to help reduce compilation times in cases where it commonly won’t affect performance significantly. I think arguments of type ::Function are one such case, and ::Type may be another. Hopefully someone can chime in with the exact rules (I googled around but couldn’t find the relevant manual page).
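For illustration, here is a minimal sketch (with hypothetical function names, not from the thread) of the two signature styles. Per the rule described above, in the first form T is only passed through to other calls, so the compiler may reuse one generic compilation for all types; the `where` form forces a fresh specialization per concrete T:

```julia
# T is only passed along to zero(T) and the typed comprehension,
# so Julia's heuristics may skip specializing this method on T.
collect_zeros(n, T::Type) = T[zero(T) for _ in 1:n]

# `::Type{T}` with `where {T}` forces specialization on the concrete type.
collect_zeros2(n, ::Type{T}) where {T} = T[zero(T) for _ in 1:n]

# Both return identical results; only the generated code (and thus
# speed and allocations) may differ.
collect_zeros(3, Int32)   # 3-element Vector{Int32}
collect_zeros2(3, Int32)  # 3-element Vector{Int32}
```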

The issue is mentioned tangentially here: Extend @specialize to force specialization of arguments · Issue #33978 · JuliaLang/julia · GitHub


I went down the rabbit hole in issue #33978 linked by @rdeits and it eventually led me to this section of the Performance Tips:

https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing-1
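One practical caveat from that section, which also explains why the two @code_llvm outputs above looked identical: the introspection macros (@code_llvm, @code_typed, etc.) always show you specialized code, even when a real call would not specialize. A sketch of how one can instead inspect a method's actually-generated specializations (hypothetical function name; assuming Julia ≥ 1.10, where Base.specializations is available):

```julia
using InteractiveUtils  # for @which

# Same non-specializing signature style as in the question.
readone(io, T::Type) = read(io, T)

io = IOBuffer(rand(UInt8, 8))
readone(io, Int32)  # trigger compilation

# List the specializations recorded for the method; if T were
# specialized on, you would expect one MethodInstance per concrete
# type passed in, rather than a single widened one.
m = @which readone(io, Int32)
foreach(println, Base.specializations(m))
```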


Perfect, many thanks! Wow I totally missed this quite fundamental thing…

No worries, it looks like that section of the Performance Tips was added pretty recently.
