Huge difference between passing a type, or using a hardcoded-type (in benchmarks)?

tamasgal · May 1, 2020, 3:04pm

I do not understand why there is a 30x difference between these two implementations. I guess I am making some dumb (benchmarking) mistake, because the produced LLVM code seems to be identical.

Note that I am using a large buffer to avoid EOF inside the benchmark loop. I could not get it to work with @btime foo(x) setup=(x=IOBuffer(rand(UInt8, 10000))) as it always throws an EOF error, no matter how large the buffer is.

julia> function foo(io)
           out = Vector{Int32}()
           sizehint!(out, 100)
           for i in 1:100
               push!(out, read(io, Int32))
           end
           out
       end
foo (generic function with 2 methods)

julia> function foo(io, T::Type)
           out = Vector{T}()
           sizehint!(out, 100)
           for i in 1:100
               push!(out, read(io, T))
           end
           out
       end
foo (generic function with 2 methods)

julia> data = IOBuffer(rand(UInt8, 1000000000));

julia> @btime foo($data);
  848.355 ns (2 allocations: 528 bytes)

julia> data = IOBuffer(rand(UInt8, 1000000000));

julia> @btime foo($data, $Int32);
  24.893 μs (102 allocations: 2.08 KiB)

I looked at the @code_llvm output of both calls and they seem to be identical, so is this some kind of benchmarking issue?

simple.llvm is coming from @code_llvm foo(data)
parametric.llvm is coming from @code_llvm foo(data, Int32)

░ tamasgal@greybox.local:~/tmp/wtf took 3s
░ 16:56:31 > diff simple.llvm parametric.llvm
1,2c1,2
< ;  @ REPL[45]:2 within `foo'
< define nonnull %jl_value_t addrspace(10)* @japi1_foo_18517(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
---
> ;  @ REPL[44]:2 within `foo'
> define nonnull %jl_value_t addrspace(10)* @japi1_foo_18685(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
26c26
< ;  @ REPL[45]:3 within `foo'
---
> ;  @ REPL[44]:3 within `foo'
30c30
< ;  @ REPL[45]:5 within `foo'
---
> ;  @ REPL[44]:5 within `foo'
170c170
< ;  @ REPL[45]:7 within `foo'
---
> ;  @ REPL[44]:7 within `foo'
174c174
< ;  @ REPL[45]:5 within `foo'
---
> ;  @ REPL[44]:5 within `foo'

Any ideas? The performance regression is real in my code: I clearly see a huge difference in allocations and elapsed time when I pass in the types.

rdeits · May 1, 2020, 3:11pm

Explicitly forcing specialization on T fixes the issue for me:

julia> function foo(io, ::Type{T}) where {T}
           out = Vector{T}()
           sizehint!(out, 100)
           for i in 1:100
               push!(out, read(io, T))
           end
           out
       end
foo (generic function with 2 methods)

And, by the way, you want to set evals = 1 in your benchmark code to ensure that the function is only run once per setup call:

julia> @btime foo(x) setup=(x = IOBuffer(rand(UInt8, 1000))) evals=1;
  485.000 ns (2 allocations: 528 bytes)

julia> @btime foo(x, T) setup=(x = IOBuffer(rand(UInt8, 1000)); T=Int32) evals=1;
  486.000 ns (2 allocations: 528 bytes)

Otherwise the function will be evaluated many times per setup call, which is why you’re hitting EOF (ref: BenchmarkTools setup isn't run between each iteration? - #6 by rdeits).
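To make the EOF behaviour concrete, here is a minimal sketch that uses only Base (no BenchmarkTools) and simply calls the reader repeatedly on one buffer, the way several evaluations within a single sample would. The 1000-byte buffer matches the setup above; the names `consumed` and `hit_eof` are introduced here purely for illustration.

```julia
# Same reader as in the thread: pulls 100 Int32 values (400 bytes) per call.
function foo(io)
    out = Vector{Int32}()
    sizehint!(out, 100)
    for i in 1:100
        push!(out, read(io, Int32))
    end
    out
end

io = IOBuffer(rand(UInt8, 1000))   # what a single `setup` call would produce

foo(io)                            # consumes bytes 1-400
foo(io)                            # consumes bytes 401-800
consumed = position(io)            # the buffer is not rewound between calls

# A third call on the same buffer runs out of data after 50 more values.
hit_eof = try
    foo(io)
    false
catch err
    err isa EOFError
end
```

By the same arithmetic, the 10000-byte buffer from the original post allows only 25 calls, so a larger buffer just postpones the error if the function is evaluated many times per sample. With evals=1 every evaluation gets a fresh buffer from setup.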

tamasgal · May 1, 2020, 3:17pm

Ah, thanks, that was quick. It’s fairly interesting that this solves the issue. Do you know the reason? To me it still doesn’t make much sense.

But I realise now I have a few places where I was “lazy” and did foo(T::Type) instead of foo(::Type{T}) where {T}…

rdeits · May 1, 2020, 3:26pm

I don’t know all the details, but I know that there are particular cases where the compiler is allowed to skip specializing on the concrete type of an argument (it still generates correct code, but potentially slower code). This is intended to help reduce compilation times in cases where it commonly won’t affect performance significantly. I think arguments of type ::Function are one such case, and ::Type may be another. Hopefully someone can chime in with the exact rules (I googled around but couldn’t find the relevant manual page).

The issue is mentioned tangentially here: Extend at-specialize to force specialization of arguments · Issue #33978 · JuliaLang/julia · GitHub

CameronBieganek · May 1, 2020, 5:51pm

I went down the rabbit hole in issue #33978 linked by @rdeits and it eventually led me to this section of the Performance Tips:

https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing-1
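As far as I can tell, that section says that Julia, as a heuristic, avoids automatically specializing on arguments in three cases (Type, Function and Vararg) when the argument is only passed through to another function rather than used directly. It also notes that @code_typed and friends always show specialized code, which would explain why the two @code_llvm dumps above look identical. Below is a rough sketch of the two signatures side by side; `read_passthrough` and `read_specialized` are hypothetical names, not anything from the thread or from Base, and the allocation counts are left unasserted because they depend on the Julia version.

```julia
# `T::Type` is only handed on to `Vector{T}` and `read`, so the heuristic
# may leave this method unspecialized on the concrete type.
function read_passthrough(io, T::Type, n)
    out = Vector{T}()
    sizehint!(out, n)
    for _ in 1:n
        push!(out, read(io, T))
    end
    return out
end

# Making T a type parameter of the method forces specialization on it.
function read_specialized(io, ::Type{T}, n) where {T}
    out = Vector{T}()
    sizehint!(out, n)
    for _ in 1:n
        push!(out, read(io, T))
    end
    return out
end

bytes = rand(UInt8, 400)

# Both variants return the same values; only the generated code differs.
a = read_passthrough(IOBuffer(bytes), Int32, 100)
b = read_specialized(IOBuffer(bytes), Int32, 100)
same_result = a == b

# Rough allocation comparison after the warm-up calls above.
io = IOBuffer(bytes)
seekstart(io)
alloc_passthrough = @allocated read_passthrough(io, Int32, 100)
seekstart(io)
alloc_specialized = @allocated read_specialized(io, Int32, 100)
```

On versions where the heuristic applies, `alloc_passthrough` should come out noticeably larger than `alloc_specialized`, in line with the 102 versus 2 allocations reported at the top of the thread.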

tamasgal · May 1, 2020, 6:27pm

Perfect, many thanks! Wow I totally missed this quite fundamental thing…

CameronBieganek · May 1, 2020, 6:49pm

No worries, it looks like that section of the Performance Tips was added pretty recently.
