`@btime` appears to force specialization, does this give misleading timings?

Benny · January 19, 2021, 2:45pm

Julia normally avoids specializing on Type arguments, but specialization can be triggered by @code_warntype and similar macros (this is mentioned in the Performance Tips, tracked in the currently open issue 32834, and illustrated well in issue 23749). It appears that @btime does the same thing:

julia> using BenchmarkTools
julia> using MethodAnalysis
julia> f(x::Type) = fieldnames(x)
f (generic function with 1 method)

julia> methodinstances(f)
Core.MethodInstance[]

julia> @btime f(Int8)
  45.702 ns (0 allocations: 0 bytes)
()

julia> methodinstances(f) # huh that's odd, it shouldn't specialize
1-element Array{Core.MethodInstance,1}:
 MethodInstance for f(::Type{Int8})

julia> f(Int8)
()

julia> methodinstances(f) # so a normal call uses the unspecialized version
2-element Array{Core.MethodInstance,1}:
 MethodInstance for f(::Type{Int8})
 MethodInstance for f(::Type{T} where T)

julia> f(Int16)
()

julia> methodinstances(f) # yep, normal call still uses the unspecialized version
2-element Array{Core.MethodInstance,1}:
 MethodInstance for f(::Type{Int8})
 MethodInstance for f(::Type{T} where T)

julia> @btime f(Int32)
  46.457 ns (0 allocations: 0 bytes)
()

julia> methodinstances(f) # and a @btime call causes specialization
3-element Array{Core.MethodInstance,1}:
 MethodInstance for f(::Type{Int8})
 MethodInstance for f(::Type{Int32})
 MethodInstance for f(::Type{T} where T)

As mentioned in the issues and illustrated by my example, normal calls use the unspecialized MethodInstance, even if a specialized one was already created by a @btime call. So I’m wondering if @btime is timing the specialized MethodInstance instead of the unspecialized one. It may not make a difference for my simple example method, but it could for more complicated methods.
P.S. This also applies to @time.

mbauman · January 19, 2021, 8:26pm

This isn’t really method specialization but rather inlining with constant propagation.

That is, there’s not really a f(::Type{Int8}) method instance compiled. What there is, however, is a benchmark kernel with a hardcoded/inlined () specifically for ::Type{Int8}. You also see this behavior with a simple outer function:

julia> f(x::Type) = fieldnames(x)
f (generic function with 1 method)

julia> methodinstances(f)
Core.MethodInstance[]

julia> g() = f(Int8)
g (generic function with 1 method)

julia> g()
()

julia> methodinstances(f)
1-element Array{Core.MethodInstance,1}:
 MethodInstance for f(::Type{Int8})

In other words, that method instance only exists to track the backedge to invalidate g() if need be:

julia> methodinstances(f)[1].backedges
1-element Array{Any,1}:
 MethodInstance for g()

This is behaving exactly as I’d expect — the core mantra of BenchmarkTools is to “benchmark the snippet as though it were written directly in a function.” In this case, were you to hardcode Int8 like that, you’d see exactly this performance.
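As a rough illustration of that mantra, the sketch below compares a wrapper that hardcodes the type with one that receives it as an argument. The wrapper `h` is a hypothetical name added here, not something from the session above, and the printed IR differs between Julia versions, so no particular output is claimed.

```julia
using InteractiveUtils  # provides code_typed outside the REPL

f(x::Type) = fieldnames(x)

g() = f(Int8)   # the type is a constant written into the caller
h(T) = f(T)     # hypothetical wrapper: the type arrives as an argument

# Optimized, typed IR for each wrapper. With the constant visible, the
# compiler is free to inline f and fold the result into g.
println(code_typed(g, ()))
println(code_typed(h, (DataType,)))
```

Both wrappers return the same values; only the amount of work left for run time may differ.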

Now, I initially thought that using $ to properly flag the Int8 as an argument instead of a hardcoded constant literal — that is, @btime f($Int8) — would make this behave as you initially expected. But we see the same behavior. The benchmark kernel must indeed be forcing specialization on its arguments internally… and that really does make sense because it needs to properly pass types to functions that are themselves properly specialized (e.g., @btime round($Int8, 2.3)). I don’t really know of a way around that.
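One thing that may be worth trying, offered as a sketch and not as a confirmed fix: the BenchmarkTools manual describes a `$(Ref(x))[]` idiom for keeping the compiler from treating an interpolated value as a constant. Applying it to a type is an assumption here, and `Ref{DataType}` is chosen so that the kernel only sees a `DataType`-typed value instead of `Type{Int8}`. The names `t_literal`, `t_interp`, and `t_hidden` are illustrative.

```julia
using BenchmarkTools

f(x::Type) = fieldnames(x)

# Type written literally: the kernel sees a compile-time constant.
t_literal = @belapsed f(Int8)

# Type interpolated with `$`: passed to the kernel as an argument.
t_interp = @belapsed f($Int8)

# Type read out of a Ref{DataType}: the kernel only knows it gets some DataType.
t_hidden = @belapsed f($(Ref{DataType}(Int8))[])

# Minimum times in seconds; compare them on your own machine.
println((literal = t_literal, interpolated = t_interp, hidden = t_hidden))
```

Whether the third form matches the cost of a normal top-level call is not guaranteed; checking `methodinstances(f)` afterward, as in the sessions above, is one way to see which instance was used.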

One way to see that this is indeed all stemming from inlining is with @noinline:

# new session
julia> using MethodAnalysis

julia> @noinline f(x::Type) = fieldnames(x)
f (generic function with 1 method)

julia> g() = f(Int8)
g (generic function with 1 method)

julia> methodinstances(f)
Core.MethodInstance[]

julia> g()
()

julia> methodinstances(f)
2-element Array{Core.MethodInstance,1}:
 MethodInstance for f(::Type{Int8})
 MethodInstance for f(::Type{T} where T)

Now we actually create the pessimized f(::Type) specialization. We also still have a Type{Int8} method instance. I think that’s because the choice of which method of f to call was hardcoded into g(), and that hardcoded dispatch may itself need to be invalidated. Again, this is a “ghost” method instance that doesn’t actually contain a function pointer:

julia> methodinstances(f)[1].cache.invoke
Ptr{Nothing} @0x0000000000000000

julia> methodinstances(f)[2].cache.invoke
Ptr{Nothing} @0x0000000109e3b490

Benny · January 21, 2021, 3:24am

I have a few questions about this.

I don’t know what a “benchmark kernel” or “ghost method” is. From the surrounding context, I thought it might be a way to jot down that f(Int8) returned (). However, that sounds like a method specialization to me, which you had said it wasn’t really.
I see the logic of benchmarking a call as if it were written directly in a function because that’s usually where it happens, but why doesn’t this inlined ghost method thing happen for arguments that aren’t types? The following code example is your example but with an f(x::Integer) instead.
I don’t know how .cache.invoke indicates ghost methods. Is it because it’s a Ptr{Nothing}? Or is it just the one with trailing zeros? The last line of my example was both.

julia> f(x::Integer) = x+1
f (generic function with 1 method)

julia> methodinstances(f)
Core.MethodInstance[]

julia> g() = f(1)
g (generic function with 1 method)

julia> g()
2

julia> methodinstances(f)
1-element Array{Core.MethodInstance,1}:
 MethodInstance for f(::Int64)

julia> methodinstances(f)[1].cache.invoke
Ptr{Nothing} @0x0000000000000000

mbauman · January 21, 2021, 4:53am

These are really good questions:

“Kernel” is generally a small part of a computation. In the case of benchmarking, BenchmarkTools will put whatever you write — the kernel — inside a for loop to run it a bunch of times, gather statistics, etc, etc. Something like:
```
@timed for _ in 1:num_benchmark_iterations
    # THE KERNEL (what you wrote)
end
```
Note that this itself is inside another function! That outer function will also get specialized and compiled. That’s what that g function I used above is simulating. As part of its specialization, it’ll inline f.
Inlining takes a function definition (in this case, that f method) and hardcodes it directly into the caller instead of compiling it separately and then calling it.
A “ghost MethodInstance” is a term I just made up on the fly here. Digging into MethodInstance internals is right at the edge of my swimming capabilities… we’re going into the deep end here. But I think its only purpose is to track that connection to the caller. It doesn’t actually point to a compiled function body — I think that’s what the null pointer in .cache.invoke represents. It’s not the Nothing that’s significant; it’s the 0x00000 that is. There might be a better way to see this, but it’s what I found in the few minutes I was playing with it.
Types are indeed funny because they are themselves DataTypes (a concrete leaf type) but can be uniquely specified with a Type{T} (a parameterized abstract supertype). That ::Int example is behaving exactly the same way.
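The relationship between a type, `DataType`, and `Type{T}` can be checked directly. The following is a small sketch using only Base functions; the assertions state general facts about Julia's type system, not anything specific to BenchmarkTools.

```julia
# The value Int8 is an instance of the concrete type DataType...
@assert typeof(Int8) === DataType
@assert isconcretetype(DataType)

# ...but it is also the one and only instance of Type{Int8}.
@assert Int8 isa Type{Int8}
@assert !(Int16 isa Type{Int8})

# Type{Int8} is a subtype of Type, yet it is not itself concrete:
# no value has typeof(x) === Type{Int8}.
@assert Type{Int8} <: Type
@assert !isconcretetype(Type{Int8})
```

This is why a signature like `f(::Type{Int8})` can identify the argument more precisely than `typeof` does.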

Benny · January 21, 2021, 5:45am

So in the last example I gave with f(::Integer), if I call f afterward, that pointer changes.

julia> methodinstances(f)[1].cache.invoke
Ptr{Nothing} @0x0000000000000000

julia> f(2)
3

julia> methodinstances(f)[1].cache.invoke
Ptr{Nothing} @0x000000002d66ce60

So it really demonstrates these “…000” MethodInstances exist so that method invalidation still works for inlined methods (g() would’ve been compiled to return a constant 2).
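The effect of that invalidation can be seen without inspecting any internals. This is a minimal sketch meant to be run as top-level statements in a fresh session; the replacement body `x + 10` is an arbitrary choice for illustration.

```julia
f(x::Integer) = x + 1
g() = f(1)

# First call compiles g; f may be inlined and the result folded.
@assert g() == 2

# Redefining f must invalidate the compiled g, even though g's
# machine code may never have called f as a separate function.
f(x::Integer) = x + 10

# g is recompiled on the next call and sees the new definition.
@assert g() == 11
```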

So to be clear, are you saying that I get an extra MethodInstance with the f(Int8) call but not the f(1) call because types are in this weird spot where Type{Int64} <: DataType even though typeof(Int64) === DataType?
I also don’t quite understand why the one for Type{Int8} shows up in your example even when you specified @noinline. Just seems like an unnecessary backedge when there already is one for Type{T}.

kristoffer.carlsson · January 21, 2021, 8:41am

Yes, see JuliaCI/BenchmarkTools.jl pull request #124, “force specialization on arguments to core wrapper” (and the issue it addresses, #71, “Overhead caused by widening”).
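For readers unfamiliar with what forcing specialization on an argument looks like, here is a generic sketch based on the pattern described in the Performance Tips. It is not the BenchmarkTools code, and the function names `passthrough` and `forced` are hypothetical.

```julia
# The type is only passed along to another function, so Julia's
# heuristics may compile a single body shared by all types.
passthrough(T, x) = round(T, x)

# Naming the type parameter in the signature asks the compiler
# to specialize this method for each concrete T it is called with.
forced(::Type{T}, x) where {T} = round(T, x)

# Both give the same answer; they may differ in how they are compiled.
@assert passthrough(Int8, 2.3) === Int8(2)
@assert forced(Int8, 2.3) === Int8(2)
```

Running `methodinstances` on each function after these calls, as earlier in the thread, should show which signatures were actually compiled on a given Julia version.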
