Performance discrepancy with multiple dispatch

Albert_de_montserrat · April 27, 2024, 7:47am

I am having problems trying to understand the difference in performance between the two foo methods described below.

using Chairmarks

abstract type AbstractTrait end
struct ConcreteTrait <: AbstractTrait end
foo(::Array) = ConcreteTrait()
foo(::Type{Array{T,N}}) where {T,N} = ConcreteTrait()

and given

x=[1]
T=typeof(x)
@assert foo(x) == foo(T) # true

the benchmarks show a large difference in performance:

julia> @b foo($x)
0.849 ns

julia> @b foo($T)
64.840 ns

What am I missing here? From what I see, both methods also seem to produce the same LLVM code.

nsajko · April 27, 2024, 8:23am

I’d guess this is a Chairmarks.jl bug (or misuse?), perhaps related to the fact that Julia doesn’t specialize on type arguments by default; see “Be aware of when Julia avoids specializing” in the Performance Tips section of the Julia manual. @Lilith

Simpler example:

julia> using Chairmarks

julia> struct ConcreteTrait end

julia> foo(::Array) = ConcreteTrait()
foo (generic function with 1 method)

julia> foo(::Type{<:Array}) = ConcreteTrait()  # the performance is even worse now than when specializing the method for `T` and `N`
foo (generic function with 2 methods)

julia> x = [1]
1-element Vector{Int64}:
 1

julia> T = typeof(x)
Vector{Int64} (alias for Array{Int64, 1})

julia> @b foo($x)
1.286 ns

julia> @b foo($T)
325.369 ns

julia> @b x foo
1.286 ns

julia> @b T foo
333.884 ns

Albert_de_montserrat · April 27, 2024, 8:47am

Thanks! I guess it has to do with specialization as you say. I assumed that since this specializes:

julia> T=Array{Float64,2}
Matrix{Float64} (alias for Array{Float64, 2})

julia> foo2(::Type{T}) where T = ConcreteTrait()
foo2 (generic function with 3 methods)

julia> @b foo2($T)
0.852 ns

my original foo would too.

And it doesn’t seem to be an issue with Chairmarks. I get the same with BenchmarkTools as well:

julia> @btime foo($T)
  54.864 ns (0 allocations: 0 bytes)

Lilith · April 27, 2024, 7:46pm

Albert_de_montserrat: “From what I see, both methods also seem to produce the same LLVM code.”

@code_llvm lies about specialization:

Note that @code_typed and friends will always show you specialized code, even if Julia would not normally specialize that method call. You need to check the method internals if you want to see whether specializations are generated when argument types are changed, i.e., if Base.specializations(@which f(…)) contains specializations for the argument in question.

(source)
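A minimal sketch of the check the quoted note describes, using the simplified foo from earlier in the thread. It assumes Julia 1.10 or later (for Base.specializations) and uses which(f, types) in place of @which so it also runs outside the REPL. What gets printed depends on the Julia version and on which calls have already been compiled, so no output is shown.

```julia
struct ConcreteTrait end
foo(::Array) = ConcreteTrait()
foo(::Type{<:Array}) = ConcreteTrait()

x = [1]
T = typeof(x)
foo(x); foo(T)  # call both methods once so that something has been compiled

# `which` returns the Method that a call with these argument types dispatches to.
m_value = which(foo, Tuple{typeof(x)})  # the foo(::Array) method
m_type  = which(foo, Tuple{Type{T}})    # the foo(::Type{<:Array}) method

# List the specializations recorded for each method so far.
for m in (m_value, m_type)
    println(m.sig)
    for mi in Base.specializations(m)
        println("  ", mi.specTypes)
    end
end
```

Comparing the listed signatures with the argument types you actually passed shows whether a dedicated specialization exists, which @code_llvm alone cannot tell you.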

However, that’s not what’s causing the issue.

Chairmarks tells the compiler the types (and only the types) of pipeline arguments and interpolated values. For example, if you benchmark @b 7 cbrt, it tells the compiler that you want the cube root of an integer. @b cbrt($7) is the same. If you write the 7 as a literal instead of interpolating it (e.g. @b cbrt(7)), the compiler knows both the type and the value of that argument, so it can constant propagate and effectively compute @b _ -> 1.9129311827723892 instead. If you want true interpolation without writing the value out yourself, you can use @eval. For example, x = 7; @eval @b cbrt($x) is the same as @b cbrt(7).

In the case of @b foo($T), the type of T is DataType so the compiler does not know whether foo will dispatch to ConcreteTrait() or throw a method error. foo2, on the other hand, always returns ConcreteTrait() when passed a DataType so it is fast even though all the compiler knows about T is that it’s a DataType.
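The point can be checked without any benchmarking package. The sketch below reuses the definitions from this thread and only shows that knowing "the argument is a DataType" is not enough to decide whether foo applies, while foo2 accepts every type.

```julia
struct ConcreteTrait end
foo(::Array) = ConcreteTrait()
foo(::Type{<:Array}) = ConcreteTrait()
foo2(::Type{T}) where {T} = ConcreteTrait()

T = typeof([1])

# Both of these are plain DataTypes, so their types are indistinguishable.
@show typeof(T) === DataType
@show typeof(Int) === DataType

# foo accepts one of them and rejects the other (calling foo(Int) throws a MethodError) ...
@show applicable(foo, T)
@show applicable(foo, Int)

# ... whereas foo2 accepts any type, so no lookup on the value is needed.
@show applicable(foo2, T)
@show applicable(foo2, Int)
```

Since the outcome for foo depends on the value of the argument rather than on its type, the decision has to be made at run time when only the type is known.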

Perhaps Chairmarks should tell the compiler the exact value of interpolated values and pipeline arguments when they are types? But then I don’t know how one would opt out of that behavior while the current behavior can be avoided with @eval interpolation. I’m open to feedback and/or changing the behavior on this edge case if folks have ideas.
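Applied to the original example, the @eval workaround might look like the sketch below. It assumes Chairmarks is installed and that @eval behaves for a type the same way as described above for cbrt; the variable names typed_only and spliced are only illustrative, and no timings are claimed here.

```julia
using Chairmarks

struct ConcreteTrait end
foo(::Array) = ConcreteTrait()
foo(::Type{<:Array}) = ConcreteTrait()

T = typeof([1])

# Plain interpolation: per the explanation above, only typeof(T) (a DataType)
# is made known to the compiler.
typed_only = @b foo($T)

# @eval splices the value of T into the expression before @b expands,
# so the benchmarked call is effectively foo(Vector{Int64}).
spliced = @eval @b foo($T)

println(typed_only)
println(spliced)
```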

Aside: @b foo(T) reports a fast runtime as well, which means Chairmarks is claiming that a function defined at global scope which calls foo(T) will run quickly, even though T is a non-constant global. This seems implausible to me but…

julia> g() = foo(T)
g (generic function with 1 method)

julia> @b g
1.134 ns

julia> @btime g()
  1.083 ns (0 allocations: 0 bytes)
ConcreteTrait()

…I guess there’s a back-edge or something?

Benny · April 27, 2024, 11:07pm

Can you check what happens when you add more methods to foo (>=5) with different return types? The compiler sometimes examines the return types of all of a function’s few methods for type inference; in this case it’s possible it noticed that the 2 methods of foo return a singleton ConcreteTrait instance with no side effects, allowing elision of the typical runtime dispatch and calls.
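One way to set up that experiment is sketched below. The five extra methods are hypothetical and exist only to give foo more methods with differing return types; Base.return_types is used to see what inference concludes for g, and its result is deliberately not shown because it may differ between Julia versions.

```julia
struct ConcreteTrait end
foo(::Array) = ConcreteTrait()
foo(::Type{<:Array}) = ConcreteTrait()

# Hypothetical extra methods, each with a different return type.
foo(::Int) = 1
foo(::Float64) = 2.0
foo(::String) = "three"
foo(::Symbol) = :four
foo(::Nothing) = nothing

T = typeof([1])  # non-constant global, as in the thread
g() = foo(T)

# What inference concludes about g's return type with the extra methods present.
println(Base.return_types(g, ()))
println(g())
```

Re-running the earlier benchmarks of g after these definitions would show whether the fast timing depended on foo having only a couple of methods.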
