Benchmarking surprises with type assertions

While looking at another post I got surprised by the following differences when using type assertions:

using Chairmarks

tup = (10, 100im, 1.0, 2//3, 3.0im, 2//3+im)
idx = 5
type = typeof(tup[idx])

function test1(tup, idx, type::Type{T}) where T
    idx == 1 && return tup[1]::type
    test1(Base.tail(tup), idx-1, type)

# Just replace T with type in the function body:
function test2(tup, idx, type::Type{T}) where T
    idx == 1 && return tup[1]::T
    test2(Base.tail(tup), idx-1, T)

# Remove the type assertion:
function test3(tup, idx)
    idx == 1 && return tup[1]
    test3(Base.tail(tup), idx-1)

julia> @b test1($tup, $idx, $type)
19.468 ns (2.01 allocs: 64.170 bytes)

julia> @b test2($tup, $idx, $type)
147.246 ns (2.04 allocs: 145.173 bytes)

julia> @b test3($tup, $idx)
8.415 ns (1.00 allocs: 32.077 bytes)

Which leads to two questions:

  1. For a function that takes a parameter type::Type{T}, is it expected to have much worse performance when using T instead of type in the function body?
  2. Is it expected that a type assertion allocates? Or where’s the additional allocation coming from in the first two cases?

Note that I find the same results with BenchmarkTools.jl:

using BenchmarkTools

julia> @btime test1($(Ref(tup))[], $(Ref(idx))[], $(Ref(type))[]);
  16.249 ns (2 allocations: 64 bytes)

julia> @btime test2($(Ref(tup))[], $(Ref(idx))[], $(Ref(type))[]);
  108.330 ns (2 allocations: 144 bytes)

julia> @btime test3($(Ref(tup))[], $(Ref(idx))[]);
  7.399 ns (1 allocation: 32 bytes)

I guess that dereferencing a type prevents type inference, causing run time dispatch.

I used that trick to prevent constant propagation/folding (as suggested at the bottom of this doc page). I’m not sure of the consequences but I don’t think that’s the problem here: I don’t deference when using Chairmarks, and still get the same results. And with BenchmarkTools, the deference is made in all cases, so it can’t explain the different behaviors?

julia> Ref(Int)

Notice the type parameter is DataType, not Type{Int64}.

Only the first two benchmark calls feature a type reference.

In the case of type values or values of a singleton type, though, you most probably do want type inference/constant propagation/folding. The idea is that a benchmark needs to be representative of “production” code.

1 Like

I commented again back on that thread, but suffice to say that type assertions may patch up type instabilities but doesn’t necessarily eliminate performance costs, especially since it seems to have more to do with tuple size than type inference.

I don’t think there are consequences generally, but type and T are indeed different. T is a method’s static parameter, not a local variable:

julia> function foo(type::Type{T}) where T
         T = Ref{type}
ERROR: syntax: local variable name "T" conflicts with a static parameter
 [1] top-level scope
   @ REPL[79]:1

julia> function foo(type::Type{T}) where T
         type = Ref{type}
foo (generic function with 1 method)

The inability to reassign static parameters make them much easier to treat as compile-time constants.

Agreed, so it all depends what you want to test. Here I’m interested in the performance of a function that receives as parameter a type that is known at execution time but not compile time…

Right, but then wouldn’t you expect better performance with T than with type?

Here with test1 which uses type I get

julia> @b test1($tup, $idx, $type)
19.468 ns (2.01 allocs: 64.170 bytes)

and with test2 which uses T I get

julia> @b test2($tup, $idx, $type)
147.246 ns (2.04 allocs: 145.173 bytes)

Not better but I’d expect it to be the same, as @code_warntype recognizes type as a compile-time constant. @code_llvm and @code_native report matching code. Those are known to specialize on arguments they shouldn’t, but I’m uncertain if that’s the problem here. The methods(...).specializations also list the same call signatures.

Maybe this is a benchmarking artifact that Chairmarks and BenchmarkTools both happen to share? This is what happens when I just @time a big loop:

julia> let tup=tup, idx=idx, type=type
         @time for _ in 1:10^7
           test1(tup, idx, type)
  4.690596 seconds (10.00 M allocations: 305.176 MiB, 0.24% gc time)

julia> let tup=tup, idx=idx, type=type
         @time for _ in 1:10^7
           test2(tup, idx, type)
  4.660442 seconds (10.00 M allocations: 305.176 MiB, 0.14% gc time)

Adding @noinline doesn’t make a significant change to runtime and doesn’t change the allocation report. As you can see, no order-of-magnitude differences, and the average call iteration is different: ~470ns, 1 allocation, 32 bytes.


FTR, if the parameter isn’t known at compile time, it’s often the best for performance not to specialize on it, so as to prevent run time dispatch.

Also, it’s not clear if you really want a type assertion at every step of the recursion, instead of just at the end.

Good point, I think I was confused when talking of “compile time”. As I understand doing f(::T) where T requires specialization so the type cannot be “unknown at compile time”. Instead the compilation will be delayed to occur during runtime, based on runtime dispatch.

In the example, the code path with the assertion is only used at the end of the recursion.