Performance when dispatching on type

A colleague recently showed me this example, and I’m curious as to the underlying reason for the performance hit.

using BenchmarkTools, StaticArrays

f(u, v) = u*v'

struct ConcType end

ctyp = ConcType
cvar = ConcType()

F(::Type{ConcType}, u, v) = f(u, v)
F(::ConcType, u, v) = f(u, v)

a = SVector(1.0,2.0)
b = SVector(1.0,2.0,3.0,4.0)


@btime f($a, $b) # 1.700 ns (0 allocations: 0 bytes)
@btime F($cvar, $a, $b) # 1.700 ns (0 allocations: 0 bytes)
@btime F($ctyp, $a, $b) # 122.707 ns (3 allocations: 160 bytes)

Hm, could this be a benchmarking artifact? Here is the same comparison using setup instead of interpolated globals:

using BenchmarkTools, StaticArrays

f(u, v) = u*v'

struct ConcType end

F(::Type{ConcType}, u, v) = f(u, v)
F(::ConcType, u, v) = f(u, v)

@btime f(a, b) setup = (a = SVector(1.0,2.0); b = SVector(1.0,2.0,3.0,4.0))
@btime F(cvar, a, b) setup = (cvar = ConcType(); a = SVector(1.0,2.0); b = SVector(1.0,2.0,3.0,4.0))
@btime F(ctyp, a, b) setup = (ctyp = ConcType; a = SVector(1.0,2.0); b = SVector(1.0,2.0,3.0,4.0))

results in

  1.200 ns (0 allocations: 0 bytes)
  1.200 ns (0 allocations: 0 bytes)
  1.200 ns (0 allocations: 0 bytes)

Interesting. So is it generally recommended to use the setup functionality instead of interpolating global variables?

I believe benchmarks in global scope that involve non-constant globals like ctyp and cvar are problematic. I don’t know all the details, so I use setup, at least in critical cases like this one.
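
For what it's worth, here is a minimal sketch of what I suspect is going on; treat it as an assumption rather than a confirmed diagnosis. `typeof(ConcType)` is `DataType`, so a type stored in a non-constant global (and then interpolated) may reach the benchmark as a plain `DataType` instead of `Type{ConcType}`, which would force a dynamic dispatch. The wrapper `F_typ` is a hypothetical helper I made up for illustration, and the behaviour may differ between BenchmarkTools versions.

```julia
using BenchmarkTools, StaticArrays

f(u, v) = u * v'

struct ConcType end

F(::Type{ConcType}, u, v) = f(u, v)
F(::ConcType, u, v) = f(u, v)

# const globals, so their types are known to the compiler
const a = SVector(1.0, 2.0)
const b = SVector(1.0, 2.0, 3.0, 4.0)

# The value ConcType is an instance of Type{ConcType},
# but its concrete runtime type is just DataType.
@assert ConcType isa Type{ConcType}
@assert typeof(ConcType) === DataType

# Hypothetical helper (not from the original post): the type appears as a
# literal constant inside the function, so dispatch can be resolved statically.
F_typ(u, v) = F(ConcType, u, v)

@btime F_typ($a, $b)

# Struct names are already constant bindings, so the type itself
# can be written directly without interpolation.
@btime F(ConcType, $a, $b)
```

If both of these come out as fast as the plain `f` call on your machine, that would be consistent with the slowdown being caused by how the type was passed into the benchmark rather than by dispatching on `Type{ConcType}` itself.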

Okay, thank you!