Experiments in speeding up runtime dispatch for integer value specialization

Interesting. Obviously it’s a complicated subject, and presumably one not easily tinkered with. I’m also new to Julia. But intuitively I feel that it shouldn’t be much slower than static languages at call time, as long as sufficient work is done at compilation time to update the data structures so that finding the right function pointer is fast.

Thank you. Very interesting.

I feel like this hides a lot of detail about why dynamic dispatch must therefore be slow. Or perhaps the dispatch itself isn’t slow; rather, the extra code that comes with the type instability caused by dynamic dispatch is slow. Is that it?

I think this is the crux of it.

Even when I declare the output type of my function, dynamic dispatch is still slow. So either dynamic dispatch is always type unstable, or something else is causing it to be slow.

using StaticArrays, BenchmarkTools

function myfunc(::Val{N}) where N
    x = randn(SVector{N, Float64})
    return x' * x
end

function myfunc2(::Val{N})::Float64 where N
    x = randn(SVector{N, Float64})
    return x' * x
end

N = rand(1:32, 10000)
@btime foreach(n -> myfunc(Val(n)), $N)
@btime foreach(n -> myfunc2(Val(n)), $N)

  3.529 ms (10000 allocations: 156.25 KiB)
  3.523 ms (10000 allocations: 156.25 KiB)

That’s because annotating the return type does not get rid of the type instability in your anonymous function:

julia> g = n -> myfunc2(Val(n))
#15 (generic function with 1 method)

julia> @code_warntype g(1)
MethodInstance for (::var"#15#16")(::Int64)
  from (::var"#15#16")(n) @ Main REPL[23]:1
Arguments
  #self#::Core.Const(var"#15#16"())
  n::Int64
Body::Float64
1 ─ %1 = Main.Val(n)::Val
│   %2 = Main.myfunc2(%1)::Float64
└──      return %2

### this is what inference sees
julia> code_warntype(myfunc2, (Val,))
MethodInstance for myfunc2(::Val)
  from myfunc2(::Val{N}) where N @ Main REPL[2]:1
Static Parameters
  N <: Any
Arguments
  #self#::Core.Const(myfunc2)
  _::Val
Locals
  x::Any
Body::Float64
1 ─ %1 = Main.Float64::Core.Const(Float64)
│   %2 = Core.apply_type(Main.SVector, $(Expr(:static_parameter, 1)), Main.Float64)::Type{SVector{_A, Float64}} where _A
│        (x = Main.randn(%2))
│   %4 = Main.:var"'"(x)::Any
│   %5 = (%4 * x)::Any
│   %6 = Base.convert(%1, %5)::Any
│   %7 = Core.typeassert(%6, %1)::Float64
└──      return %7

which means the body of myfunc2 can’t be concretely inferred, so everything has to be boxed. Your example is more or less trying to lift values to the type domain at runtime, which cannot be type stable.
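
You can see this directly by benchmarking a single call where N is a literal constant against one where it only exists at runtime. A quick sketch (myfunc2 is the one defined above; the exact timings will vary):

using StaticArrays, BenchmarkTools

# N is a literal constant: Val(3) has the concrete type Val{3}, so the call
# to myfunc2 is statically dispatched and fully specialized.
@btime myfunc2(Val(3))

# N is only known at runtime: Val(n) is inferred only as Val, so the call
# goes through dynamic dispatch and the Val object is boxed.
n = 3
@btime myfunc2(Val($n))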


This is another, perhaps simpler, example. Here the only thing that remains unstable is the Val type:

julia> using StaticArrays, BenchmarkTools
       using LinearAlgebra: norm
       myfunc2(::Val{N}) where N = norm(rand(SVector{N,Float64}))::Float64
       function d(v)
           s = 0.0
           for N in v
               s += myfunc2(Val(N))::Float64
           end
           s
       end
d (generic function with 1 method)

julia> @btime d($(rand(1:32, 10)))
  2.805 μs (21 allocations: 816 bytes)
22.02333320260226

julia> @code_warntype d(rand(1:32, 10))
MethodInstance for d(::Vector{Int64})
  from d(v) in Main at REPL[6]:3
Arguments
  #self#::Core.Const(d)
  v::Vector{Int64}
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  s::Float64
  N::Int64
Body::Float64
1 ─       (s = 0.0)
│   %2  = v::Vector{Int64}
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Int64, Int64}
│         (N = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = s::Float64
│   %11 = Main.Val(N)::Val
│   %12 = Main.myfunc2(%11)::Float64
│   %13 = Core.typeassert(%12, Main.Float64)::Float64
│         (s = %10 + %13)
│         (@_3 = Base.iterate(%2, %9))
│   %16 = (@_3 === nothing)::Bool
│   %17 = Base.not_int(%16)::Bool
└──       goto #4 if not %17
3 ─       goto #2
4 ┄       return s


The Val type can’t be known at compile time, of course, but it is interesting that it has to be boxed (is that what is boxed there?).

For comparison, one can easily write a simpler, type-stable function by not using Val there, but it becomes much slower and allocates much more because of the “hidden” boxing of the dynamic dispatch:

julia> using StaticArrays, BenchmarkTools
       using LinearAlgebra: norm
       myfunc2(N) = norm(rand(SVector{N,Float64}))::Float64
       function d(v)
           s = 0.0
           for N in v
               s += myfunc2(N)::Float64
           end
           s
       end
d (generic function with 1 method)

julia> @btime d($(rand(1:32, 10)))
  9.850 μs (110 allocations: 6.44 KiB)
21.20694169117772

(@code_warntype is clean for this one).

Thus, I’m starting to see that for the first version to be faster, it would need to not allocate the Val object, which seems feasible in theory if the compiler knew that all of them are of type Val{N} with N an Int.

In this specific case it seems that some progress could be made if we could do

struct MyVal{N::Int} end

such that the compiler would statically know the size of Val{N} at every iteration.
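
In the meantime, part of that effect can be obtained by hand: if the possible values of N are few and known, an explicit branch lets each arm call myfunc2 with a compile-time-constant Val, so only the fallback still pays for dynamic dispatch. A sketch (the name myfunc_split and the cutoff at 4 are just for illustration):

# Manual specialization over a small set of N values: inside each branch N is
# a literal, so Val(N) has a concrete type and the call is statically dispatched.
function myfunc_split(N::Int)
    if N == 1
        return myfunc2(Val(1))
    elseif N == 2
        return myfunc2(Val(2))
    elseif N == 3
        return myfunc2(Val(3))
    elseif N == 4
        return myfunc2(Val(4))
    else
        return myfunc2(Val(N))::Float64   # fallback: still dynamic dispatch
    end
end

The obvious downside is that the set of specialized values has to be spelled out by hand.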


No, this is the exact same example as the one I posted above. In mine, Val is also the only “unstable” thing on the top level, but this forces the code inside of myfunc2 to be super unstable, because inference doesn’t know the concrete type.

There is no hidden dynamic dispatch on myfunc2 there. Both d and myfunc2 are already type stable. myfunc2 in particular is type stable, but not type grounded:

julia> code_warntype(myfunc2, (Int,))
MethodInstance for myfunc2(::Int64)
  from myfunc2(N) @ Main REPL[3]:1
Arguments
  #self#::Core.Const(myfunc2)
  N::Int64
Body::Float64
1 ─ %1 = Core.apply_type(Main.SVector, N, Main.Float64)::Type{SVector{_A, Float64}} where _A
│   %2 = Main.rand(%1)::SVector{_A, Float64} where _A
│   %3 = Main.norm(%2)::Any
│   %4 = Core.typeassert(%3, Main.Float64)::Float64
└──      return %4

The difference is subtle. For a function to be type stable, the only requirement is that the output type is inferrable from the input types (or a constant); this can be achieved with a typeassert. For a function to be type grounded, EVERY expression in the function needs to be of a concrete type, which is not the case for myfunc2 - it again lifts a runtime value to the type domain, making it internally type unstable. So it is type stable but type ungrounded (the return type is known, but not all inner types are).
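
A tiny toy example, with made-up names and nothing to do with Val, just to show the distinction:

# Type stable but NOT type grounded: `x` is inferred as Any because the
# container is untyped, and only the typeassert pins the return type.
function stable_not_grounded(v::Vector{Any})
    x = v[1]                      # x::Any - not concrete
    return (2.0 * x)::Float64
end

# Type stable AND type grounded: every expression has a concrete inferred type.
function grounded(v::Vector{Float64})
    x = v[1]                      # x::Float64 - concrete
    return 2.0 * x
end

Both functions infer a Float64 return type (the first only thanks to the typeassert), but @code_warntype shows Any in the body of the first one.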


StaticArrays.jl uses syntax like this to declare arrays of a fixed size, doesn’t it? So it must be possible. I looked at the source code but couldn’t understand how. And this would also be nice:

function myfunc{N::Int} end

They enforce it through the constructor. There is no way to enforce a type parameter to be of a specific type, since it doesn’t have a type - it’s a type parameter. The concepts are orthogonal (though admittedly a bit blurry…)
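
For concreteness, here is a minimal sketch of what “enforcing it through the constructor” can look like (MyVal is a made-up name, and this is not StaticArrays’ actual code):

# The parameter N itself is unconstrained, but the only way to build an
# instance goes through the inner constructor, which checks it at runtime.
struct MyVal{N}
    function MyVal{N}() where {N}
        N isa Int || throw(ArgumentError("N must be an Int"))
        return new{N}()
    end
end
MyVal(N::Int) = MyVal{N}()   # convenience constructor from a runtime Int

MyVal(3) and MyVal{3}() work, while MyVal{3.5}() throws at construction time - but MyVal{3.5} is still a perfectly valid type, because the parameter itself is not restricted.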