Improving speed of runtime dispatch detector

MilesCranmer · May 24, 2024, 6:45pm

I’m working on this small package, DispatchDoctor.jl (GitHub: MilesCranmer/DispatchDoctor.jl), which helps to address some of the issues discussed in this thread.

This package provides the @stable macro as a more ergonomic way to use Test.@inferred within a codebase:

using DispatchDoctor: @stable

@stable function f(x)
    if x > 0
        return x
    else
        return 1.0
    end
end

which will then throw an error for any type instability:

julia> f(2.0)
2.0

julia> f(1)
ERROR: return type Int64 does not match inferred return type Union{Float64, Int64}
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] f(x::Int64)
   @ Main ~/PermaDocuments/DispatchDoctor.jl/src/DispatchDoctor.jl:18
 [3] top-level scope
   @ REPL[4]:1

I could see this being useful for maintaining type hygiene in a codebase – you see type instabilities early, rather than needing to fix things when code is already slow.

The @stable macro is pretty simple (using MacroTools):

using MacroTools: splitdef, combinedef
using Test

function _stable(fex::Expr)
    fdef = splitdef(fex)
    closure_func = gensym("closure_func")
    fdef[:body] = quote
        let $(closure_func)() = $(fdef[:body])
            $(Test).@inferred $(closure_func)()
        end
    end

    return combinedef(fdef)
end

However, this @inferred call is quite slow: on the order of 500 ns per call in the benchmark below.

Is there anything I can do to only trigger the Test.@inferred on the first call with the given set of input types? (Is my only option to use @generated?)

Here’s a benchmark:

julia> using DispatchDoctor: @stable

julia> @stable f(x) = x > 0 ? x : 0.0;

julia> @btime f(1.0);
  567.568 ns (12 allocations: 752 bytes)

julia> g(x) = x > 0 ? x : 0.0;

julia> @btime g(1.0);
  0.875 ns (0 allocations: 0 bytes)

Any tricks I should try?

Ideally I would like to have the Test.@inferred completely compiled away by the second run… Not sure if that’s possible or not.

nsajko · May 24, 2024, 7:12pm

I’d wager the stated problem is solvable, even without metaprogramming, however I don’t like the idea, as it would incur a heavy penalty on the first call, and it seems like it’d make using a debugger less nice.

IMO using @inferred in the test suite is preferable.
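For comparison, here is a minimal sketch of what the test-suite approach could look like for the function from the first post. The testset name is made up for illustration; `@inferred`, `@test` and `@test_throws` are the standard Test macros.

```julia
using Test

# Same unstable function as in the first post, without @stable.
f(x) = x > 0 ? x : 1.0

@testset "inference of f" begin
    # Float64 input: both branches return Float64, so inference is concrete.
    @test (@inferred f(2.0)) == 2.0
    # Int input: the inferred return type is a Union, so @inferred throws.
    @test_throws ErrorException @inferred f(1)
end
```

The trade-off is that every combination of argument types has to be listed by hand, whereas a call-site check sees whatever types actually flow through the code.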

Elrod · May 24, 2024, 9:14pm

What about something like

julia> function stable_wrap(f::F, args...) where {F}
           T = Base.promote_op(f, map(typeof, args)...)
           Base.isconcretetype(T) || error("Not stable!")
           f(args...)::T
       end
stable_wrap (generic function with 1 method)

julia> macro stable(ex::Expr)
           fdef = ex.args[1]
           (Base.sym_in(ex.head, (:function,:(=))) && Meta.isexpr(fdef, :call)) || error("not a function")
           args = @view(fdef.args[2:end])
           for a in args
               a isa Expr && Base.sym_in(a.head, (:kw,:parameters)) && error("need to implement kwarg support")
           end
           fname = fdef.args[1]
           fdef.args[1] = gname = gensym(fname)
           quote
               $ex
               $fname(args...) = $stable_wrap($gname, args...)
           end |> esc
       end
@stable (macro with 1 method)

julia> @stable f(x,y) = x * y / 3
f (generic function with 1 method)

julia> f(2, 3)
2.0

julia> @code_typed f(2, 3)
CodeInfo(
1 ── %1  = Core.getfield(args, 1)::Int64
│    %2  = Core.getfield(args, 2)::Int64
└───       goto #10 if not true
2 ┄─ %4  = φ (#1 => 2, #9 => %16)::Int64
│    %5  = Base.sle_int(1, %4)::Bool
└───       goto #4 if not %5
3 ── %7  = Base.sle_int(%4, 2)::Bool
└───       goto #5
4 ──       nothing::Nothing
5 ┄─ %10 = φ (#3 => %7, #4 => false)::Bool
└───       goto #7 if not %10
6 ──       Base.getfield((Int64, Int64), %4, true)::DataType
│    %13 = Base.add_int(%4, 1)::Int64
└───       goto #8
7 ──       goto #8
8 ┄─ %16 = φ (#6 => %13)::Int64
│    %17 = φ (#6 => false, #7 => true)::Bool
│    %18 = Base.not_int(%17)::Bool
└───       goto #10 if not %18
9 ──       goto #2
10 ┄       goto #11
11 ─       goto #12
12 ─       goto #13
13 ─       goto #14
14 ─ %25 = Base.mul_int(%1, %2)::Int64
│    %26 = Base.sitofp(Float64, %25)::Float64
│    %27 = Base.div_float(%26, 3.0)::Float64
└───       goto #15
15 ─       return %27
) => Float64

julia> @code_llvm f(2, 3)

;  @ REPL[2]:12 within `f`
define double @julia_f_427(i64 signext %0, i64 signext %1) #0 {
top:
; ┌ @ REPL[1]:4 within `stable_wrap`
; │┌ @ REPL[3]:1 within `##f#226`
; ││┌ @ int.jl:88 within `*`
     %2 = mul i64 %1, %0
; ││└
; ││┌ @ int.jl:97 within `/`
; │││┌ @ float.jl:294 within `float`
; ││││┌ @ float.jl:268 within `AbstractFloat`
; │││││┌ @ float.jl:159 within `Float64`
        %3 = sitofp i64 %2 to double
; │││└└└
; │││ @ int.jl:97 within `/` @ float.jl:412
     %4 = fdiv double %3, 3.000000e+00
     ret double %4
; └└└
}

The idea is that the check compiles away.
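A small sketch of how the wrapper behaves on a stable and an unstable function, with `stable_wrap` restated so the snippet runs on its own. The functions `stable` and `unstable` are hypothetical examples. The presumed reason the check can vanish is that, for known argument types, the result of `Base.promote_op` is a compile-time constant, so the `isconcretetype` branch can be folded away. Note that `promote_op` relies on inference and is documented as fragile, so treat this as an illustration rather than a guarantee.

```julia
# Restated from the post above so that this snippet is self-contained.
function stable_wrap(f::F, args...) where {F}
    T = Base.promote_op(f, map(typeof, args)...)
    Base.isconcretetype(T) || error("Not stable!")
    return f(args...)::T
end

# Hypothetical example functions.
stable(x) = x > 0 ? x : zero(x)   # always returns a value of typeof(x)
unstable(x) = x > 0 ? x : 0.0     # Int or Float64 when x is an Int

stable_wrap(stable, 1)            # passes the check
stable_wrap(unstable, 1.0)        # passes: both branches are Float64

# For an Int argument the inferred type is a Union, so the check errors.
threw = try
    stable_wrap(unstable, 1)
    false
catch
    true
end
```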

MilesCranmer · May 24, 2024, 9:21pm

It would be quite tedious to explicitly test the inference over all possible permutations of types to every internal function in a library. Especially functions that are deeply nested, for which a failed inference may not be picked up by a top-level @inferred. Those methods which would require some manual @descend work are not practical for automation. But tagging it at the call site would let you automate this.

Anyway, I’m not looking to convince anyone of the utility at this stage. I hate type instabilities and I hate finding them, so I want to get this @stable faster so I can use it in my own stuff.

MilesCranmer · May 24, 2024, 9:22pm

Very nice!! Thanks!

xzackli · May 24, 2024, 9:31pm

I’ve always wanted something small and convenient like this! I’ve also seen a macro floating around for erroring on all allocations inside a macro-ed function, which could also live in such a package (combined into @static)?

MilesCranmer · May 24, 2024, 9:35pm

Sounds great. Let me know if you find that macro, I’d love to throw it in the package too

nsajko · May 24, 2024, 9:41pm

Hmm, you’re right. What about hiding this behavior behind a compile time preference, with Preferences.jl? This way it could be turned off for production but turned on in the test suite.

Zentrik · May 24, 2024, 9:55pm

JuliaLang/AllocCheck.jl? Or have I misunderstood?

MilesCranmer · May 25, 2024, 11:05am

How does this sound for working with keywords? The downside is that it has to call the internal function Core.kwcall, but it seems like promote_op doesn’t define a keyword-compatible method:

function stable_wrap(f::F, args...; kwargs...) where {F}
    T = if isempty(kwargs)
        Base.promote_op(f, map(typeof, args)...)
    else
        Base.promote_op(Core.kwcall, typeof(NamedTuple(kwargs)), F, map(typeof, args)...)
    end
    Base.isconcretetype(T) || error("...")
    return f(args...; kwargs...)::T
end
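
To make the `Core.kwcall` branch more concrete, here is a small sketch of how a keyword call relates to `Core.kwcall` on Julia versions that provide it (the thread mentions 1.10). The function `k` is a hypothetical example. Inside `stable_wrap`, `kwargs` is not a NamedTuple, which is why the code above converts it with `NamedTuple(kwargs)` first.

```julia
# Hypothetical example function with one keyword argument.
k(x; a=1) = x + a

# A keyword call and the equivalent explicit Core.kwcall form:
# Core.kwcall(keywords_as_NamedTuple, function, positional_args...)
direct = k(1; a=2)
via_kwcall = Core.kwcall((; a=2), k, 1)

# Return-type inference for the keyword call, in the same way as the
# `else` branch of stable_wrap above.
T = Base.promote_op(Core.kwcall, typeof((; a=2)), typeof(k), Int)
```

Since `Core.kwcall` is an internal function, its availability and behavior may differ between Julia versions.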

Full implementation here: src/DispatchDoctor.jl on the main branch of MilesCranmer/DispatchDoctor.jl on GitHub.

It seems to work for a variety of scenarios too which is great:

@testitem "smoke test" begin
    using DispatchDoctor
    @stable f(x) = x
    @test f(1) == 1
end
@testitem "with error" begin
    using DispatchDoctor
    @stable f(x) = x > 0 ? x : 1.0

    # Will catch type instability:
    @test_throws TypeInstabilityError f(1)
    @test f(2.0) == 2.0
end
@testitem "with kwargs" begin
    using DispatchDoctor
    @stable f(x; a=1, b=2) = x + a + b
    @test f(1) == 4
    @stable g(; a=1) = a > 0 ? a : 1.0
    @test_throws TypeInstabilityError g()
    @test g(; a=2.0) == 2.0
end
@testitem "tuple args" begin
    using DispatchDoctor
    @stable f((x, y); a=1, b=2) = x + y + a + b
    @test f((1, 2)) == 6
    @test f((1, 2); b=3) == 7
    @stable g((x, y), z=1.0; c=2.0) = x > 0 ? y : c + z
    @test g((1, 2.0)) == 2.0
    @test_throws TypeInstabilityError g((1, 2))
end

MilesCranmer · May 25, 2024, 12:06pm

Slightly related question… Does anybody know how to unit-test that the LLVM is as expected?

julia> using DispatchDoctor

julia> @stable f(x) = x
f (generic function with 1 method)

julia> @code_llvm f(1)
;  @ /Users/mcranmer/PermaDocuments/DispatchDoctor.jl/src/DispatchDoctor.jl:65 within `f`
define i64 @julia_f_460(i64 signext %0) #0 {
top:
  ret i64 %0
}

I can do this check manually but would prefer to have the CI scream at me when Julia no longer compiles away the check.

Zentrik · May 25, 2024, 12:40pm

using InteractiveUtils: code_llvm
using Test

llvm_ir = sprint(code_llvm, f, (Int,))
@test !occursin(str, llvm_ir)

Replace str with some IR that shows up when the check isn’t compiled away.
There are plenty of examples in GitHub code search results, and I imagine CUDA.jl, GPUCompiler.jl and LLVM.jl also have more examples.
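
A self-contained sketch of such a test, using a plain identity function `id` as a hypothetical stand-in for an @stable-wrapped function. The marker string is an assumption: it presumes that a dynamic dispatch shows up in the IR as a call whose name contains `apply_generic`, which may not hold on every Julia version.

```julia
using InteractiveUtils: code_llvm
using Test

# Hypothetical stand-in for a function wrapped with @stable.
id(x) = x

# Capture the optimized LLVM IR as a string.
llvm_ir = sprint(code_llvm, id, (Int,))

# Assumption: a surviving runtime check would need a dynamic call, which
# would appear in the IR under a name containing "apply_generic".
@test !occursin("apply_generic", llvm_ir)
```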

MilesCranmer · May 25, 2024, 12:43pm

Amazing. Thanks!

(And btw do you foresee any issues with the use of Core.kwcall? I noticed it wasn’t compatible with earlier Julia, so I basically am just having @stable be a no-op on Julia earlier than 1.10)

Elrod · May 25, 2024, 1:23pm

See JuliaTesting/PerformanceTestTools.jl on GitHub.
It takes care of getting rid of flags like --check-bounds=yes and code coverage.

Here is an example use:

YingboMa/FastBroadcast.jl/blob/5fb772af362a949a1277df2ed4122b8022037701/test/runtests.jl#L126

        end
        @test FastBroadcast.indices_do_not_alias(typeof(view(fill(0, 10), 1:4)))

        let ex = macroexpand(@__MODULE__,
                :(@.. broadcast=false @view(J[idxs])=@view(J[idxs]) - inv_alpha))
            @test Base.Meta.isexpr(ex, :call)
            @test ex.args[1] === FastBroadcast.fast_materialize!
        end
    end

    VERSION >= v"1.6" && PerformanceTestTools.@include("vectorization_tests.jl")
end

if GROUP == "Downstream"
    activate_downstream_env()
end

and from the included file

YingboMa/FastBroadcast.jl/blob/5fb772af362a949a1277df2ed4122b8022037701/test/vectorization_tests.jl#L54-L56C5

InteractiveUtils.code_llvm(io, foo9, Base.typesof(a, b, c, d, e, f, g, h, i))
str = String(take!(io))
@test occursin("vector.body", str)

vector.body is a name LLVM typically gives to vectorized loop bodies, so this code checks to make sure a gigantic broadcast vectorized.

You could do things like add the debuginfo=:none kwarg to code_llvm, and then check for number of lines.
Or for totally trivial cases, you could try things like comparing string distance with what the optimized IR is supposed to be like (with debuginfo=:none of course; e.g. we don’t care about LineNumberNode paths matching).

EDIT: should maybe replace the String(take!(io)) from FastBroadcast’s tests with sprint.
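
A rough sketch of the debuginfo=:none idea combined with sprint. The function `id` is a hypothetical example, and filtering out blank and comment lines before counting is just one possible way to make the line count less sensitive to formatting; any threshold on the count would need to be calibrated per function and Julia version.

```julia
using InteractiveUtils: code_llvm

# Hypothetical example: an identity function on Int64.
id(x) = x

# sprint with a closure lets us pass keyword arguments to code_llvm.
ir = sprint(io -> code_llvm(io, id, (Int64,); debuginfo=:none))

# Keep only non-blank lines that are not IR comments (which start with ';').
body = filter(split(ir, '\n')) do line
    s = strip(line)
    !isempty(s) && !startswith(s, ';')
end

# A test could then bound the number of remaining lines.
n_lines = length(body)
```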

MilesCranmer · May 25, 2024, 1:52pm

Nice! That worked. Thanks for the help, I think this is ready for the registry now.

Palli · May 25, 2024, 2:34pm

It’s great that you’ve done this and could be nerd-sniped into doing it, so maybe you or someone else can be nerd-sniped into making improvements that build on it. For example, one or more macros could be applied globally, via a REPL mode that would implicitly do the following for your f:

stable> @stable @check_allocs f(x) = <my_function>

That is, in that mode you wouldn’t need to specify the macros; you would only need them at the regular julia prompt, which you would no longer use most of the time.

[We already have a package for checked arithmetic, and a package for a REPL mode that enables it; we could have the above REPL mode include that, and call it debug…]
Ideally all functions (that you care about) would be type-stable (and non-allocating, if that matters), but there’s a learning curve, and I think it can’t be checked at compile time for arbitrary types. Your example relu code isn’t type-stable because it uses 0.0; it should use zero(x) to also work for e.g. Float32 (and one(x) where applicable). Division with / (and I guess \) gives Float64 for integer inputs, which is another stability trap.

Would you want such a check to guarantee type stability at compile time for most or all generic code? Often it’s enough to know that the code is type-stable for the types I actually use at runtime. You merged a credit for a performance trick minutes ago; does this now have no overhead when the code is type-stable (for some types, while for the others you get a type-instability error)?

MilesCranmer · May 25, 2024, 3:07pm

Yeah it should be zero overhead now. I have a unittest for this too.

MilesCranmer · May 25, 2024, 3:53pm

One other thing that would be useful would be a module-wide version:

@stable module A
  
function f1(x)
    x
end
function f2(x, y)
    x * y
end

end

and it would add @stable to every function in-scope.

I’m assuming this isn’t possible though…
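
For a module written in a single file, a purely syntactic pass might get partway there. Below is a rough sketch that walks a parsed module expression and wraps each function definition in a macro call. The helpers `is_function_def`, `tag_functions` and `count_macrocalls` are hypothetical names invented for this illustration; the sketch only rewrites the expression, does not evaluate it, and does not visit files pulled in with include.

```julia
# Hypothetical helper: long-form `function ... end` or short-form `f(x) = ...`.
function is_function_def(ex)
    ex isa Expr || return false
    ex.head === :function && return true
    return ex.head === :(=) && Meta.isexpr(ex.args[1], (:call, :where))
end

# Hypothetical helper: wrap every function definition in `macro_name`.
function tag_functions(ex, macro_name::Symbol)
    ex isa Expr || return ex
    if is_function_def(ex)
        return Expr(:macrocall, macro_name, nothing, ex)
    end
    return Expr(ex.head, Any[tag_functions(a, macro_name) for a in ex.args]...)
end

# Hypothetical helper: count macro calls in an expression tree.
function count_macrocalls(ex)
    ex isa Expr || return 0
    n = ex.head === :macrocall ? 1 : 0
    for a in ex.args
        n += count_macrocalls(a)
    end
    return n
end

src = """
module A
function f1(x)
    x
end
f2(x, y) = x * y
end"""

tagged = tag_functions(Meta.parse(src), Symbol("@stable"))
```

Nested definitions inside an already tagged function are left alone, since the walk stops at the first function definition it finds on each branch.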

Elrod · May 25, 2024, 4:29pm

The hard part would be include.
At that point, it may be worth trying to play with Core.Compiler/inference instead, to see if you can create a module-level Base.Experimental.@ option like @optlevel or @max_methods.

MilesCranmer · May 25, 2024, 4:39pm

Thanks.

Btw, I found a weird case of Julia’s specialization rules interfering with this interface:

using DispatchDoctor

@stable f(a, t::Type{T}) where {T} = sum(a; init=zero(T))

f([1f0, 1f0], Float32)

Despite the normal function being type stable, this actually fails the type specialization test:

ERROR: TypeInstabilityError: Instability detected in function `f`
with arguments `(Vector{Float32}, DataType)`. Inferred to be 
`Any`, which is not a concrete type.
Stacktrace:
 [1] #_stable_wrap#1
   @ ~/PermaDocuments/DispatchDoctor.jl/src/DispatchDoctor.jl:25 [inlined]

because of Julia’s type specialization rules:

As a heuristic, Julia avoids automatically specializing on argument type parameters in three specific cases: Type, Function, and Vararg.

Even if I modify _stable_wrap to be

_stable_wrap(f::F, caller::G, args::Vararg{Any,N}; kwargs...) where {F,G,N}

it still fails, because now there are multiple non-specializing cases (Vararg and Type) – it seems like Julia lacks logic to deal with this situation.
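
One detail that may be relevant here, offered as a sketch rather than a confirmed fix: the error message reports the argument types as `(Vector{Float32}, DataType)`, which is what `typeof` returns for a type argument. `Core.Typeof` instead returns `Type{Float32}` and so keeps the parameter. The function `g` below is a hypothetical copy of `f` without @stable. This only changes which types are handed to `Base.promote_op`; whether the wrapper method itself gets specialized is a separate question that this sketch does not address.

```julia
# Hypothetical copy of f from the post, without @stable.
g(a, ::Type{T}) where {T} = sum(a; init=zero(T))

args = (Float32[1, 1], Float32)

# typeof forgets which type was passed; Core.Typeof keeps it.
lossy_types = map(typeof, args)        # second entry is DataType
exact_types = map(Core.Typeof, args)   # second entry is Type{Float32}

# The two inference queries can then be compared directly.
T_lossy = Base.promote_op(g, lossy_types...)
T_exact = Base.promote_op(g, exact_types...)
```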

I started a thread about this issue a while back:

but seems like that solution doesn’t work here.

Is there any way to force Julia to specialize no matter what?
