Why are functions that are never called getting compiled?

Previously I asked about, and solved, a problem with compilation of a function inside generated code: multiple instances of a function were getting compiled without ever being called, which I wanted to avoid. The solution, which I now realize carries a runtime penalty, was to use the @nospecialize macro.

I now want to use a @generated function, so that I can set the upper and lower bounds programmatically. If you run this MWE, you’ll see that 32 instances of myfun are compiled, but only 1 is called:

# Recursively build a binary search over the integer range [lower, upper];
# each leaf returns `fun` applied to the matching Val type.
function valuedispatch_expr(::Val{lower}, ::Val{upper}, val, fun) where {lower, upper}
    if lower >= upper
        return :( return $fun(Val($upper)) )
    end
    midpoint = lower + div(upper - lower, 2)
    expr_a = valuedispatch_expr(Val(lower), Val(midpoint), val, fun)
    expr_b = valuedispatch_expr(Val(midpoint+1), Val(upper), val, fun)
    return quote
        if $val <= $midpoint
            $expr_a
        else
            $expr_b
        end
    end
end

macro valuedispatch_macro(lower::Int, upper::Int, val, fun)
    return valuedispatch_expr(Val(lower), Val(upper), esc(val), esc(fun))
end

@generated function valuedispatch(::Val{lower}, ::Val{upper}, val, fun) where {lower, upper}
    ex = :( @valuedispatch_macro($lower, $upper, val, fun) )
    return quote
        @nospecialize
        $ex
    end
end

@generated function myfun(::Val{v}) where v
    println("Compiling ", v)  # printed when this instance is generated/compiled
    return :(v, println("Running ", $v))  # printed each time it is called
end

println(valuedispatch(Val(1), Val(32), 3, myfun))
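
For reference, here’s roughly what the macro expands to for a small range (a sketch; the actual @macroexpand output also contains line-number nodes):

# @macroexpand @valuedispatch_macro(1, 4, x, f) produces, roughly:
quote
    if x <= 2
        if x <= 1
            return f(Val(1))
        else
            return f(Val(2))
        end
    else
        if x <= 3
            return f(Val(3))
        else
            return f(Val(4))
        end
    end
end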

Does anyone have any ideas about how I might fix this? How might I make sure that the only instance of myfun that gets compiled is the one that is actually called?

On a side note, I think this valuedispatch function will be really useful in general, if I can iron out the issues. For example, in the following case it’s 33x faster than dynamic dispatch (on my machine):

function myfun2(::Val{v}) where v
    return v*v-v
end

using BenchmarkTools

@btime valuedispatch(Val(1), Val(8), Int(ceil(rand() * 8.)), myfun2)
@btime myfun2(Val(Int(ceil(rand() * 8.))))

I have a feeling that it has something to do with how val and fun are getting passed around, but I don’t know how to fix it.

Rather than using @nospecialize, you can use Base.inferencebarrier in the generated function, as follows:

@generated function valuedispatch(::Val{lower}, ::Val{upper}, val, fun) where {lower, upper}
    return :( @valuedispatch_macro($lower, $upper, val, Base.inferencebarrier(fun)) )
end
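
Running the MWE above with this version, only the instance that is actually called should be compiled, so the output should look something like:

println(valuedispatch(Val(1), Val(32), 3, myfun))
# Compiling 3
# Running 3
# (3, nothing)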

However, this negates much of the speed advantage of valuedispatch. It becomes only 2x faster than dynamic dispatch, instead of 33x.


I feel sure there is, or at least should be, a way to ensure that a function is only compiled when it is first called, without any undesirable side-effects with respect to type inference. I also suspect that a few people in this community will know how to achieve this - I’m just hoping that one of you sees this! :smiley:

I’m having a tough time following how the expression is built or what the intent is, but it looks like you recursively build a large nested if statement branching to function calls on different Val types, so you start with myfun(::Val{1}) and myfun(::Val{32}) and snowball into all the Vals in between, like myfun(::Val{16}). Although only one of these calls runs per function call, every single function call in the if statement is inferred and compiled when the surrounding function is compiled. These calls are not dynamically dispatched, and a function call is only compiled once, so it makes sense they all must be compiled when the surrounding function is compiled.

Here's a simpler example with just 2 branches
julia> foo(x) = x+1
foo (generic function with 1 method)

julia> (@which foo(nothing)).specializations
svec(MethodInstance for foo(::Nothing), nothing, nothing, nothing, nothing, nothing, nothing, nothing)

julia> function bar(b::Bool)
         if b
           foo(1)
         else
           foo(1.5)
         end
       end
bar (generic function with 1 method)

julia> (@which foo(nothing)).specializations
svec(MethodInstance for foo(::Nothing), nothing, nothing, nothing, nothing, nothing, nothing, nothing)

julia> bar(true)
2

julia> (@which foo(nothing)).specializations # both foo branches compiled with bar
svec(MethodInstance for foo(::Nothing), MethodInstance for foo(::Int64), MethodInstance for foo(::Float64), nothing, nothing, nothing, nothing, nothing)

In your previous post, you prevented the 32 compilations with @nospecialize. @nospecialize ended up in the body of function valuedispatch_1_32(val, fun), so it was applied to the arguments of that function, not to ex. The important bit there was @nospecialize fun; in fact, you could’ve accomplished the same thing in the previous post by annotating that argument alone, whereas @nospecialize val alone would’ve done nothing. @nospecialize fun forced every fun call to be dynamically dispatched, so myfun was only compiled when it ran, not when the surrounding function was compiled.
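
For illustration, here’s a minimal sketch of that effect in an ordinary (non-generated) function; the names here are made up, not from the previous post:

sq(x) = x * x

# Because f is unspecialized, the call f(1) below is dispatched at runtime,
# so sq(::Int) is compiled when call_it(sq) first runs, not when call_it
# itself is compiled.
call_it(@nospecialize(f)) = f(1)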

There are a couple of reasons that @nospecialize is not working in @generated function valuedispatch here. The smaller issue is that @nospecialize is stuck in a quote block rather than being in the body of the function itself. The bigger issue is that generated functions build expressions at a point in the compilation process where the runtime arguments’ types are known; that’s their purpose, and the types are even assigned to the argument variables. And despite moving @nospecialize fun into the definition header, it still knows typeof(myfun); I checked by adding a line println("Compiling ", fun). I don’t know if someone else has managed to do it, but I think generated functions are not capable of unspecialized arguments. Your Base.inferencebarrier code is the only workaround I’ve ever seen, and the generated function call is still specializing on the fun argument even if its body isn’t.

Here's a simpler example of a generated function with a @nospecialize that doesn't work
julia> @generated function foo(@nospecialize x)
         println("Compiling...", x)
         :(x+1)
       end
foo (generic function with 1 method)

julia> foo(1)
Compiling...Int64
2

julia> foo(1.5)
Compiling...Float64
2.5

julia> (@which foo(1)).specializations
svec(MethodInstance for foo(::Int64), MethodInstance for foo(::Float64), nothing, nothing, nothing, nothing, nothing, nothing)

It makes sense because you went from branching to 1 of 32 statically dispatched function calls to 1 of 32 dynamically dispatched function calls. I’m not exactly sure why it’s 2x faster than @btime myfun2(...) when it seems to do extra work to branch to the same call. @btime only reports the minimum time, and a println on top of background processes can skew timings quite a bit. Maybe compare the timing distributions with @benchmark, as sketched below?
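
Something along these lines, perhaps (a sketch, assuming BenchmarkTools is loaded):

using BenchmarkTools

# Compare the full timing distributions rather than just the minimum:
@benchmark valuedispatch(Val(1), Val(8), v, myfun2) setup=(v = rand(1:8))
@benchmark myfun2(Val(v)) setup=(v = rand(1:8))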


Thank you, @Benny, for the detailed response! :heart:

It looks like you recursively build a large nested if statement branching to function calls on different Val types

Exactly right.

It makes sense because you went from branching to 1 of 32 statically dispatched function calls to 1 of 32 dynamically dispatched function calls.

Yes. That’s what I’d understood.

Although only one of these calls runs per function call, every single function call in the if statement is inferred and compiled when the surrounding function is compiled.

Yes. I’d understood that.

These calls are not dynamically dispatched, and a function call is only compiled once, so it makes sense they all must be compiled when the surrounding function is compiled.

While I understand that it happens, I don’t understand why it must happen. I can see ways that even a statically dispatched function could be compiled on its first call.

Often I see people in this community saying that function specializations are compiled the first time they’re called. In fact, this isn’t true: they’re compiled the first time that any function that may call them statically is compiled. This is quite different, and I wonder if it actually needs to, or must, be this way.

Yes, this would be more accurate. I suppose it doesn’t roll off the tongue as easily as “upon first call”.

And yes, it must happen. If we don’t compile a function call, we can’t optimize the caller function: 1) without inferring the call’s return type, we can’t allocate memory for the output other than as a boxed value; 2) subsequent code using the return value cannot assume its type; 3) we cannot inline compiled code if it’s not compiled yet. This is pretty much how dynamically dispatched function calls work now. You could implement a language that compiles placeholders and recompiles each time a statically dispatched function call is run, but the compilation latency could be so bad that we’d be better off not recompiling. It’d also make precompilation really tricky, because you’d have to infer and manually list all the nested function call signatures first.
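
A rough sketch of points 1 and 2, using Base.inferencebarrier (mentioned above) to hide a callee from inference:

# When the callee's return type can't be inferred, the result is boxed and
# downstream code can't assume its type:
unstable(f, x) = Base.inferencebarrier(f)(x) + 1  # inferred return type: Any
stable(f, x) = f(x) + 1                           # fully inferred per f

# Try: @code_warntype unstable(sin, 1.0) vs @code_warntype stable(sin, 1.0)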

  1) without inferring the call’s return type, we can’t allocate memory for the output other than as a boxed value; 2) subsequent code using the return value cannot assume its type;

In cases where either the return value isn’t used, or the type of the return value is annotated in the definition of the called function, this shouldn’t be an issue. Easy to fix in my case.

  3) we cannot inline compiled code if it’s not compiled yet.

Is there a specific @noinline macro or annotation?

You could implement a language that compiles placeholders and recompiles each time a statically dispatched function call is run, but the compilation latency could be so bad that we’d be better off not recompiling.

This sounds a bit like everything would need to be recompiled, rather than just the one function. Does this mean everything is recompiled every time a dynamic dispatch triggers a compilation? A language that compiled just each function (and not everything) when the function was first called would have a lower or equal compilation latency, since it would be compiling a lesser or equal amount of code.

@Benny Do you know that it isn’t possible in Julia? Or are you stating the reasons as to why you believe it shouldn’t be possible?

It’s not that simple. This isn’t like statically typed languages that enforce sensible types throughout the function. At some point, the types must still be inferred from the input types, and memory has to be allocated for whatever was inferred. Julia’s return type annotations just do assertions (throwing errors on mismatch) or conversions to the specified type; they don’t change how the call works, they just do stuff afterward.
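
For example, roughly (sketching the lowering from memory):

# A return-type annotation
g(x)::Int = x + 1
# behaves roughly like
g2(x) = convert(Int, x + 1)::Int
# The call x + 1 itself is unchanged; the convert/assert happens afterward.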

Yes, actually. You don’t really have to worry about it usually: if a call is not inlineable, like a dynamic dispatch, then the compiler won’t inline it even if you annotate it with @inline.
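
For reference, @noinline can be applied to a definition, and on recent Julia versions (1.8+) at a call site too:

# On a definition:
@noinline bulky(x) = x * x + x

# At a call site (Julia 1.8+):
h(x) = @noinline bulky(x)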

Exact opposite: I’m saying that code with a dynamic dispatch is compiled to accommodate any return type and function call. It wouldn’t make sense for such calls to cause recompilation of the caller function, because the next call could be anything, unlike a statically dispatched call. I mentioned dynamic dispatch because this current behavior is close to what you’re suggesting for code with static dispatches. But that proposal introduces either the runtime performance penalties of dynamic dispatch or unprecedented recompilation latency, e.g. the caller function valuedispatch being recompiled every time it reaches a branch for the first time and needs to compile a new myfun(::Val) call.