Anonymous functions, type stability

For my simulation, I want the user to easily specify a bunch of observable functions like

observables = [(x,p,force) -> x[1], (x,p,force) -> x[2], (x,p,force) -> p'p]

that are then passed to the simulation and evaluated repeatedly in the main loop.

Getting to know the ProfileView flamegraphs and the Cthulhu descend function, I discovered that this seems to be type-unstable. Cthulhu shows code like

(observables[i]::Function)(x, p, force)::Any  ## the "Any"s are shown in red

How can I solve this issue? The x, p, force are vectors of floats whose dimensionality is determined by the user. However, the initial values in these vectors are often integers: x, for example, is a position vector that is evolved in the simulation and will usually contain floats, but the user might specify its initial value as x=[0], not caring about type issues.

GitHub - yuyichao/FunctionWrappers.jl sounds like what you want.

You just need to make sure the types of all functions are known and stable once they hit your core code where the loop is (e.g. behind a function barrier). If the vector is short, make it a tuple. Then iterate over the tuple with map, reduce, or anything else recursive when you are calling them.

If you are choosing the function from the list with a runtime Int in a loop, you will need FunctionWrappers.jl. But otherwise you don't.
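For concreteness, here is a minimal sketch of the FunctionWrappers.jl approach, assuming all observables take three Float64 vectors and return a Float64 (the wrapper's signature is something you have to fix up front):

```julia
using FunctionWrappers: FunctionWrapper

# Fix one concrete signature for every observable: three Float64
# vectors in, one Float64 out. All wrapped functions then share the
# same concrete type, so indexing into the vector is type-stable.
const Obs = FunctionWrapper{Float64, Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}}}

observables = [Obs(f) for f in (
    (x, p, force) -> x[1],
    (x, p, force) -> x[2],
    (x, p, force) -> p'p,
)]

x, p, force = [1.0, 2.0], [3.0, 4.0], [0.0, 0.0]
observables[3](x, p, force)  # == 25.0
```

The price is that every observable is forced into that one signature, so this fits best when the interface to the user is fixed anyway.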

Thanks for the advice, could you elaborate a little?
Indeed, my main loop looks like this:

    for i in 0:maxIter

        # measurement
        if i%t_meas == 0
             for k in eachindex(observables)
                result_sum[k] .+= observables[k](x,p,force)
             end
        end

        ## update x-vector, p-vector, and force-vector

    end   

Should I transform the vectors to a tuple before the measurement, or use tuples from the get-go? They are immutable, so updating them could be more challenging.
Also, I don’t know what a function barrier is…

See Performance Tips · The Julia Language

I think Rafael is proposing some variation of

julia> using StaticArrays  # needed for SVector below

julia> function sim(observables; max_iters=10)
         result_sum = zeros(length(observables))
         for i in 1:max_iters
           x = rand(SVector{2})
           result_sum .+= map(ob->ob(x), observables)
         end
         result_sum
       end
sim (generic function with 1 method)
sim (generic function with 1 method)

julia> observables = ( x->x[1], x->x[2], x->x[1]+x[2] ) # tuple!
(var"#67#70"(), var"#68#71"(), var"#69#72"())

julia> @time sim(observables; max_iters=20)
  0.000026 seconds (1 allocation: 80 bytes)
3-element Vector{Float64}:
  9.716143022312487
  8.855024254148601
 18.571167276461082

I am using a static vector x here to avoid allocations and drive the point home.

Depends on how you do it. Keyword: function barrier :)

Bad:

function sim(observables::Vector)
   # type not inferable at compile time, because
   # convert_to_tuple cannot be type stable.
   obs_tuple = convert_to_tuple(observables) 
   
   for ob in obs_tuple
     # here the type of ob is not known!
   [...]

Okay:

Julia’s compiler specializes code for argument types at function boundaries:

function sim(observables::Vector)
   obs_tuple = convert_to_tuple(observables) 
   
   # specializes on the type of obs_tuple
   actual_simulation(obs_tuple)  
   [...]

It’s still dynamic dispatch, because Julia has to decide at runtime which method to call, or whether to compile a new actual_simulation method. But this is a one-time overhead and should be negligible compared to the actual work.
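To make the pattern concrete, here is a runnable toy version (convert_to_tuple and actual_simulation are just the placeholder names from above):

```julia
# Placeholder name from the discussion, made runnable: a vector's type
# does not carry its length, so the result type is not inferable here.
convert_to_tuple(v::Vector) = Tuple(v)

# Past the barrier: this method is compiled specialized on the concrete
# tuple type, so each ob's type (and hence return type) is known.
actual_simulation(obs::Tuple, x) = sum(ob(x) for ob in obs)

function sim(observables::Vector, x)
    obs_tuple = convert_to_tuple(observables)  # type-unstable...
    return actual_simulation(obs_tuple, x)     # ...one dynamic dispatch, then fast
end

sim([y -> y[1], y -> 2y[2]], [1.0, 2.0])  # 1.0 + 4.0 == 5.0
```

The inner loop over the tuple is recursive/unrolled by the compiler, so no dynamic dispatch happens inside actual_simulation.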

Thank you @skleinbo !

I timed the little function-barrier example at the link you posted, and the difference can easily be orders of magnitude in runtime. Amazing!

I don’t fully understand the problem though…

Your first case:

function sim(observables::Vector)
   # type not inferable at compile time, because
   # convert_to_tuple cannot be type stable.
   obs_tuple = convert_to_tuple(observables) 
   
   for ob in obs_tuple
     # here the type of ob is not known!
   [...]

Apparently, compilation is invoked when the function sim is called somewhere with a vector observables that will have a concrete type, e.g. Vector{Float64}.
But why is convert_to_tuple not type-stable? If I pass in a vector with element type Float64, the resulting tuple should have Float64 elements as well.

How can code even be translated if the compiler cannot know the types?

Case 2:

function sim(observables::Vector)
   obs_tuple = convert_to_tuple(observables) 
   
   # specializes on the type of obs_tuple
   actual_simulation(obs_tuple)  
   [...]

What is the order of compilation here?
We hit a call of sim(...) somewhere and compile the whole sim function, except for the actual_simulation function which only gets compiled when the execution reaches that point? And since, by that time, the compiler will know the type at hand, the resulting machine code of actual_simulation will be faster than without that function interface?

A tuple type encodes the number of elements as well; that is why converting from a Vector{Float64} cannot be type-stable. E.g. (1, 2) |> typeof is Tuple{Int64, Int64}.

If the compiler doesn’t know a value’s type, it can store it behind an Any; the actual type is then known at runtime, so the code can branch on that runtime information.

I suspect you have the right idea for the order of compilation but I don’t really know.

Yes, if there is no method compiled yet for the signature given, a new method will be compiled. And if we were talking about concrete types like Vector{Float64}, as you propose here, there would be no problem. However, the original problem is dealing with an abstract type, Vector{Function}. There is nothing in the type (which is the only information available at compile time!) that tells the compiler anything about the elements (except that they are functions) – in particular not their return type. So it has to prepare for all eventualities, so to speak, to deliver runnable code.

A tuple, on the other hand, has that information, because a tuple type carries the types of its elements as parameters, e.g. (1, 2.0) isa Tuple{Int, Float64}, or (sin, cos) isa Tuple{typeof(sin), typeof(cos)}. Hence when you pass a tuple of functions, the compiler knows what it’s dealing with and can make further inferences based on that, ideally leading to fast machine code.

Those are also the reasons why a convert_to_tuple cannot be type-stable if you feed it a vector of functions: it doesn’t know which functions are in the vector, hence it doesn’t know the type of tuple that will be returned. Even a convert_to_tuple applied to a Vector{Float64} won’t be type-stable, because how many elements is the resulting tuple going to have…?
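You can see this with Base.return_types (a sketch; the exact inferred type may vary across Julia versions):

```julia
# The element type survives inference, but the length does not: the
# compiler can only conclude "a tuple of Float64s of some length",
# which is an abstract type.
T = Base.return_types(Tuple, (Vector{Float64},))[1]
isconcretetype(T)   # false, e.g. T == Tuple{Vararg{Float64}}

# By contrast, a concrete tuple type pins down both length and
# element types:
isconcretetype(typeof((1.0, 2.0)))  # true
```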

By doing dynamic dispatch, i.e. selecting the appropriate method when the respective call is reached. It’s often convenient and perfectly fine, but a total performance killer when it happens in the inner/hot part of your code.

Exactly this. actual_simulation is dynamically dispatched on, but because it presumably makes up the lion’s share of the runtime, that one-time cost is of no importance; and because the input type is now known, a specialized method can be compiled.

Thanks so much!

Another summary to see whether I got it:

function sim(observables::Vector)
   # type not inferable at compile time, because
   # convert_to_tuple cannot be type stable.
   obs_tuple = convert_to_tuple(observables) 
   
   for ob in obs_tuple
     # here the type of ob is not known!
   [...]

When execution hits sim, the whole function body is compiled. Since obs_tuple’s type is unknown at compile time, the resulting code might be highly inefficient, as the compiler needs to take care of all eventualities. Could it, in principle, re-engage compilation within the function body, e.g. after the obs_tuple line? Because that could mitigate the problem somewhat.

Case 2:

function sim(observables::Vector)
   obs_tuple = convert_to_tuple(observables) 
   
   # specializes on the type of obs_tuple
   actual_simulation(obs_tuple)  
   [...]

Here, compilation of actual_simulation only takes place after obs_tuple has been computed, i.e. at a time when its concrete type is known. Dynamic dispatch is used, which means the runtime picks the already-compiled method for the data type at hand, or compiles a new specialization if that type is not covered yet.

On another note, can the convert_to_tuple function be assumed to have negligible overhead compared to the benefits one gains from the tuple + function barrier trick?
I just realized: if I don’t need to change the entries of the observables vector, I can just start with a tuple from the get-go, no?

The actual process is more complicated than this in practice. You can look pretty deeply into it using SnoopCompile.jl and Cthulhu.jl.

Could it, in principle, re-engage compilation within the function body, e.g. after the obs_tuple line? Because that could mitigate the problem somewhat.

No - the unit of compilation is the function. You need a function barrier to fix type stability.

can the convert_to_tuple function be assumed to lead to negligible overhead compared to the benefits one gains from using the tuple + function barrier trick

This all depends on how long the rest of the work takes. If the other function is trivial, then convert_to_tuple will have a relatively high cost. If it takes a second then there is essentially no cost at all for running convert_to_tuple - it will be lost in the noise of the main function.

Sure! And, if you define dispatches

convert_to_tuple(x::Vector) = Tuple(x)
convert_to_tuple(x::Tuple) = x

your code will work with both :)

Thank you guys, I learned a lot from you!
