Updating a function in a type, where the function is a parameter

I just ran into a problem with my plan to have performant code while still being able to update a struct.

Assume we have a struct (already mutable to change it)

mutable struct MyStruct{F}
    f::F
end

where f is a function (in my application: a cost function f(x)).
I parametrise the struct on the function type to avoid having a Function field and losing performance (I hope that is the right reasoning here?)

Now if I want to update said function something like

function update_myself!(m::MyStruct, new_f)
    m.f = new_f
    return m
end

That does not work, since the new function cannot be converted to the type of the old function. How can I best handle this situation without having an abstract type in my struct?
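For illustration, here is a minimal sketch of why the assignment fails and the usual workaround, rebuilding the struct so the type parameter can change (the `update` helper name is made up):

```julia
mutable struct MyStruct{F}
    f::F
end

m = MyStruct(sin)              # MyStruct{typeof(sin)}
# m.f = cos                    # errors: cannot convert typeof(cos)
#                              # to typeof(sin)

# Workaround: rebuild instead of mutate; the type parameter F changes.
update(m::MyStruct, new_f) = MyStruct(new_f)

m2 = update(m, cos)
m2.f(0.0)                      # 1.0
```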

The main problem here is that the application struct has, for example, 8 fields, 7 of which should stay unchanged, plus subtypes for which I do not want to reimplement a copy.

Edit: To state the simplest example, assume the following GradientProblem,

where I aim to have functions update_cost! and/or update_gradient! to update both function handles.

You cannot, as each function has its own distinct concrete type.

IIUC, you can’t. The thing with concrete type annotations is that you get exactly this tradeoff: the compiler gets to know the type and can assume that it won’t change midway, so it gets to generate code without runtime type checks. If your goal is precisely to change the function, and thus the type of your field, then the compiler cannot assume that it stays fixed and basically has to resolve everything at runtime, i.e. no optimization is possible.

Possible solutions that I see:

  • If your cost functions share a common structure and only differ in some of the parameters that are used, you can switch to storing the parameters that determine the cost function instead and have the calculation itself fixed.
  • If that’s not possible, you can try to contain the instability in your code as much as possible. Basically, try to separate the fast bits from the unstable part and annotate the resulting type of the cost function call if you can. This should ensure that everything that can be stable stays stable. Still not good if you want to call the cost function in a hot loop, I guess.
  • If so, the only good solution I can come up with is to propagate the function type so far up that everything that needs to be in a hot loop is aware of the function type at all times; updating the cost function would then have to happen outside of this scope. This should compile a fast version of the code for every concrete cost function, but you obviously pay in compile time for that. You kind of have to decide/try what works best for you, or somebody else comes up with a clever solution. Good luck anyways
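A minimal sketch of the second option, containing the instability with a return-type annotation (`Problem` and `step_cost` are hypothetical names; the annotation assumes the cost always returns a Float64):

```julia
mutable struct Problem
    f::Function                # unstable field, on purpose
end

function step_cost(p::Problem, x)
    c = p.f(x)::Float64        # annotation confines the instability here
    return c + 1.0             # from here on everything is type-stable
end

p = Problem(x -> x^2)
step_cost(p, 2.0)              # 5.0
p.f = x -> 3x                  # replacing the function is now allowed
step_cost(p, 2.0)              # 7.0
```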
3 Likes

I think the best approach will depend on where you need the specialization. If the function type is changing within hot loops, having the field typed won’t help much, even if you copy the structure.

3 Likes

I already have a separate get_cost(MyStruct, point) function; is that already encapsulating enough? Then I would switch my f field to just be a Function?

The problem appeared when writing optimisation solvers that have sub-solvers (MyStruct describes such a subproblem), where the cost function changes every outer iteration (before calling the sub-solver) and the sub-solver is provided by the user (i.e. I want to change only the function f while keeping the favourite settings in MyStruct).

I am not against creating a copy per se (since the data should not change, those fields are just references anyway) - just that it would also be much more code for each specialisation at hand (compared to the 4 lines of an update function above).

Maybe, if the expensive part occurs inside that function, you need to unpack the structure on that call.

Something like get_cost(mystruct.f, mystruct.a, mystruct.b, ...)

Or some variation of that, so the function can specialize on everything.

Yes, but maybe that is the best approach. And if the function is the only field that changes, you can then make the structure immutable, which has its own advantages.

  • If your cost functions share a common structure and only differ in some of the parameters that are used, you can switch to storing the parameters that determine the cost function instead and have the calculation itself fixed.

Considering the common structure, my cost function f is really just a simple function, no parameters to store somewhere.

I’d recommend wrapping the 7 fields you don’t want to change into their own struct, i.e. create this duality:

struct DataWrapper
   # fields..
end

mutable struct CostFunc
    f::Function
    data::DataWrapper
end 

This will allow the pattern suggested by @lmiq to work more easily, as there are only two parameters to pass to your inner (& specializing) function. Be aware that this will trigger recompilation of that inner function for each new function, which may incur a significant overhead.
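To make the pattern concrete, a usage sketch with the two structs from this post (the field names `a`/`b` and the `hot_loop` function are made up for illustration):

```julia
struct DataWrapper
    a::Float64
    b::Float64
end

mutable struct CostFunc
    f::Function
    data::DataWrapper
end

# Inner function: specializes on typeof(f) and the concrete DataWrapper.
hot_loop(f, d::DataWrapper, xs) = sum(f(x) * d.a + d.b for x in xs)

c = CostFunc(x -> x^2, DataWrapper(2.0, 1.0))
hot_loop(c.f, c.data, [1.0, 2.0])   # (1*2 + 1) + (4*2 + 1) = 12.0
```

Note that the unstable field access `c.f` happens once, outside the loop; inside `hot_loop` the function type is concrete.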

1 Like

Can you elaborate on this answer? I do not understand what you mean?

So the expensive things might be

a) evaluating f itself
b) things happening in the loop outside of get_cost.

With get_cost I currently have a neat encapsulation of calling the cost function, which I would like to keep as an abstraction level. If I also unpacked that, it would be a lot of code to write (also the same reason I would prefer just an update function and not new constructions for every MyStruct subtype that I have).

That looks nice, but would basically affect 60% of my code base, I fear.
Would the opposite be doable, i.e. to encapsulate f?

That is more a matter of the amount of code to write, tests to write for that code, and maintaining said code. It would mean implementing slightly lengthy copy functions for 20+ structs.
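For what it's worth, a generic field-swapping copy can avoid hand-writing those 20+ copy functions. This is a hypothetical sketch (`with_field` is a made-up helper); it assumes each struct has its default constructor, so `T.name.wrapper` can strip the type parameters and let the new function type be re-inferred:

```julia
mutable struct MyStruct{F}
    f::F
    data::Vector{Float64}
end

# Hypothetical helper: rebuild any struct with one field replaced.
function with_field(s::T, name::Symbol, val) where {T}
    args = (fn === name ? val : getfield(s, fn) for fn in fieldnames(T))
    return T.name.wrapper(args...)   # re-infers all type parameters
end

m  = MyStruct(sin, [1.0, 2.0])
m2 = with_field(m, :f, cos)
m2.f(0.0)                      # 1.0
m2.data === m.data             # true: other fields are shared, not copied
```

Packages like Setfield.jl/Accessors.jl provide this kind of reconstruction more robustly, if an extra dependency is acceptable.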

Okay to proceed, let me ask this question: where will 99% of the runtime be spent? Inside the subsolvers that need to run the cost function often, or outside somewhere and the subsolvers and the cost function only account for a small part of the problem?

That depends on the application, but in general I would assume the iterations in the sub-solver will take most of the time - so maybe a slight preference for the first: inside the sub-solver.

As I mentioned above, accessing a non-static function that’s saved in a struct is inherently dynamic. If you wrap the function, you might as well just keep it as a Function field - there is no advantage to be had here, as long as you separate the field access from the application of that cost function in a hot loop. You really don’t want a type-unstable field access in your hot loop. The code I posted above just makes that easier, because it separates the unstable field from the stable ones, by wrapping the stable ones in a struct to make the call into your hot loop smaller. Think hot_func(mystruct.f, mystruct.data) vs. hot_func(mystruct.f, mystruct.a, mystruct.b, mystruct.c, ...) or even hot_func(mystruct) with field access inside of hot_func (which is worst for performance in your case).

This technique is called a “function barrier”, which allows the compiler to specialize hot_func based on the function you pass in. It’s just not possible to do that when you have the field access inside of hot_func itself. Since you want to have the function change dynamically, you have to decide where to place the type instability & how to best deal with it.

1 Like

Oh, I am saying nothing against your code; it looks like a cool solution. Just that my idea of "Let's spend a Sunday morning rewriting this" would turn into a month-long project (and I am still in such a project from an idea I had during Christmas in another package, where it will pay off tremendously). That's my only reason to hesitate.

To extend my answer: I think it would even be a full rewrite, since besides the cost this would also affect the gradient and/or Hessian. That is sad, because then I do not have a good idea how to solve my problem in my current code (without a complete rewrite). I added a link to the actual struct considered (GradientProblem at Manopt.jl/gradient_plan.jl at 582e13e3253edf1354f69196211e8c32230a3133 · JuliaManifolds/Manopt.jl · GitHub), where that one only has M (a manifold) as further data that is accessed regularly - and in the best of cases (other Problems) not much more is stored (dynamic data accessed every iteration is kept in a different struct, Options).

Maybe offtopic, but one common pattern in packages that provide solvers is to just receive the functions as parameters, like

solver(f, SolverParameters()) 

and the user sets the parameters of the function with a closure,

a = 2.0   # parameter chosen by the user
f(x, a) = a * x^2
solver(x -> f(x, a), SolverParameters())

Then your package does not have to deal at all with the parameters of the function.

Concerning the optimization of the code: if the cost function is very expensive, then probably nothing else matters. If it is cheap, then it is important for the inner function to specialize on it, because it may even be inlined by the compiler.
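A runnable toy version of this pattern, assuming a made-up `simple_solver` (crude forward-difference descent, purely illustrative): the solver receives a plain callable and never sees the user's parameter `a`.

```julia
# Hypothetical toy solver: minimizes f by finite-difference gradient steps.
function simple_solver(f, x0; steps = 100, stepsize = 0.1)
    x = x0
    for _ in 1:steps
        g = (f(x + 1e-6) - f(x)) / 1e-6   # crude forward difference
        x -= stepsize * g
    end
    return x
end

a = 3.0
xmin = simple_solver(x -> a * (x - 1.0)^2, 0.0)  # ≈ 1.0
```

Because the closure's concrete type carries `a`, `simple_solver` compiles a specialized (possibly inlined) version of the loop for it.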

2 Likes

Thanks for the off-topic route :slight_smile:

Two small comments on that:
In general I agree that writing a solver function that accepts the cost f is the right way to go. I actually do that in the high-level interfaces (gradient_descent); the internal structure stems more from my previous work and experience (for example, there are similar packages I used and know in Matlab/Python). For example, my Options structure is what you called SolverParameters :slight_smile: (and from today's perspective I would maybe put the manifold M, the domain of optimisation, into the options as well)

From today's perspective I would maybe not have put the manifold into my problem structure, which also brings me to my second comment and question:
Instead of just f, I would maybe also encapsulate a gradient and a Hessian into the first parameter, right?

Then, redesigning my Problem to store just functions (but not the manifold) might be the way to go. The problem would then contain no more than 2 or 3 fields/functions (f, gradient, hessian, for example), and both variants would be possible: typed (with an easy copy) or untyped (and replaceable).
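A hypothetical sketch of that slimmed-down, typed variant (names `GradientProblem`/`update_cost` are just illustrative here, not Manopt.jl's actual API): with only callables stored, "updating" the cost is a one-line reconstruction.

```julia
struct GradientProblem{TC,TG}
    cost::TC
    gradient::TG
end

# Cheap "update": rebuild with a new cost, reusing the gradient.
update_cost(p::GradientProblem, new_cost) =
    GradientProblem(new_cost, p.gradient)

p  = GradientProblem(x -> sum(abs2, x), x -> 2x)
p2 = update_cost(p, x -> sum(abs, x))
p2.cost([3.0, -4.0])        # 7.0
p2.gradient === p.gradient  # true: the old gradient is reused
```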

1 Like

I would use multiple dispatch on the number of function parameters given: if two are given, assume that the gradient is given; if 3, gradient and Hessian.

I am not much in favour of that, since two can also mean (I am doing non-smooth optimisation):

  • a function f and its subgradient (a function returning an element from the subgradient in case the function is non differentiable at a point)
  • a function and its proximal map

Also, having a clean interface with just 2 or 3 parameters is what I currently have, and I actually like that (but that is of course personal taste, as long as it's that few parameters).

1 Like