JuMP: Control automatic differentiation of a user-defined function

Hi all,

I am currently working on code that fits a model with about 9 decision variables (depending on the case) to experimental coherent anti-Stokes Raman spectra. Because the computations in this model are very expensive, I precompute the most expensive part, which fortunately depends on only one variable (temperature), into a spectral library and do the rest during the fitting algorithm.
The consequence of this approach is that temperature is now only available at discrete values: those that were used when generating the library. For the sake of this example, say one generates a library of spectra ranging from 300 K to 2400 K in 10 K increments.

In a former approach implemented in MATLAB, I used a mixed-integer genetic algorithm for the fitting, which worked very nicely and had runtimes on the order of 500-1000 ms per spectrum.
I now want to port this to Julia (for reasons…) and I am somewhat stuck trying to get the fit to work…

My first approach was to treat temperature as an integer and use Juniper+Ipopt, which is painfully slow… So I dropped the idea of handling temperature as an integer, implemented interpolation between spectra, and used Ipopt alone. However, with nearest-neighbor interpolation the search space obviously flattens out between library entries, so the derivative approaches zero, introducing local minima.
This only gets a little better with linear interpolation. Going to higher order is unfortunately not an option because of the increased computational cost.
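For context, the linear-interpolation fallback can be sketched like this (the temperature grid and "spectra" below are made-up stand-ins for the real library):

```julia
# Made-up stand-ins for the precomputed library: a 10 K temperature grid
# and one short "spectrum" vector per grid point.
Tgrid = 300.0:10.0:2400.0
spectra = [sin.(T ./ [100.0, 200.0, 300.0]) for T in Tgrid]

# Linear interpolation between the two neighboring library entries.
function interp_spectrum(T)
    i = clamp(searchsortedlast(Tgrid, T), 1, length(Tgrid) - 1)
    w = (T - Tgrid[i]) / step(Tgrid)
    return (1 - w) .* spectra[i] .+ w .* spectra[i + 1]
end
```

Between grid points this interpolant is piecewise linear, so any derivative computed on it is constant within each 10 K interval and has a kink at each grid point.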

The question now is: is it possible to control the underlying AD (ForwardDiff) so that it computes the gradient across a larger interval, in this case the one defined by the library's temperature increment?
Or is there any way to control the minimum change of a decision variable between two consecutive iterations? This would also help reduce the effort spent optimizing another (continuous) decision variable to the 8th decimal place in the case of a noisy spectrum.

I have to say, I am very new to automatic differentiation and my background is not computational engineering, so maybe I am thinking in the wrong direction here.

Providing a minimal working example is unfortunately not trivial, because it would require uploading either the library (which is on the order of several GB) or the entire code, which is not really minimal, I guess…

I did try other approaches, such as Evolutionary.jl, which sort of works, but even in the absence of experimental noise it fails to converge to the exact solution in most cases.

I really appreciate any hint in the right direction!

Max


Hi @mgreifenstein !
If I understand your question correctly, you may have misinterpreted what ForwardDiff does. It doesn’t use a finite difference formula \frac{f(x+h)-f(x)}{h} for some small value of h. Instead, it executes your code in a clever way by propagating both the value and its derivative through each operation. So I’m not sure what you mean by “computing the gradient across a larger interval”, could you maybe rephrase?
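As a tiny illustration of that propagation, here is a hand-rolled sketch of the idea (this is not ForwardDiff's actual internals, just the principle of carrying a value and its derivative together):

```julia
# A minimal dual number: a value together with its derivative.
struct Dual
    val::Float64
    der::Float64
end

# Each operation propagates both the value and the derivative exactly.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

g(x) = x * x + sin(x)
d = g(Dual(1.0, 1.0))  # seed derivative 1 at x = 1
d.der                  # exact derivative 2x + cos(x) at x = 1
```

No step size `h` appears anywhere: the derivative comes out exact because each elementary operation knows its own derivative rule.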
Probably a naive question on my end, but can’t you translate the Matlab code into Julia, if you were satisfied with it?

Hi @gdalle

Thanks for your quick response and your help! I have to admit, even though I tried reading about how ForwardDiff works, I didn’t fully understand all the implications. I will assign this to myself as homework to learn more about it :slight_smile:

But for this practical example: how does ForwardDiff react to a piecewise-constant approximation of the input? I would (again, maybe this is wrong because I am thinking in a finite-differences mindset) assume that for very small pieces it would closely resemble the gradient of the underlying smooth function, and for very large pieces it would return a gradient of 0.
Is there any way to systematically try it out? I guess I could use ForwardDiff directly on the function with different piecewise constant approximations and see what it does.

Or is there any way to print out the gradient of a decision variable during the fit process?

The problem with porting the MATLAB implementation of the mixed-integer GA is that it contains proprietary (and partially compiled) code. The reason for using Julia in the first place is that the library generation itself is 2 orders of magnitude faster - and I was hoping the fitting would be at least as fast as in MATLAB.

Best
Max

Here’s a piecewise-constant function to play around with: f(x, a) = \left\lfloor a (x + 0.5) \right\rfloor
Below you can see the very different behaviors of ForwardDiff and FiniteDifferences. While the latter computes derivatives the way you would expect, with a difference between neighboring points, the former only uses the function at the exact point where you call it, because it is able to exploit the shape of the function itself. As a result, the derivative it returns is always zero here, whereas the other package starts seeing the other plateaus when a gets big enough.

julia> using FiniteDifferences, ForwardDiff

julia> f(x, a) = floor((x + 0.5) * a);

julia> for a in 10 .^ (0:2:10)
           d1 = ForwardDiff.derivative(x -> f(x, a), 0)
           d2 = FiniteDifferences.forward_fdm(2, 1)(x -> f(x, a), 0)
           @info "With scale a=$a, ForwardDiff gives f'(0,a)=$d1 and FiniteDifferences gives f'(0,a)=$d2"
       end
[ Info: With scale a=1, ForwardDiff gives f'(0,a)=0.0 and FiniteDifferences gives f'(0,a)=0.0
[ Info: With scale a=100, ForwardDiff gives f'(0,a)=0.0 and FiniteDifferences gives f'(0,a)=0.0
[ Info: With scale a=10000, ForwardDiff gives f'(0,a)=0.0 and FiniteDifferences gives f'(0,a)=0.0
[ Info: With scale a=1000000, ForwardDiff gives f'(0,a)=0.0 and FiniteDifferences gives f'(0,a)=0.0
[ Info: With scale a=100000000, ForwardDiff gives f'(0,a)=0.0 and FiniteDifferences gives f'(0,a)=7.168686400303e7
[ Info: With scale a=10000000000, ForwardDiff gives f'(0,a)=0.0 and FiniteDifferences gives f'(0,a)=9.998010804980501e9

It’s hard to offer advice without a reproducible example.

The first decision is whether JuMP is the right tool for the job. If you previously used a genetic algorithm that worked nicely, Juniper+Ipopt might be the wrong choice. See:

You can register your own gradient (see Nonlinear Modeling · JuMP), but if the function is discrete, it isn’t obvious to me that your problem has a well-defined first-order derivative?
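A rough sketch of registering a user-defined gradient with JuMP's legacy nonlinear interface is below. Here `library_value` is a hypothetical stand-in for your expensive library lookup, and the supplied gradient is a central finite difference across one 10 K library increment rather than the AD derivative:

```julia
using JuMP, Ipopt

ΔT = 10.0                        # library temperature increment, per the example above
library_value(T) = sin(T / 100)  # stand-in for the expensive library lookup

# Supply a central difference across one library increment instead of
# letting JuMP differentiate the lookup itself.
library_grad(T) = (library_value(T + ΔT) - library_value(T - ΔT)) / (2ΔT)

model = Model(Ipopt.Optimizer)
register(model, :lib, 1, library_value, library_grad; autodiff = false)
@variable(model, 300 <= T <= 2400, start = 1000)
@NLobjective(model, Min, lib(T))
```

Whether this converges well depends on how rough the piecewise interpolant is; the hand-supplied gradient only smooths what Ipopt sees, it doesn't make the underlying function differentiable.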

For a next step, I’d try to distill what you are trying to compute into a minimal reproducible example. Don’t include the multi-GB library; just come up with a toy problem that has the same characteristics as the problem you’re actually trying to solve. I should be able to run it via a copy-paste into the REPL, so use random data, don’t read things in from files, etc. Then people might have suggestions for what you could try.