Root finding / NonLinear problem

Question:
I am trying to solve a root-finding problem on a vector of 14 elements, but each function evaluation needs about 7 seconds.

What is the best solver for this?

And is there a way to declare a valid range of the parameters of my function?
Not all parameter combinations give a feasible results. How to best signal that
to the solver?
My current code (not really working, at least not or only very slowly converging):

    u0 = KiteControllers.load_corr()
    prob = NonlinearProblem(residual, u0)
    sol = solve(prob, RobustMultiNewton(; autodiff = AutoFiniteDiff()); 
                abstol=1.0, reltol=0.05, maxiters=20, verbose=true)

And maxiters is not respected.

This works, but it feels like writing my own solver:

    initial = KiteControllers.load_corr()
    last_norm=1000
    for i in 1:40
        last_initial = deepcopy(initial)
        res = residual(initial)
        println("i: $(i), norm: $(norm(res))")
        common_size=min(length(initial), length(res))
        for i=1:common_size
            if norm(res) > 5
                initial[i] += res[i]
            elseif norm(res) > 2
                initial[i] += 0.3*res[i]
            else
                initial[i] += 0.15*res[i]
            end
        end
        if norm(res) < 1.0
            println("Converged successfully!")
            break
        end
        if last_norm < 0.7*norm(res) 
            KiteControllers.save_corr(last_initial)
            println("Convergance failed!")
            println("Last norm: $last_norm")
            break
        end
        last_norm=norm(res)
    end
    initial

In 40 iterations I get a solution with norm(res) < 1.0.

Is your function differentiable?

Well, the function runs a flight simulation, the return value is the deviation of the desired flight path at 14 points… Furthermore, the return value might contain some noise…

So probably not easily differentiable, even though mostly it is true that increasing the input also increases the output… But there is some non-linear coupling between the channels (elements of the vectors) …

I’m asking because when you use FiniteDiff as the autodiff backend, each gradient evaluation costs 14 function calls. Whereas if you are able to use a reverse mod autodiff backend like Enzyme, it will go down to 1

What do you mean? It’s respected within each solve of the multi-Newton.

Doing a transformation to re-parameterize that to the real line is usually helpful.

What do you mean by noise here? That’s a big red flag for a Newton method.

No, reverse mode does not help to compute a square Jacobian.

1 Like

My bad, I thought we were looking for the zero of a scalar-valued function

Of cause automatic differentiation is not possible if you are investigating how a hybrid system (continues time system, controlled by a digital controller) reacts to changed control settings…

No, our automatic differentiation works on hybrid systems, both forward and reverse. See for example:

We wrote some details on this as well:

That’s for reverse mode. Forward mode AD of hybrid systems worked of course since 2017 since dosing is a classic example of that an Pumas was built around forward mode AD.

Well, maybe that works if you use ModellingToolkit, but I am not using it for this project…

This is what we are talking about: KiteControllers.jl/examples/learning.jl at main · aenarete/KiteControllers.jl · GitHub

The examples I shared do not use ModelingToolkit. Any ODE where f, condition, and affect! are differentiable should work.

When you try ForwardDiff, what do you see?

But this cannot work… I am not working on an ODE… I run a simulation, save the result as log file and then extract from the log file the position of curtain points of the flight path…

Just to give you an idea how the uncorrected flight path looks like:

Pseudocode:

log=[]
for t in 0:dt:t_end
    KiteModels.next_step!(kps4, integrator, v_ro=v_ro, dt=dt)
    sys_state = KiteModels.SysState(kps4)
    push!(log, sys_state)
    # run the discrete time controller
    change_steering!(kps4)      
end
# extract the deviation from the desired flight path
residual = observe(log)

I am not using any callbacks…

Using the residual, I want to correct the parameters of the discrete time controller.

IIUC you’re trying to minimize the norm of the residual. You could use Nomad.jl with the norm of the residual as the objective function. Nomad allows you to tell it when it generates an infeasible point.

I have this code now:

function train(; max_iter=40, norm_tol=1.0)
    local corr_vec
    try
        log = load_log("uncorrected")
        ob = KiteObserver()
        observe!(ob, log)
        corr_vec=ob.corr_vec
    catch
        corr_vec=residual()
    end
    KiteControllers.save_corr(corr_vec)
    initial = KiteControllers.load_corr()
    last_norm=1000
    best_corr_vec = deepcopy(initial)
    best_norm = norm(best_corr_vec)
    j = 0
    for i in 1:max_iter
        res = residual(initial)
        println("i: $(i), norm: $(norm(res))")
        common_size=min(length(initial), length(res))
        for i = 1:common_size
            if best_norm > 5
                initial[i] += res[i]
            elseif best_norm > 2
                initial[i] += 0.5*res[i]
            else
                initial[i] += 0.25*best_norm*res[i]
            end
        end
        if norm(res) < norm_tol
            println("Converged successfully!")
            break
        end
        if best_norm > norm(res)
            best_norm = norm(res)
            best_corr_vec = deepcopy(initial)
            j = 0
            println("j: $(j), best_norm= $best_norm")
        else
            j+=1
            println("j: $j")
        end
        if j > 4
            println("Convergance failed!")
            println("Best norm: $best_norm")
            break
        end
        last_norm=norm(res)
    end
    KiteControllers.save_corr(best_corr_vec)
    best_corr_vec
end

It works pretty well, the norm of the error goes down to less then one degree in less than 20 iterations…

I think there is no chance to get such a fast convergence with any generic solver. I think I am exploiting here knowledge about the system that generic solvers don’t have. Limitation: I cannot achieve an error of less than 0.88 degrees… But that is good enough for my use case.

I still need to find out how robust this approach is with respect to changes of the model, e.g. changed wind speed or changed size of the kite. I will report back as soon as I found out…