There are plenty of methods for solving a minimization problem. Many of them solve for the parameters iteratively, decrementing them by a fraction (set by a learning rate) of the gradient of the objective function. Why not just solve an ODE instead of inventing optimizers? What can go wrong with this? An MWE is below.
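For reference, the iterative scheme alluded to above is exactly explicit Euler applied to the gradient-flow ODE `dU/dt = -(A'A*U - A'B)`, with the learning rate playing the role of the step size. A minimal sketch on the same least-squares problem (variable names are my own, stdlib only):

```julia
using LinearAlgebra, Random
Random.seed!(1234)

A = rand(50, 5)
U_true = rand(5, 2)
B = A * U_true
AtA, AtB = A' * A, A' * B

# Gradient descent on f(U) = 0.5 * norm(A*U - B)^2:
#   U <- U - lr * (AtA*U - AtB)
# i.e. explicit Euler with dt = lr on dU/dt = -(AtA*U - AtB).
lr = 1.0 / opnorm(AtA)      # safely below the 2/λmax stability limit
U = rand(size(U_true)...)
for _ in 1:5000
    U .-= lr .* (AtA * U .- AtB)
end

sum(abs2, B .- A * U)       # residual, driven toward zero
```

With a fixed step size the number of iterations needed grows with the condition number of `A'A`, which is precisely the pain an adaptive ODE solver is meant to manage for you.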

```julia
using LinearAlgebra, OrdinaryDiffEq, Random
Random.seed!(1234)
# Solve U for A*U = B, that is argmin (1/2) (A*U - B)'(A*U-B) {+(1/2)U'U for minimum norm}
A = rand(50, 5)
U = rand(5, 2) # actual solution
B = A*U
AtA = A'*A
AtB = A'*B
p = (.-AtA, AtB) # store -A'A so the update below is a single 5-arg mul!
function st!(du, u, p, t)
    negAtA, AtB = p
    du .= AtB                        # .- u for the minimum-norm solution
    mul!(du, negAtA, u, 1.0, 1.0)    # du = -A'A*u + A'B, the negative gradient
    return nothing
end
U0 = rand(size(U)...)
tspan = (0.0, 100.0)
prb = ODEProblem(st!, U0, tspan, p)
sol = solve(prb, Tsit5(), dense=false, save_everystep=false, dt=1e-5) # dt is only the initial step; Tsit5 adapts
sum(abs2, B .- A*sol.u[end])
```
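For comparison, the stationary point the flow converges to is available in closed form: setting `du = 0` gives the normal equations `A'A*U = A'B`, i.e. the ordinary least-squares solution. A quick check (names are my own):

```julia
using LinearAlgebra, Random
Random.seed!(1234)

A = rand(50, 5)
U = rand(5, 2)   # actual solution
B = A * U

# The gradient flow dU/dt = -(A'A*U - A'B) is stationary exactly at
# the least-squares solution of A*U = B.
U_normal = (A' * A) \ (A' * B)   # normal equations
U_qr     = A \ B                 # QR-based least squares, better conditioned

sum(abs2, B .- A * U_normal)     # ≈ 0 here, since B = A*U exactly
```

Both direct solves recover `U` to machine precision here, which gives a yardstick for how "crude" the ODE endpoint is at a given tolerance.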

It gets a crude solution, though this may not be the best example. Advantages: it has adaptive stepping, checks for instability, uses a higher-order approximation, and can even be used to find a steady-state solution. Disadvantages: Tsit5 has 7 stages (about 6 new function evaluations per step thanks to FSAL) for a 5th-order approximation built from function evaluations alone, whereas familiar optimizers go at most to 2nd order (Hessians). What are your thoughts? Is there any research on this?
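On the "what can go wrong" part: the linear flow decays mode by mode at rates given by the eigenvalues of `A'A`, so it is stiff whenever `cond(A'A)` is large. An explicit method's stable step is capped at roughly `2/λmax`, the same ceiling as gradient descent's learning rate, while the slowest mode decays like `exp(-λmin*t)`. A quick look at the scales involved (stdlib only, my own names):

```julia
using LinearAlgebra, Random
Random.seed!(1234)

A = rand(50, 5)
AtA = A' * A
λ = eigvals(Symmetric(AtA))   # real and positive: A'A is SPD for full-rank A

# Explicit Euler on dU/dt = -AtA*U + AtB is stable iff dt < 2/λmax,
# but resolving the slowest mode needs t on the order of 1/λmin, so the
# explicit-step count scales with the condition number λmax/λmin.
λmin, λmax = extrema(λ)
(stability_limit = 2 / λmax, stiffness_ratio = λmax / λmin)
```

When the stiffness ratio is large, an explicit adaptive solver like Tsit5 spends many small steps doing what an implicit (stiff) solver, or a preconditioned optimizer, handles cheaply.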