Struggling to get scipy.newton performance using NonlinearSolve.jl

Hi everyone! I’m relatively new to Julia (1.7.2), coming from a Matlab and Python background. I really enjoy the language, but I also struggle sometimes in cases where in Matlab / numpy I would simply use vectorization and it would be relatively fast.

I’m playing around with a nonlinear equation, the dispersion relation for ocean waves, which I’m trying to solve for k:
\omega^2 = gk \cdot tanh(kd)

In the formulation of my problem, \omega and k are vectors of length N and d is a scalar. I am using NonlinearSolve and setup the problem such that constant parameters in the optimization are provided in the NamedTuple p:

using NonlinearSolve

function dispersion(k::AbstractVector{<:Real}, p::NamedTuple)
    @. k * p.G * tanh(k * p.d) - p.ω^2

# Setup parameters
N = 1000
G, d, ω = 9.8, 30.0, 2π * collect(range(0.05, 1, N))
p = (G=G, d=d, ω=ω)

# Starting point: first order approximation
k0 = @. ω^2 / G / tanh(ω * sqrt(d / G))
prob = NonlinearProblem{false}(dispersion, k0, p);
solver = solve(prob, NewtonRaphson(), tol=1e-9)

Doing some benchmarks against my original python implementation using scipy.newton shows that the julia version above dramatically underperforms, especially as N increases:


I played around with the tolerances as well as the Optim.jl package (which gave much worse results with different solvers). From the scipy docs I know that newton defaults to the secant method. I have not found a method yet that is faster than the NonlinearSolve approach above, but I’m also quite new to Julia, so I don’t know the entire landscape of solver packages yet :).

Does anybody know what causes the poor performance, or what solver package I should consider instead? Any thoughts are much appreciated!

It is also used to find roots of scalar functions (in contrast to scipy.optimize.root). While your Julia example solves a system of N equations. Is that what you want? Difficult to tell what’s going on without your Python implementation to compare against.

Perhaps you are looking for something like this?

using Roots

N = 10^3
G, d, ω = 9.8, 30.0, 2π .* range(0.05, 1, N)  # don't collect the range!!
k0 = @. ω^2 / G / tanh(ω * sqrt(d / G))

dispersion(k, G, d, ω) = k * G * tanh(k * d) - ω^2  # no dots, just a scalar function
disproots(G, d, ω, k0) = find_zero(k->dispersion(k, G, d, ω), k0)


1.7.2> @benchmark disproots.($G, $d, $ω, $k0)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  279.100 μs …  1.387 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     295.400 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   310.837 μs ± 51.768 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆██▆▇▅▅▄▃▃▃▂▂▂▂▂▁▁▁                                         ▂
  ████████████████████████▇▇██▆▇▇▇█▆▇▇▇█▇▇▇▆▇▇▇▅▇▅▅▅▅▅▄▆▅▆▆▅▆▅ █
  279 μs        Histogram: log(frequency) by time       536 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

Seems to be approx 50-100x faster than your code at N = 1000, and runtime scales linearly with N, while your code seems to scale like N^2.

It seems to solve a somewhat different problem, as @skleinbo says, you are solving a system of N equations, that just happen to be completely decoupled, so the results are equal up to ~10^-16.

Yes, if it’s a bunch of scalar rootfinds, you should use a scalar rootfinder n times instead of a vector Newton method since the latter will require building and factorizing a large Jacobian. With that in mind, here’s the NonlinearSolve code:

using NonlinearSolve

function dispersion(k, p)
    k * p.G * tanh(k * p.d) - p.ω^2

# Setup parameters
N = 1000
G, d, ω = 9.8, 30.0, 2π * collect(range(0.05, 1, N))
p = map(ω -> (G=G, d=d, ω=ω), ω)

# Starting point: first order approximation
k0 = @. ω^2 / G / tanh(ω * sqrt(d / G))
prob = NonlinearProblem{false}.(dispersion, k0, p)
solver = solve.(prob, (NewtonRaphson(),), tol=1e-9)

which as expected is really fast:

using BenchmarkTools
julia> @benchmark solver = solve.(prob, (NewtonRaphson(),), tol=1e-9)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  128.400 μs …   5.394 ms  ┊ GC (min … max): 0.00% … 96.94%
 Time  (median):     144.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   152.696 μs ± 125.333 μs  ┊ GC (mean ± σ):  2.62% ±  3.17%

   ▅▆ ▁ ▃█ ▃
  ▁██▆█▃█████▆▄▃▂▂▂▁▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  128 μs           Histogram: frequency by time          262 μs <

 Memory estimate: 78.69 KiB, allocs estimate: 18.

150 μs ain’t too shabby.


I should indeed have mentioned that I’m looking for a bunch of scalar rootfinds. And in python, my code looks like this (works with k0 and omega as vectors):

k = scipy.newton(dispersion, k0, args=(d, omega))

@ChrisRackauckas dotting the solve is indeed the proper approach, and in retrospect rather obvious - my narrow vectorized Matlab/python mind could not come up with that yet. It is twice as fast as the scipy.newton method, so I’m happy! :slight_smile:

Thanks all for the quick response and helping me getting started with this amazing language!

1 Like