NLsolve performance: 32 vs 64 bits

Hi,

For an application with NLsolve.jl I ran into an OutOfMemoryError(), so I decided to try single precision instead of double precision to reduce the memory footprint (similar to what is often done for machine learning on GPUs). However, this turns out to be slower and to make many more allocations than the 64-bit version. Is it possible that some internals of NLsolve.jl only use double precision, which in turn leads to a lot of type conversions? Or is it some other issue, like type assertions, that tanks the performance?

Below is an MWE that tries this with the example from the NLsolve homepage.

using NLsolve
using BenchmarkTools

function f!(F, x)
    F[1] = (x[1]+3)*(x[2]^3-7)+18
    F[2] = sin(x[2]*exp(x[1])-1)
    return F
end

let
    # Baseline: Float64 initial guess
    x0 = [0.1; 1.2]
    F = similar(x0)
    @assert f!(F, x0) isa Vector{Float64}
    res = nlsolve(f!, x0)
    @info "64-bits: $(res.f_calls) f_calls, $(res.g_calls) g_calls"
    display(@benchmark nlsolve($f!, $x0))
end

let
    # Same problem with a Float32 initial guess
    x0 = Float32.([0.1; 1.2])
    F = similar(x0)
    @assert f!(F, x0) isa Vector{Float32}
    res = nlsolve(f!, x0)
    @info "32-bits: $(res.f_calls) f_calls, $(res.g_calls) g_calls"
    display(@benchmark nlsolve($f!, $x0))
end

and the results:

[ Info: 64-bits: 5 f_calls, 5 g_calls
BenchmarkTools.Trial: 
  memory estimate:  5.45 KiB
  allocs estimate:  66
  --------------
  minimum time:     4.475 μs (0.00% GC)
  median time:      4.638 μs (0.00% GC)
  mean time:        5.269 μs (9.68% GC)
  maximum time:     694.599 μs (98.82% GC)
  --------------
  samples:          10000
  evals/sample:     7
[ Info: 32-bits: 30 f_calls, 4 g_calls
BenchmarkTools.Trial: 
  memory estimate:  746.59 KiB
  allocs estimate:  7971
  --------------
  minimum time:     570.446 μs (0.00% GC)
  median time:      578.320 μs (0.00% GC)
  mean time:        641.458 μs (8.87% GC)
  maximum time:     4.567 ms (86.10% GC)
  --------------
  samples:          7792
  evals/sample:     1
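
One quick way to test the promotion hypothesis is to instrument the residual function and log which element types the solver actually passes in at each call. This is just a diagnostic sketch; f_logged! is a throwaway wrapper I made up, not part of NLsolve's API:

using NLsolve

# Wrap f! to record the element types the solver hands us on every call.
function f_logged!(F, x)
    @show eltype(F), eltype(x)   # all Float32, or promoted to Float64 somewhere?
    F[1] = (x[1]+3)*(x[2]^3-7)+18
    F[2] = sin(x[2]*exp(x[1])-1)
    return F
end

nlsolve(f_logged!, Float32.([0.1; 1.2]))

If the vectors show up as Float64 partway through, repeated conversions would at least explain some of the extra allocations.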

SIAMFANLEquations.jl will let you do single precision linear algebra and double precision for everything else. There is an example in the Chapter2s notebook and in the docstrings. Storing the Jacobian in Float32 will cut the linear algebra work by a factor of two and usually cost nothing in nonlinear performance or in the accuracy of the solution.
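
Roughly, the call would look like the sketch below (based on the docstrings: nsol takes preallocated residual storage FS and Jacobian storage FPS as positional arguments, and the element type of FPS controls the precision of the Jacobian factorization):

using SIAMFANLEquations

function f!(FV, x)
    FV[1] = (x[1]+3)*(x[2]^3-7)+18
    FV[2] = sin(x[2]*exp(x[1])-1)
    return FV
end

x0 = [0.1; 1.2]                  # keep the iterate in Float64
FS = zeros(2)                    # Float64 residual storage
FPS32 = zeros(Float32, 2, 2)     # Float32 Jacobian storage -> single precision factorization
sol = nsol(f!, x0, FS, FPS32)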

Doing this pays off best when the Jacobian is large and dense (1000 x 1000 or more). I'm working on a Newton-GMRES implementation where this may save some work in the orthogonalization if the problem is large and you need many Krylov vectors per Newton iteration.

If you try that, please let me know how it turned out.