Hi,
For an application using NLsolve.jl I ran into an OutOfMemoryError(), so I decided to try single precision instead of double precision to reduce the memory footprint (an approach similar to machine learning on GPUs). However, the single-precision version appears to be slower and makes a lot more allocations than the 64-bit version. Is it possible that some internals of NLsolve.jl only use double precision, which in turn leads to a lot of type conversions? Or is it maybe some other issue, such as type assertions, that tanks the performance?
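One thing I could check first (just a sketch, nothing authoritative: it only relies on the res.zero field documented by NLsolve and reuses the same residual as the MWE below) is whether the solution vector already comes back promoted to Float64 when the initial guess is Float32:

using NLsolve

# Same residual as f! in the MWE below, repeated here so the snippet is self-contained.
function probe!(F, x)
    F[1] = (x[1] + 3) * (x[2]^3 - 7) + 18
    F[2] = sin(x[2] * exp(x[1]) - 1)
    return F
end

res32 = nlsolve(probe!, Float32[0.1, 1.2])
# If this prints Float64, the output (and probably the internal workspace) was promoted;
# if it stays Float32, the extra allocations would have to come from somewhere deeper inside.
@show eltype(res32.zero)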
Below is an MWE that tries this for the example from the NLsolve homepage.
using NLsolve
using BenchmarkTools
function f!(F, x)
    F[1] = (x[1] + 3) * (x[2]^3 - 7) + 18
    F[2] = sin(x[2] * exp(x[1]) - 1)
    return F
end
let
    x0 = [0.1; 1.2]
    F = similar(x0)
    @assert isequal(typeof(f!(F, x0)), Array{Float64,1})  # f! preserves the Float64 element type
    res = nlsolve(f!, x0)
    @info "64-bits: $(res.f_calls) f_calls, $(res.g_calls) g_calls"
    display(@benchmark nlsolve($f!, $x0))
end
let
    x0 = Float32.([0.1; 1.2])
    F = similar(x0)
    @assert isequal(typeof(f!(F, x0)), Array{Float32,1})  # f! also preserves the Float32 element type
    res = nlsolve(f!, x0)
    @info "32-bits: $(res.f_calls) f_calls, $(res.g_calls) g_calls"
    display(@benchmark nlsolve($f!, $x0))
end
and the results:
[ Info: 64-bits: 5 f_calls, 5 g_calls
BenchmarkTools.Trial:
  memory estimate:  5.45 KiB
  allocs estimate:  66
  --------------
  minimum time:     4.475 μs (0.00% GC)
  median time:      4.638 μs (0.00% GC)
  mean time:        5.269 μs (9.68% GC)
  maximum time:     694.599 μs (98.82% GC)
  --------------
  samples:          10000
  evals/sample:     7
[ Info: 32-bits: 30 f_calls, 4 g_calls
BenchmarkTools.Trial:
  memory estimate:  746.59 KiB
  allocs estimate:  7971
  --------------
  minimum time:     570.446 μs (0.00% GC)
  median time:      578.320 μs (0.00% GC)
  mean time:        641.458 μs (8.87% GC)
  maximum time:     4.567 ms (86.10% GC)
  --------------
  samples:          7792
  evals/sample:     1
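If it does turn out to be promotion or tolerances, the next experiment I have in mind is the sketch below (not a fix, just a probe: it assumes OnceDifferentiable is the NLSolversBase constructor re-exported by NLsolve, and that the default ftol = 1e-8 from the docs applies, which is already below eps(Float32) ≈ 1.2f-7). It builds the cache explicitly from Float32 arrays and asks for a tolerance that single precision can actually reach:

using NLsolve

# Same residual as f! above, repeated so the snippet stands on its own.
f32!(F, x) = (F[1] = (x[1] + 3) * (x[2]^3 - 7) + 18; F[2] = sin(x[2] * exp(x[1]) - 1); F)
x0 = Float32[0.1, 1.2]

# Build the differentiable cache from Float32 arrays, so any promotion to Float64
# would have to happen inside the solver itself rather than in the cache setup.
df = OnceDifferentiable(f32!, x0, similar(x0))

# eps(Float32) ≈ 1.1920929f-7, so the default ftol = 1e-8 may be unreachable in
# single precision; loosen it to something representable.
res = nlsolve(df, x0; ftol = 1f-6)
@show res.f_calls res.g_calls eltype(res.zero)

If the f_calls drop back to something close to the Float64 run with the looser tolerance, that would point at the convergence criterion rather than at type conversions inside the solver.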