Hi, I’m in an optimization setting where I have to minimize a function f. I have a “vector version” and a “matrix version” of f, such that calling the vector version N times is typically slower than calling the matrix version once with N rows (one point per row). My question is whether I can use this matrix version to compute finite differences more efficiently when using Optim.
The MWE below conveys the main idea: running the vector version once costs about 6 ns, while running the matrix version with 100 rows costs about 160 ns, i.e. roughly 25x more expensive while computing 100x more values! It seems to me that a finite-difference stencil (e.g. for the gradient or Hessian) could therefore be computed more cheaply from a single matrix call than from repeated calls to the vector version. Is this correct, and if so, is there a way to make Optim use it? (A rough sketch of what I have in mind follows the output below.)
(I’ve looked into automatic differentiation for my application; it seems to be roughly on par with finite differences. I could probably squeeze out some improvement there, but that would take more man-hours than I can afford at this point.)
using BenchmarkTools, LinearAlgebra, Optim, Random
function f(x::Vector{T}, a::Vector{Float64}) where {T<:Real}
    f_out = zero(promote_type(T, Float64))
    for xi in x
        for (i, ai) in enumerate(a)
            if i % 2 == 0
                f_out *= ai          # even-indexed coefficients multiply
            else
                f_out += abs(xi)     # odd-indexed coefficients add |xi|
            end
        end
    end
    return f_out
end
function f(x::Matrix{T}, a::Vector{Float64}) where {T<:Real}
    f_out = zeros(promote_type(T, Float64), size(x, 1))   # one value per row (point)
    for xi in eachcol(x)
        for (i, ai) in enumerate(a)
            if i % 2 == 0
                f_out .*= ai
            else
                f_out .+= abs.(xi)   # same update as the vector version, applied row-wise
            end
        end
    end
    return f_out
end
function run_opts(d::Int64)
    Random.seed!(42)
    x_init = ones(Float64, d)
    a_vec = [1., 2., 3., 4., 5.]
    display(f(x_init, a_vec))

    x_rand_mat = rand(100, d)        # 100 evaluation points, one per row
    x_rand = x_rand_mat[1, :]
    display(@benchmark f($x_rand, $a_vec))      # single point, vector version
    display(@benchmark f($x_rand_mat, $a_vec))  # 100 points, matrix version

    opt = Optim.optimize(x -> f(x, a_vec), x_init, LBFGS())  # gradient via finite differences by default
    return opt
end
run_opts(2)
gives
117.0
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 5.916 ns … 33.667 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.000 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.085 ns ± 0.621 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃█▇▅▃ ▁
█████▆▁▁▅█▆▆▅▃▅▇▆▆▅▅▁▅▄▄▄▄▁▃▃▁▁▁▃▁▄▄▃▃▁▄▃▁▃▅▁▃▁▃▄▁▃▅▆▄▅▄▅▅ █
5.92 ns Histogram: log(frequency) by time 8.46 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 10000 samples with 829 evaluations per sample.
Range (min … max): 153.649 ns … 40.801 μs ┊ GC (min … max): 0.00% … 99.46%
Time (median): 160.887 ns ┊ GC (median): 0.00%
Time (mean ± σ): 174.660 ns ± 408.535 ns ┊ GC (mean ± σ): 4.90% ± 7.39%
█▆▇▄▄▃▃▃▃▁▁ ▂
████████████▇▇▆▆▆▅▅▄▃▄▃▃▁▁▃▁▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▃▁▁▁▄▃▅▅▆▆▅▆ █
154 ns Histogram: log(frequency) by time 478 ns <
Memory estimate: 896 bytes, allocs estimate: 1.
* Status: success
* Candidate solution
Final objective value: 2.271107e-20
* Found with
Algorithm: L-BFGS
* Convergence measures
|x - x'| = 3.40e-07 ≰ 0.0e+00
|x - x'|/|x'| = 1.71e+15 ≰ 0.0e+00
|f(x) - f(x')| = 6.94e-06 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 3.06e+14 ≰ 0.0e+00
|g(x)| = 0.00e+00 ≤ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 20
f(x) calls: 127
∇f(x) calls: 127
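To make the question concrete, this is roughly what I imagine: hand-rolling a forward-difference gradient that stacks the base point and the d perturbed points as rows of a matrix, evaluates them all in one call to the matrix method, and is passed to Optim through the optimize(f, g!, x0, method) form. The sketch below is untested; fd_grad! and its step size h are placeholder names/choices of mine, not anything provided by Optim.

# Assumes the definitions and `using` statements from the MWE above.

# Hypothetical helper: forward-difference gradient of f at x via a single matrix call.
function fd_grad!(G, x::Vector{Float64}, a::Vector{Float64}; h = sqrt(eps(Float64)))
    d = length(x)
    X = repeat(x', d + 1, 1)        # (d+1) x d matrix: row 1 is x, row j+1 is x with x[j] bumped by h
    for j in 1:d
        X[j + 1, j] += h
    end
    fx = f(X, a)                    # one matrix call instead of d + 1 vector calls
    G .= (fx[2:end] .- fx[1]) ./ h  # forward differences
    return G
end

# Intended usage: supply this gradient instead of Optim's default finite differences.
a_vec = [1., 2., 3., 4., 5.]
opt = Optim.optimize(x -> f(x, a_vec),
                     (G, x) -> fd_grad!(G, x, a_vec),
                     ones(2), LBFGS())

Supplying g! by hand like this is presumably possible, but it feels like re-implementing what Optim (or whatever it uses for finite differences under the hood) already does, just batched into one matrix call. Is there a built-in way to get the same effect?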