optim arguments: higher dimensional arrays and gradient specification

egcjm · January 21, 2018, 1:02am

Two questions regarding optim,jl:

Gradient specification for higher dimensional arrays. Optim.jl documentation states that “for zeroth and first order methods, it is also possible to pass in matrices, or even higher dimensional arrays.” When specifying such input, should the gradient have the same shape or be specified as a vector corresponding to a flattened version of the input?
Other input types. Is it possible to go beyond higher dimensional arrays, and use input variables that are arrays of arrays, e.g. Array{Array{Float64,2},3} ?

pkofod · January 21, 2018, 7:06am

We should maybe be more explicit, but you can do either. Show me what you want to do and I’ll help (but provide complete functions and initial allocations of x and G so I can run your code). Edit: but by default, the gradient type will be assumed the same type, shape and size as x.

Not yet, no.

Edit: maybe you can find inspiration in the tests here https://github.com/JuliaNLSolvers/Optim.jl/blob/7b4f8cb8021fe1ed0a2befc3789a2c60ee3c85e6/test/multivariate/array.jl

pkofod · January 21, 2018, 10:42am

Alright, look below. Keep in mind that this is sort of advanced usage of Optim so you need to learn about the syntax to construct a OnceDifferentiable (see https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/9a5650177851b034d13a909a684d091fe6527b03/README.md
for more information).

Below, we use vectors for xs and matrices for gradients. You could have done it the other way around, or even some other abstractarray (given a sufficient set of functions have methods for your type).

# Define objective
f(X) = (10 - X[1])^2 + (0 - X[2])^2 + (0 - X[3])^2 + (5 - X[4])^2
# Define gradient
function g!(storage, x)
    storage[1] = -20 + 2 * x[1]
    storage[2] = 2 * x[2]
    storage[3] = 2 * x[3]
    storage[4] = -10 + 2 * x[4]
    return
end
# A typical x (also the starting value used below)
x = [1.0, 0.0, 1.0, 0.0]
# An instance of the output type of f(x)
fx = 0.0
# A gradient type to hold the gradient, note this is different from x
G = zeros(2,2)
# Create a OnceDifferentiable instnace
od = OnceDifferentiable(f, g!, x, fx, G)
# Evaluate at some point and see what's in the gradient
some_x = [2.0, 3.0, 1.0, 0.1]
NLSolversBase.value_gradient!(od, some_x)
gradient(od)
# Optimize and check the gradient again
res = optimize(od, x, BFGS())
gradient(od)

egcjm · January 21, 2018, 1:54pm

Thank you very much. This is extremely helpful. Code for my objective/gradient functions are lengthy and complicated, hence posting fully functional code would obfuscate matters.

The input to my objective and gradient functions consists of a collection of differently sized matrices (many thousands of scalar variables in total). From your response to the second part of my question, I figure that for now the only way to handle this is to vectorize and stitch them together when calling optim, then do the reverse in my objective function. I figure the oncedifferentiable functionality does not provide a way out of that. Correct?

Thanks again for the help and examples.

ChrisRackauckas · January 21, 2018, 2:43pm

If Optim.jl allows AbstractArrays and internally broadcasts (does it @pkofod?) then a RecursiveArrayTools.ArrayPartition was designed for this. If not, yeah separating out the array using views is the way to go. I show the difference between the two with DifferentialEquations.jl in a blog post here:

but the same idea applies with any package using an array input. In v0.7 views won’t matter as much though, but it will still be a bit of a pain to do this manually IMO so hopefully enough of the Julia ecosystem can be setup with AbstractArray+broadcast that we can start using fancier tools.

pkofod · January 21, 2018, 8:13pm

Mostly, but I can identify a small issue in NLSolversBase, and all the type annotations in LineSearches have to be loosened (thought they were abstractarray tbh). So it should be a very small change to support ArrayPartitions. I’ll try to do it soon, as we’ve had the request before.

pkofod · January 21, 2018, 9:58pm

Not quite there, but it will be possible “soon”. With few local changes and Move to AbstractArray. by pkofod · Pull Request #78 · JuliaNLSolvers/LineSearches.jl · GitHub and Broadcast for x_of_nans by pkofod · Pull Request #37 · JuliaNLSolvers/NLSolversBase.jl · GitHub I get

julia> using Optim, RecursiveArrayTools

julia> f(X) = (10 - X[1])^2 + (0 - X[2])^2 + (0 - X[3])^2 + (5 - X[4])^2
f (generic function with 1 method)

julia> function g!(storage, x)
           storage[1] = -20 + 2 * x[1]
           storage[2] = 2 * x[2]
           storage[3] = 2 * x[3]
           storage[4] = -10 + 2 * x[4]
           return
       end
g! (generic function with 1 method)

julia> ap = ArrayPartition([1.0, 0.0], [1.0, 0.0])
([1.0, 0.0], [1.0, 0.0])

julia> res = optimize(f, g!, ap, BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [1.0,0.0,1.0,0.0]
 * Minimizer: [10.0,0.0,0.0,5.0]
 * Minimum: 0.000000e+00
 * Iterations: 1
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 9.00e+00 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = Inf |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 0.00e+00 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 3
 * Gradient Calls: 3

egcjm · January 21, 2018, 11:58pm

thank you @pkofod. optim’s functionality is very impressive. your efforts are really appreciated.

pkofod · January 26, 2018, 8:52am

A look into the dungeon. I’ll tag this change as soon as I iron out some corner cases.

julia> using Optim, RecursiveArrayTools

julia> function polynomial(x)
           return (10.0 - x[1])^2 + (7.0 - x[2])^4 + (108.0 - x[3])^4
       end
polynomial (generic function with 1 method)

julia> function polynomial_gradient!(storage, x)
           storage[1] = -2.0 * (10.0 - x[1])
           storage[2] = -4.0 * (7.0 - x[2])^3
           storage[3] = -4.0 * (108.0 - x[3])^3
       end
polynomial_gradient! (generic function with 1 method)

julia> function polynomial_hessian!(storage, x)
           storage[1, 1] = 2.0
           storage[1, 2] = 0.0
           storage[1, 3] = 0.0
           storage[2, 1] = 0.0
           storage[2, 2] = 12.0 * (7.0 - x[2])^2
           storage[2, 3] = 0.0
           storage[3, 1] = 0.0
           storage[3, 2] = 0.0
           storage[3, 3] = 12.0 * (108.0 - x[3])^2
       end
polynomial_hessian! (generic function with 1 method)

julia> ap = ArrayPartition(rand(1), rand(2))
([0.926164], [0.821162, 0.374145])

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, ConjugateGradient())
Results of Optimization Algorithm
 * Algorithm: Conjugate Gradient
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [9.999999997479323,7.000857916457364, ...]
 * Minimum: 1.932692e-12
 * Iterations: 30
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 7.67e-04 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 6.94e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 5.12e-09 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 62
 * Gradient Calls: 34

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, GradientDescent())
Results of Optimization Algorithm
 * Algorithm: Gradient Descent
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [10.00000456963858,7.010984469414458, ...]
 * Minimum: 2.920265e-08
 * Iterations: 1000
 * Convergence: false
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 7.66e-06 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 2.01e-03 |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = 9.14e-06 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: true
 * Objective Calls: 2510
 * Gradient Calls: 2510

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [9.999999997694495,6.999913965528232, ...]
 * Minimum: 1.290248e-16
 * Iterations: 39
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 1.55e-05 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 3.07e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 4.61e-09 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 134
 * Gradient Calls: 134

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, LBFGS())
WARNING: Method definition twoloop!(AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, AbstractArray{T, 2} where T, AbstractArray{T, 2} where T, Integer, Integer, AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, Bool, Any, Any) in module Optim at /home/pkm/.julia/v0.6/Optim/src/multivariate/solvers/first_order/l_bfgs.jl:22 overwritten at /home/pkm/.julia/v0.6/Optim/src/multivariate/solvers/first_order/l_bfgs.jl:22.
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [9.99999999997918,7.000054968318689, ...]
 * Minimum: 1.508989e-17
 * Iterations: 32
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 1.75e-06 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 9.89e+01 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 4.16e-11 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 112
 * Gradient Calls: 112

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, Newton())
Results of Optimization Algorithm
 * Algorithm: Newton's Method
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [9.999999999999996,6.9999829381032095, ...]
 * Minimum: 7.801052e-15
 * Iterations: 2
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 1.81e+01 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 4.22e+16 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 1.05e-10 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 7
 * Gradient Calls: 7
 * Hessian Calls: 2

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, NewtonTrustRegion())
Results of Optimization Algorithm
 * Algorithm: Newton's Method (Trust Region)
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [10.0,6.999891492857457, ...]
 * Minimum: 3.277275e-12
 * Iterations: 32
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 6.73e-04 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 4.06e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 9.74e-09 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 33
 * Gradient Calls: 33
 * Hessian Calls: 32

ChrisRackauckas · January 26, 2018, 10:23am

Wow, if you got that working… test with some GPUArrays now?

pkofod · January 26, 2018, 2:31pm

OpenCL is not nice to me… maybe another day!

anyway: https://github.com/JuliaNLSolvers/Optim.jl/pull/520

Topic		Replies	Views
Does Optim not support function optimization with matrix variables? General Usage question , optim , optimization	14	2440	October 10, 2020
Optim.jl OnceDifferentiable Baby Steps Optimization (Mathematical)	19	3940	April 8, 2018
Conjugate Gradient optimization with set of functions Optimization (Mathematical) optim , optimization , function	9	1302	June 27, 2020
Calculation of gradient not working for an ODEproblem that uses 2D or 3D arrays General Usage diffeq , flux	1	509	July 24, 2020
Optim.jl single iteration Optimization (Mathematical) question , optim	4	562	March 28, 2022

optim arguments: higher dimensional arrays and gradient specification

Related topics