optim arguments: higher dimensional arrays and gradient specification

question
optim
optimization

#1

Two questions regarding optim,jl:

  1. Gradient specification for higher dimensional arrays. Optim.jl documentation states that “for zeroth and first order methods, it is also possible to pass in matrices, or even higher dimensional arrays.” When specifying such input, should the gradient have the same shape or be specified as a vector corresponding to a flattened version of the input?

  2. Other input types. Is it possible to go beyond higher dimensional arrays, and use input variables that are arrays of arrays, e.g. Array{Array{Float64,2},3} ?


#2

We should maybe be more explicit, but you can do either. Show me what you want to do and I’ll help (but provide complete functions and initial allocations of x and G so I can run your code). Edit: but by default, the gradient type will be assumed the same type, shape and size as x.

Not yet, no.

Edit: maybe you can find inspiration in the tests here https://github.com/JuliaNLSolvers/Optim.jl/blob/7b4f8cb8021fe1ed0a2befc3789a2c60ee3c85e6/test/multivariate/array.jl


#3

Alright, look below. Keep in mind that this is sort of advanced usage of Optim so you need to learn about the syntax to construct a OnceDifferentiable (see https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/9a5650177851b034d13a909a684d091fe6527b03/README.md
for more information).

Below, we use vectors for xs and matrices for gradients. You could have done it the other way around, or even some other abstractarray (given a sufficient set of functions have methods for your type).

# Define objective
f(X) = (10 - X[1])^2 + (0 - X[2])^2 + (0 - X[3])^2 + (5 - X[4])^2
# Define gradient
function g!(storage, x)
    storage[1] = -20 + 2 * x[1]
    storage[2] = 2 * x[2]
    storage[3] = 2 * x[3]
    storage[4] = -10 + 2 * x[4]
    return
end
# A typical x (also the starting value used below)
x = [1.0, 0.0, 1.0, 0.0]
# An instance of the output type of f(x)
fx = 0.0
# A gradient type to hold the gradient, note this is different from x
G = zeros(2,2)
# Create a OnceDifferentiable instnace
od = OnceDifferentiable(f, g!, x, fx, G)
# Evaluate at some point and see what's in the gradient
some_x = [2.0, 3.0, 1.0, 0.1]
NLSolversBase.value_gradient!(od, some_x)
gradient(od)
# Optimize and check the gradient again
res = optimize(od, x, BFGS())
gradient(od)


#4

Thank you very much. This is extremely helpful. Code for my objective/gradient functions are lengthy and complicated, hence posting fully functional code would obfuscate matters.

The input to my objective and gradient functions consists of a collection of differently sized matrices (many thousands of scalar variables in total). From your response to the second part of my question, I figure that for now the only way to handle this is to vectorize and stitch them together when calling optim, then do the reverse in my objective function. I figure the oncedifferentiable functionality does not provide a way out of that. Correct?

Thanks again for the help and examples.


#5

If Optim.jl allows AbstractArrays and internally broadcasts (does it @pkofod?) then a RecursiveArrayTools.ArrayPartition was designed for this. If not, yeah separating out the array using views is the way to go. I show the difference between the two with DifferentialEquations.jl in a blog post here:

but the same idea applies with any package using an array input. In v0.7 views won’t matter as much though, but it will still be a bit of a pain to do this manually IMO so hopefully enough of the Julia ecosystem can be setup with AbstractArray+broadcast that we can start using fancier tools.


#6

Mostly, but I can identify a small issue in NLSolversBase, and all the type annotations in LineSearches have to be loosened (thought they were abstractarray tbh). So it should be a very small change to support ArrayPartitions. I’ll try to do it soon, as we’ve had the request before.


#7

Not quite there, but it will be possible “soon”. With few local changes and https://github.com/JuliaNLSolvers/LineSearches.jl/pull/78 and https://github.com/JuliaNLSolvers/NLSolversBase.jl/pull/37 I get

julia> using Optim, RecursiveArrayTools

julia> f(X) = (10 - X[1])^2 + (0 - X[2])^2 + (0 - X[3])^2 + (5 - X[4])^2
f (generic function with 1 method)

julia> function g!(storage, x)
           storage[1] = -20 + 2 * x[1]
           storage[2] = 2 * x[2]
           storage[3] = 2 * x[3]
           storage[4] = -10 + 2 * x[4]
           return
       end
g! (generic function with 1 method)

julia> ap = ArrayPartition([1.0, 0.0], [1.0, 0.0])
([1.0, 0.0], [1.0, 0.0])

julia> res = optimize(f, g!, ap, BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [1.0,0.0,1.0,0.0]
 * Minimizer: [10.0,0.0,0.0,5.0]
 * Minimum: 0.000000e+00
 * Iterations: 1
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 9.00e+00 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = Inf |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 0.00e+00 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 3
 * Gradient Calls: 3

#8

thank you @pkofod. optim’s functionality is very impressive. your efforts are really appreciated.


#9

A look into the dungeon. I’ll tag this change as soon as I iron out some corner cases.

julia> using Optim, RecursiveArrayTools

julia> function polynomial(x)
           return (10.0 - x[1])^2 + (7.0 - x[2])^4 + (108.0 - x[3])^4
       end
polynomial (generic function with 1 method)

julia> function polynomial_gradient!(storage, x)
           storage[1] = -2.0 * (10.0 - x[1])
           storage[2] = -4.0 * (7.0 - x[2])^3
           storage[3] = -4.0 * (108.0 - x[3])^3
       end
polynomial_gradient! (generic function with 1 method)

julia> function polynomial_hessian!(storage, x)
           storage[1, 1] = 2.0
           storage[1, 2] = 0.0
           storage[1, 3] = 0.0
           storage[2, 1] = 0.0
           storage[2, 2] = 12.0 * (7.0 - x[2])^2
           storage[2, 3] = 0.0
           storage[3, 1] = 0.0
           storage[3, 2] = 0.0
           storage[3, 3] = 12.0 * (108.0 - x[3])^2
       end
polynomial_hessian! (generic function with 1 method)

julia> ap = ArrayPartition(rand(1), rand(2))
([0.926164], [0.821162, 0.374145])

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, ConjugateGradient())
Results of Optimization Algorithm
 * Algorithm: Conjugate Gradient
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [9.999999997479323,7.000857916457364, ...]
 * Minimum: 1.932692e-12
 * Iterations: 30
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 7.67e-04 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 6.94e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 5.12e-09 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 62
 * Gradient Calls: 34

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, GradientDescent())
Results of Optimization Algorithm
 * Algorithm: Gradient Descent
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [10.00000456963858,7.010984469414458, ...]
 * Minimum: 2.920265e-08
 * Iterations: 1000
 * Convergence: false
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 7.66e-06 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 2.01e-03 |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = 9.14e-06 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: true
 * Objective Calls: 2510
 * Gradient Calls: 2510

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [9.999999997694495,6.999913965528232, ...]
 * Minimum: 1.290248e-16
 * Iterations: 39
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 1.55e-05 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 3.07e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 4.61e-09 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 134
 * Gradient Calls: 134

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, LBFGS())
WARNING: Method definition twoloop!(AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, AbstractArray{T, 2} where T, AbstractArray{T, 2} where T, Integer, Integer, AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, Bool, Any, Any) in module Optim at /home/pkm/.julia/v0.6/Optim/src/multivariate/solvers/first_order/l_bfgs.jl:22 overwritten at /home/pkm/.julia/v0.6/Optim/src/multivariate/solvers/first_order/l_bfgs.jl:22.
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [9.99999999997918,7.000054968318689, ...]
 * Minimum: 1.508989e-17
 * Iterations: 32
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 1.75e-06 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 9.89e+01 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 4.16e-11 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 112
 * Gradient Calls: 112

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, Newton())
Results of Optimization Algorithm
 * Algorithm: Newton's Method
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [9.999999999999996,6.9999829381032095, ...]
 * Minimum: 7.801052e-15
 * Iterations: 2
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 1.81e+01 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 4.22e+16 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 1.05e-10 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 7
 * Gradient Calls: 7
 * Hessian Calls: 2

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, NewtonTrustRegion())
Results of Optimization Algorithm
 * Algorithm: Newton's Method (Trust Region)
 * Starting Point: [0.9261639118389935,0.8211620082823579, ...]
 * Minimizer: [10.0,6.999891492857457, ...]
 * Minimum: 3.277275e-12
 * Iterations: 32
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 6.73e-04 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 4.06e+00 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 9.74e-09 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 33
 * Gradient Calls: 33
 * Hessian Calls: 32

#10

Wow, if you got that working… test with some GPUArrays now? :smile:


#11

OpenCL is not nice to me… maybe another day!

anyway: https://github.com/JuliaNLSolvers/Optim.jl/pull/520