# optim arguments: higher dimensional arrays and gradient specification

Two questions regarding Optim.jl:

1. Gradient specification for higher dimensional arrays. The Optim.jl documentation states that “for zeroth and first order methods, it is also possible to pass in matrices, or even higher dimensional arrays.” When passing such input, should the gradient have the same shape as the input, or should it be specified as a vector corresponding to a flattened version of the input?

2. Other input types. Is it possible to go beyond higher dimensional arrays and use input variables that are arrays of arrays, e.g. `Array{Array{Float64,2},3}`?

We should maybe be more explicit in the documentation, but you can do either. Show me what you want to do and I’ll help (but provide complete functions and initial allocations of `x` and `G` so I can run your code). Edit: by default, though, the gradient is assumed to have the same type, shape, and size as `x`.
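For the default case, a minimal sketch (assuming only that Optim is installed) of passing a matrix-valued `x`, in which case the gradient buffer handed to `g!` has the same 2×2 shape:

```julia
using Optim

# Sketch: x is a 2×2 matrix, so by default `storage` in g! is also 2×2.
target = [10.0 0.0; 0.0 5.0]
f(X) = sum(abs2, X .- target)
function g!(storage, X)
    storage .= 2 .* (X .- target)  # broadcast fills the matrix-shaped buffer
    return
end
X0 = [1.0 0.0; 1.0 0.0]
res = optimize(f, g!, X0, BFGS())
Optim.minimizer(res)  # a 2×2 matrix, same shape as X0
```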

Not yet, no.

Edit: maybe you can find inspiration in the tests here https://github.com/JuliaNLSolvers/Optim.jl/blob/7b4f8cb8021fe1ed0a2befc3789a2c60ee3c85e6/test/multivariate/array.jl

Alright, look below. Keep in mind that this is somewhat advanced usage of Optim, so you need to learn the syntax for constructing an `OnceDifferentiable` (see https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/9a5650177851b034d13a909a684d091fe6527b03/README.md).

Below, we use vectors for the `x`s and matrices for the gradients. You could have done it the other way around, or even used some other `AbstractArray` (provided a sufficient set of functions have methods for your type).

``````julia
using Optim, NLSolversBase

# Define the objective
f(X) = (10 - X[1])^2 + (0 - X[2])^2 + (0 - X[3])^2 + (5 - X[4])^2
function g!(storage, x)
    storage[1] = -20 + 2 * x[1]
    storage[2] = 2 * x[2]
    storage[3] = 2 * x[3]
    storage[4] = -10 + 2 * x[4]
    return
end
# A typical x (also the starting value used below)
x = [1.0, 0.0, 1.0, 0.0]
# An instance of the output type of f(x)
fx = 0.0
# A container to hold the gradient; note this is a 2×2 matrix, unlike the vector x
G = zeros(2, 2)
# Create a OnceDifferentiable instance
od = OnceDifferentiable(f, g!, x, fx, G)
# Evaluate at some point and see what's in the gradient
some_x = [2.0, 3.0, 1.0, 0.1]
NLSolversBase.value_gradient!(od, some_x)
NLSolversBase.gradient(od)  # the 2×2 gradient buffer
# Optimize and check the gradient again
res = optimize(od, x, BFGS())
NLSolversBase.gradient(od)
``````

Thank you very much, this is extremely helpful. The code for my objective/gradient functions is lengthy and complicated, so posting fully functional code would obfuscate matters.

The input to my objective and gradient functions consists of a collection of differently sized matrices (many thousands of scalar variables in total). From your response to the second part of my question, I gather that for now the only way to handle this is to vectorize the matrices and stitch them together when calling Optim, then do the reverse inside my objective function. I take it the `OnceDifferentiable` functionality does not provide a way around that. Correct?

Thanks again for the help and examples.

If Optim.jl allows `AbstractArray`s and internally broadcasts (does it, @pkofod?), then a RecursiveArrayTools.ArrayPartition was designed for exactly this. If not, then yes, separating out the array using views is the way to go. I show the difference between the two approaches with DifferentialEquations.jl in a blog post here:

but the same idea applies to any package that takes an array input. In v0.7, views won’t matter as much, but it will still be a bit of a pain to do this manually IMO, so hopefully enough of the Julia ecosystem can be set up with `AbstractArray` + broadcast that we can start using fancier tools.
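A minimal sketch of the views approach, with hypothetical names (`pack`, `unpack`): two differently sized matrices are stitched into one flat vector for the optimizer, and recovered inside the objective without copying:

```julia
# Two differently sized parameter blocks (stand-ins for the real problem)
A0 = [1.0 2.0; 3.0 4.0]   # 2×2 block
B0 = [5.0 6.0 7.0]        # 1×3 block

# Pack: one flat vector the optimizer can work with
x0 = vcat(vec(A0), vec(B0))

# Unpack: reshaped views into x, so no copying inside the objective
unpack(x) = (reshape(view(x, 1:4), 2, 2),
             reshape(view(x, 5:7), 1, 3))

function f(x)
    A, B = unpack(x)
    # ... objective written in terms of the matrix blocks ...
    sum(abs2, A) + sum(abs2, B)
end
```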

Mostly, but I can identify a small issue in NLSolversBase, and all the type annotations in LineSearches have to be loosened (I thought they were `AbstractArray` already, tbh). So it should be a very small change to support `ArrayPartition`s. I’ll try to do it soon, as we’ve had this request before.


Not quite there, but it will be possible “soon”. With a few local changes and Move to AbstractArray. by pkofod · Pull Request #78 · JuliaNLSolvers/LineSearches.jl · GitHub and Broadcast for x_of_nans by pkofod · Pull Request #37 · JuliaNLSolvers/NLSolversBase.jl · GitHub, I get:

``````julia
julia> using Optim, RecursiveArrayTools

julia> f(X) = (10 - X[1])^2 + (0 - X[2])^2 + (0 - X[3])^2 + (5 - X[4])^2
f (generic function with 1 method)

julia> function g!(storage, x)
           storage[1] = -20 + 2 * x[1]
           storage[2] = 2 * x[2]
           storage[3] = 2 * x[3]
           storage[4] = -10 + 2 * x[4]
           return
       end
g! (generic function with 1 method)

julia> ap = ArrayPartition([1.0, 0.0], [1.0, 0.0])
([1.0, 0.0], [1.0, 0.0])

julia> res = optimize(f, g!, ap, BFGS())
Results of Optimization Algorithm
* Algorithm: BFGS
* Starting Point: [1.0,0.0,1.0,0.0]
* Minimizer: [10.0,0.0,0.0,5.0]
* Minimum: 0.000000e+00
* Iterations: 1
* Convergence: true
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 9.00e+00
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = Inf |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 0.00e+00
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 3
``````
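For reference, an `ArrayPartition` indexes like one flat vector while keeping the underlying parts accessible (a sketch assuming only RecursiveArrayTools is installed):

```julia
using RecursiveArrayTools

ap = ArrayPartition([1.0, 0.0], [3.0, 5.0])
# Linear indexing runs across the parts in order...
ap[3]      # first element of the second part, i.e. 3.0
# ...while the underlying arrays remain available via the `x` field
ap.x[2]    # the second array, [3.0, 5.0]
```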

Thank you, @pkofod. Optim’s functionality is very impressive, and your efforts are really appreciated.

A look into the dungeon. I’ll tag this change as soon as I iron out some corner cases.

``````julia
julia> using Optim, RecursiveArrayTools

julia> function polynomial(x)
           return (10.0 - x[1])^2 + (7.0 - x[2])^4 + (108.0 - x[3])^4
       end
polynomial (generic function with 1 method)

julia> function polynomial_gradient!(storage, x)
           storage[1] = -2.0 * (10.0 - x[1])
           storage[2] = -4.0 * (7.0 - x[2])^3
           storage[3] = -4.0 * (108.0 - x[3])^3
       end
polynomial_gradient! (generic function with 1 method)

julia> function polynomial_hessian!(storage, x)
           storage[1, 1] = 2.0
           storage[1, 2] = 0.0
           storage[1, 3] = 0.0
           storage[2, 1] = 0.0
           storage[2, 2] = 12.0 * (7.0 - x[2])^2
           storage[2, 3] = 0.0
           storage[3, 1] = 0.0
           storage[3, 2] = 0.0
           storage[3, 3] = 12.0 * (108.0 - x[3])^2
       end
polynomial_hessian! (generic function with 1 method)

julia> ap = ArrayPartition(rand(1), rand(2))
([0.926164], [0.821162, 0.374145])

Results of Optimization Algorithm
* Starting Point: [0.9261639118389935,0.8211620082823579, ...]
* Minimizer: [9.999999997479323,7.000857916457364, ...]
* Minimum: 1.932692e-12
* Iterations: 30
* Convergence: true
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 7.67e-04
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = 6.94e+00 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 5.12e-09
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 62

Results of Optimization Algorithm
* Starting Point: [0.9261639118389935,0.8211620082823579, ...]
* Minimizer: [10.00000456963858,7.010984469414458, ...]
* Minimum: 2.920265e-08
* Iterations: 1000
* Convergence: false
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 7.66e-06
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = 2.01e-03 |f(x)|
* |g(x)| ≤ 1.0e-08: false
|g(x)| = 9.14e-06
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: true
* Objective Calls: 2510

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, BFGS())
Results of Optimization Algorithm
* Algorithm: BFGS
* Starting Point: [0.9261639118389935,0.8211620082823579, ...]
* Minimizer: [9.999999997694495,6.999913965528232, ...]
* Minimum: 1.290248e-16
* Iterations: 39
* Convergence: true
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 1.55e-05
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = 3.07e+00 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 4.61e-09
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 134

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, LBFGS())
WARNING: Method definition twoloop!(AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, AbstractArray{T, 2} where T, AbstractArray{T, 2} where T, Integer, Integer, AbstractArray{T, 1} where T, AbstractArray{T, 1} where T, Bool, Any, Any) in module Optim at /home/pkm/.julia/v0.6/Optim/src/multivariate/solvers/first_order/l_bfgs.jl:22 overwritten at /home/pkm/.julia/v0.6/Optim/src/multivariate/solvers/first_order/l_bfgs.jl:22.
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.9261639118389935,0.8211620082823579, ...]
* Minimizer: [9.99999999997918,7.000054968318689, ...]
* Minimum: 1.508989e-17
* Iterations: 32
* Convergence: true
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 1.75e-06
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = 9.89e+01 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 4.16e-11
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 112

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, Newton())
Results of Optimization Algorithm
* Algorithm: Newton's Method
* Starting Point: [0.9261639118389935,0.8211620082823579, ...]
* Minimizer: [9.999999999999996,6.9999829381032095, ...]
* Minimum: 7.801052e-15
* Iterations: 2
* Convergence: true
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 1.81e+01
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = 4.22e+16 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 1.05e-10
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 7
* Hessian Calls: 2

julia> optimize(polynomial, polynomial_gradient!, polynomial_hessian!, ap, NewtonTrustRegion())
Results of Optimization Algorithm
* Algorithm: Newton's Method (Trust Region)
* Starting Point: [0.9261639118389935,0.8211620082823579, ...]
* Minimizer: [10.0,6.999891492857457, ...]
* Minimum: 3.277275e-12
* Iterations: 32
* Convergence: true
* |x - x'| ≤ 1.0e-32: false
|x - x'| = 6.73e-04
* |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
|f(x) - f(x')| = 4.06e+00 |f(x)|
* |g(x)| ≤ 1.0e-08: true
|g(x)| = 9.74e-09
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 33
``````