Does Optim not support function optimization with matrix variables?

I want to optimize my function with Optim, but something went wrong. I have written a minimal example. Does this example show that Optim.jl does not support functions whose variables are matrices? If not, what should I do about it?

using Zygote
using Optim
using LinearAlgebra
using RecursiveArrayTools

function get_f(A)
    M = reshape(1:16, 4, 4)
    U = exp(A)
    T = U * M * U'
    return tr(T)
end

function my_function()
    A0 = zeros(4, 4)
    results = optimize(get_f, VectorOfArray([A0]))
end
my_function()

And the error message reads:

MethodError: no method matching +(::Array{Float64,2}, ::Float64)
For element-wise addition, use broadcasting with dot syntax: array .+ scalar
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any…) at operators.jl:538
+(!Matched::Bool, ::T) where T<:AbstractFloat at bool.jl:103
+(!Matched::ChainRulesCore.DoesNotExist, ::Any) at C:\Users\38606\.julia\packages\ChainRulesCore\trzfY\src\differential_arithmetic.jl:23


Stacktrace:
[1] simplexer(::Optim.AffineSimplexer, ::VectorOfArray{Float64,3,Array{Array{Float64,2},1}}) at C:\Users\38606\.julia\packages\Optim\TNmSw\src\multivariate\solvers\zeroth_order\nelder_mead.jl:13
[2] initial_state(::NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Float64,Nothing}, ::NonDifferentiable{Float64,VectorOfArray{Float64,3,Array{Array{Float64,2},1}}}, ::VectorOfArray{Float64,3,Array{Array{Float64,2},1}}) at C:\Users\38606\.julia\packages\Optim\TNmSw\src\multivariate\solvers\zeroth_order\nelder_mead.jl:155
[3] optimize(::NonDifferentiable{Float64,VectorOfArray{Float64,3,Array{Array{Float64,2},1}}}, ::VectorOfArray{Float64,3,Array{Array{Float64,2},1}}, ::NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Float64,Nothing}) at C:\Users\38606\.julia\packages\Optim\TNmSw\src\multivariate\optimize\optimize.jl:33
[4] optimize(::Function, ::VectorOfArray{Float64,3,Array{Array{Float64,2},1}}; inplace::Bool, autodiff::Symbol, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\38606\.julia\packages\Optim\TNmSw\src\multivariate\optimize\interface.jl:64
[5] optimize at C:\Users\38606\.julia\packages\Optim\TNmSw\src\multivariate\optimize\interface.jl:58 [inlined]
[6] my_function() at .\In[7]:4
[7] top-level scope at In[8]:1
[8] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

Why not just use a vector argument, and reshape it inside get_f? The objective itself could be a callable struct carrying information about the dimensions. Roughly something like this (untested code follows):

struct MyObjective
    d::Int
end

function (obj::MyObjective)(a)
    A = reshape(a, obj.d, obj.d)
    return get_f(A)  # evaluate the matrix objective on the reshaped argument
end

optimize(MyObjective(4), zeros(16))

optimize(a -> get_f(reshape(a, 4, 4)), vec(A0)), maybe?
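
Spelled out, that would be something like this (untested; it assumes the get_f defined above):

using Optim

A0 = zeros(4, 4)
# optimize sees a plain vector; the objective reshapes it back into a matrix
results = optimize(a -> get_f(reshape(a, 4, 4)), vec(A0))
A_min = reshape(Optim.minimizer(results), 4, 4)  # recover the matrix form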

Thank you for your useful reply! But I still have two questions. Why do we have to reshape the variable A? And when my objective function takes multiple matrices, vectorizing the variables with vec causes problems. Is there an alternative that can replace vec?

optimize performs an initialization step on the variable (eg calling zero on it), which fails for a vector of matrices:

zero(vec([[1 2;3 4],[1 2 3;4 5 6]]))

MethodError: no method matching zero(::Type{Array{Int64,2}})
Closest candidates are:
zero(!Matched::Type{Dates.DateTime}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Dates\src\types.jl:404
zero(!Matched::Type{Pkg.Resolve.VersionWeight}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\Resolve\versionweights.jl:15
zero(!Matched::Type{Dates.Time}) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Dates\src\types.jl:406

Stacktrace:
[1] zero(::Array{Array{Int64,2},1}) at .\abstractarray.jl:975
[2] top-level scope at In[9]:1
[3] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091

You can split an input vector into two vectors, and reshape those. Eg something like

julia> a = 1:10
1:10

julia> reshape(@view(a[1:6]), 2, 3), reshape(@view(a[7:end]), 2, 2)
([1 3 5; 2 4 6], [7 9; 8 10])

Most (all?) multivariate optimization routines operate on \mathbb{R}^n, so you have to map back and forth. This is a normal part of numerical optimization.

This is the same as using vec directly, and optimize will not support this type of input. I mean that my function's variables are multiple matrices, not one vector. Since optimize requires the input variables to be stored in a single array, I would organize the individual matrices into an array of inputs. The straightforward approach results in the zero error I showed above.

So I found VectorOfArray. It supports operations like the following:

julia> zero(VectorOfArray([[1 2;3 4],[1 2 3;4 5 6]]))
VectorOfArray{Int64,3}:
2-element Array{Array{Int64,2},1}:
 [0 0; 0 0]
 [0 0 0; 0 0 0]

And then I am back to my original question.

On the contrary, a vector is pretty much the only format most multivariate optimizers support.

Again, you just need a small wrapper function to convert the vector into the input your objective needs.
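
Eg something like this, for a problem with two matrix variables (the objective f and the sizes here are made up for illustration):

using Optim

# hypothetical objective taking two matrices
f(A, B) = sum(abs2, A) + sum(abs2, B .- 1)

# helpers to move between the pair of matrices and one flat vector
pack(A, B) = vcat(vec(A), vec(B))
unpack(x) = (reshape(@view(x[1:6]), 2, 3), reshape(@view(x[7:10]), 2, 2))

x0 = pack(zeros(2, 3), zeros(2, 2))
res = optimize(x -> f(unpack(x)...), x0)     # Nelder-Mead on R^10
A_opt, B_opt = unpack(Optim.minimizer(res))  # back to matrix form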

In practice, yes. In principle, all of these could broaden their arguments to accept AbstractArray (or even to duck-type, I guess).

You mean have them do the mapping themselves? Internally all the algorithms are defined on \mathbb{R}^n anyway, so I expect that some conversion to <: AbstractVector would be needed for all the algorithms to work out (eg think about a dogleg step in trust region methods).

OptimKit.jl supports variables of arbitrary type without the need for reshaping and all that, but is otherwise much more bare-bones and limited. You have to specify the gradient explicitly (though you could still use AD yourself to compute it), and only a small number of optimization methods are available. But it might be sufficient for your problem.
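
Roughly, usage looks like this (a minimal sketch with a made-up quadratic objective; fg returns the value and the gradient, and the variable keeps its matrix shape throughout):

using OptimKit

B = [1.0 2.0; 3.0 4.0]

function fg(A)
    f = 0.5 * sum(abs2, A - B)
    g = A - B  # gradient with the same matrix shape as A
    return f, g
end

A, fA, gA, numfg, history = optimize(fg, zeros(2, 2), GradientDescent())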

Even algorithms that construct linear operators, e.g. BFGS, could I guess be formulated in terms of multi-linear algebra, but I agree that such algorithms would probably want to call vec internally for convenience. There are other algorithms, however, such as nonlinear conjugate-gradient, that only require a vector space with an inner product. (Dog-leg steps fall into the latter camp, no?)

The variables of a smooth optimization problem can be elements of arbitrary smooth manifolds, they don’t even need to be vectors. Gradient based optimization then requires a metric, to map the partial derivatives of a scalar function on that manifold (a vector in the cotangent space) to an element of the tangent space (a direction in which to move on the manifold), together with a retraction to actually take a finite step in such a given direction. I think that’s the most general formulation and that’s the one I was aiming for with OptimKit.jl.
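
To make the metric and retraction concrete, here is a hand-rolled sketch (not OptimKit code) of gradient descent on the unit sphere: projecting the gradient onto the tangent space uses the Euclidean metric, and renormalizing after the step is the retraction.

using LinearAlgebra

function sphere_descent(Q; steps = 1000, η = 0.01)
    x = normalize(randn(size(Q, 1)))
    for _ in 1:steps
        g = 2 * (Q * x)            # Euclidean gradient of x' * Q * x
        d = g - dot(g, x) * x      # project onto the tangent space at x
        x = normalize(x - η * d)   # retraction: step, then pull back onto the sphere
    end
    return x
end

Q = Symmetric([4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0])
x = sphere_descent(Q)  # approximates the eigenvector of the smallest eigenvalue of Q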

Yes, I think you are right. But I am not sure that simply extending to arrays (with a Euclidean metric, I guess) is worth it.

I think that if one wants to generalize, OptimKit.jl has the right level of generality.

That said, for the problem in this topic, I would still consider split & reshape, as it is very easy to do.

I'm pretty sure Optim supports general AbstractArrays. See eg this test, which optimizes over complex n x m matrices: https://github.com/JuliaNLSolvers/Optim.jl/blob/master/test/multivariate/manifolds.jl. If something does not work please open an issue.
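
For instance, a quick check along these lines (untested here) passes a Matrix directly to optimize:

using Optim

# minimize a simple quadratic in the entries of a 2x2 matrix variable
res = optimize(A -> sum(abs2, A .- 1), zeros(2, 2))
Optim.minimizer(res)  # should be close to a matrix of ones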

Can you read this discussion? I don't know if Optim doesn't support general AbstractArrays very well, or if the problem is in my function itself.
https://discourse.julialang.org/t/no-method-matching-isless-array-float64-2-array-float64-2/48086