Translating JuMP variables to Optim


#1

I tried to delete this post since it is similar to my other recent post, "Creating a matrix of variables in Optim", but I could not. I am having trouble translating the variables R and Q. In JuMP I currently have:

@variable(m, Rdiag[i=1:N])  # first create the variables that will fill the array
R = flipdim(Matrix(Diagonal(Rdiag)),2)  # create the array with those variables along its diagonal
@variable(m, Qdiag[i=1:N]) 
Q = Diagonal(Qdiag)

And I will be optimizing a further expression which takes Q and R as inputs. Essentially, I want Optim to vary the diagonal components of Q and the flipped-diagonal components of R, thereby varying Q and R and thus optimizing my function e_c:

T=kron(Q,Diagonal(ones(2)))+kron(Diagonal(ones(2)),conj(Q))+kron(R,conj(R))
Exp1.==e.^(T)

One=trace((Exp1)*(kron(C(Q,R),C(conj(Q),conj(R)))))
Two=c*trace((Exp1)*(kron(C(R,R),C(conj(R),conj(R)))))

e_c=append!(One,Two)
res = optimize(e_c,
               method = GradientDescent())

#2

When I try your code using constant Rdiag and Qdiag it doesn’t work:

julia> Rdiag = rand(2)
2-element Array{Float64,1}:
 0.667789
 0.0867968

julia> Qdiag = rand(2)
2-element Array{Float64,1}:
 0.876396
 0.749799

julia> R = flipdim(Matrix(Diagonal(Rdiag)),2)
2×2 Array{Float64,2}:
 0.0        0.667789
 0.0867968  0.0

julia> Q = Diagonal(Qdiag)
2×2 Diagonal{Float64}:
 0.876396   ⋅
  ⋅        0.749799

julia> T=kron(Q,Diagonal(ones(2)))+kron(Diagonal(ones(2)),conj(Q))+kron(R,conj(R))
4×4 Array{Float64,2}:
 1.75279     0.0        0.0        0.445942
 0.0         1.6262     0.0579619  0.0
 0.0         0.0579619  1.6262     0.0
 0.00753368  0.0        0.0        1.4996

julia> Exp1.==e.^(T)
ERROR: UndefVarError: Exp1 not defined

julia> Exp1 = e.^(T) # I assume you mean this
4×4 Array{Float64,2}:
 5.7707   1.0      1.0      1.56196
 1.0      5.08449  1.05967  1.0
 1.0      1.05967  5.08449  1.0
 1.00756  1.0      1.0      4.47989

julia> One=trace((Exp1)*(kron(C(Q,R),C(conj(Q),conj(R)))))
ERROR: UndefVarError: C not defined

As you can see, C is not defined. Neither is c. And append! doesn’t work on two numbers, so your code is clearly wrong as posted. If you fix these errors for constant values of the variables, it will then be possible to build the optimization on top. One step at a time :slight_smile:
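For instance, with scalar stand-ins for the two traces (which are numbers, not collections), `append!` fails, and plain arithmetic or a vector is likely what is wanted instead:

```julia
One, Two = 1.5, 2.5      # stand-ins for the two scalar traces
# append!(One, Two)      # MethodError: append! mutates a collection, not a number
total = One + Two        # if the two terms should be summed
pair  = [One, Two]       # or collect them into a vector, if both are needed
```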


#3

I will be varying c from 0 to 200; sorry, I haven’t gotten to figuring out how to code that yet, since I figured it would come near the end. And C is defined as:

C(A,B)=*(A,B)-*(B,A)

Also, okay, thank you, I will remove the append!:

using Optim
N=2
L=100000
C(A,B)=*(A,B)-*(B,A)
R(R1,R2) = flipdim(Matrix(Diagonal(R1:R2)),2) 
Q(Q1,Q2) = Diagonal(R1:R2) 
T=kron(Q,Diagonal(ones(2)))+kron(Diagonal(ones(2)),conj(Q))+kron(R,conj(R))
Exp1=e.^(*(T,L))

e_c=trace((Exp1)*(kron(C(Q,R),C(conj(Q),conj(R)))))+c*trace((Exp1)*(kron(C(R,R),C(conj(R),conj(R)))))
res = optimize(e_c,
               method = GradientDescent())

#4

Q is a function in the code below, so kron doesn’t make sense, you need to call the function with some inputs Q1 and Q2 to return a Diagonal matrix.

julia> N=2
2

julia> L=100000
100000

julia> C(A,B) = A*B - B*A
C (generic function with 1 method)

julia> R(R1,R2) = flipdim(Matrix(Diagonal(R1:R2)),2)
R (generic function with 1 method)

julia> Q(Q1,Q2) = Diagonal(Q1:Q2) # I assume you mean Q1 and Q2
Q (generic function with 1 method)

julia> T=kron(Q,Diagonal(ones(2)))+kron(Diagonal(ones(2)),conj(Q))+kron(R,conj(R))
ERROR: MethodError: no method matching kron(::#Q, ::Diagonal{Float64})
Closest candidates are:
  kron(::Any, ::Any, ::Any, ::Any...) at operators.jl:424
  kron(::AbstractArray{T,2}, ::AbstractArray{S,2}) where {T, S} at linalg\dense.jl:313
  kron(::Number, ::Union{Number, Union{AbstractArray{T,1}, AbstractArray{T,2}} where T}) at linalg\dense.jl:325
  ...
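For example, calling the functions first makes kron work. A quick sketch with hypothetical integer inputs (`Qf` is just a renamed copy of your `Q`, so it doesn’t clash with anything above):

```julia
using LinearAlgebra   # Diagonal and kron (these live in Base on Julia 0.6)

Qf(Q1, Q2) = Diagonal(Q1:Q2)       # same shape as the Q function above
Qm = Qf(1, 2)                      # 2×2 Diagonal with 1 and 2 on the diagonal
T = kron(Qm, Diagonal(ones(2))) + kron(Diagonal(ones(2)), conj(Qm))
size(T)                            # (4, 4): kron now receives matrices, not functions
```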

#5

If you post a complete working piece of code it will be easier for us to help.


#6

Ah yeah that’s one of my questions. I defined Q and R as functions in optim whereas I had them as variable matrices in JuMP, I just didn’t know how to define them as matrices where I will be able to vary their components throughout the optimization.


#7

This is the most complete code I have, unfortunately. I cannot complete it further, because my questions are about how to formulate e^T so that it works, and how to formulate Q and R so that my e_c function works. I’ll keep working at it:

L=100000.
C(A,B)=*(A,B)-*(B,A)
R1=rand(1)
R2=rand(1)
Q1=rand(1)
Q2=rand(1)
R = flipdim(Matrix(Diagonal(R1:R2)),2) 
Q = Diagonal(Q1:Q2) 
T=kron(Q,Diagonal(ones(2)))+kron(Diagonal(ones(2)),conj(Q))+kron(R,conj(R))
Exp1=e.^(*(T,L))

e_c=trace((Exp1)*(kron(C(Q,R),C(conj(Q),conj(R)))))+c*trace((Exp1)*(kron(C(R,R),C(conj(R),conj(R)))))

With this, I get an error that I realize is probably simple, but I do not quite understand:

MethodError: Cannot `convert` an object of type Int64 to an object of type Array{Float64,1}. This may have arisen from a call to the constructor Array{Float64,1}(...), since type constructors fall back to convert methods.
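A minimal reproduction of what seems to go wrong, plus a variant that runs (assuming two scalar diagonal entries are what is intended): `rand(1)` returns a 1-element Array, and `R1:R2` then tries to build a range out of arrays, which triggers the convert error.

```julia
using LinearAlgebra

R1 = rand(1)             # 1-element Array{Float64,1}, NOT a scalar
# Diagonal(R1:R2)        # R1:R2 builds a range from arrays -> the convert MethodError
r1, r2 = rand(), rand()  # scalars instead
Q = Diagonal([r1, r2])   # 2×2 matrix with the two entries on the diagonal
size(Q)                  # (2, 2)
```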


#8

Hmm I am not sure I am following. Let me try to go back to the previous piece of code and define C.

Rdiag = rand(2)
Qdiag = rand(2)
L=100000

R = flipdim(Matrix(Diagonal(Rdiag)),2)
Q = Diagonal(Qdiag)
T=kron(Q,Diagonal(ones(2)))+kron(Diagonal(ones(2)),conj(Q))+kron(R,conj(R))

Exp1=e.^(T*L)
#= Notice that `L` is too large so taking the exponent of that creates many `Inf`s
4×4 Array{Float64,2}:
 Inf      1.0    1.0  Inf
   1.0  Inf    Inf      1.0
   1.0  Inf    Inf      1.0
 Inf      1.0    1.0  Inf
=#

Exp1=e.^(T) # I used this definition instead from your first piece of code

C(A,B) = A*B - B*A
One=trace((Exp1)*(kron(C(Q,R),C(conj(Q),conj(R)))))

e_c=trace((Exp1)*(kron(C(Q,R),C(conj(Q),conj(R)))))+c*trace((Exp1)*(kron(C(R,R),C(conj(R),conj(R)))))
# ERROR: UndefVarError: c not defined
# I suppose you meant `L`

e_c=trace((Exp1)*(kron(C(Q,R),C(conj(Q),conj(R)))))+L*trace((Exp1)*(kron(C(R,R),C(conj(R),conj(R)))))

So is this about right?


#9

Yes, that is correct, and using e.^(T) is good for now. I actually meant c: I will be optimizing Q and R over a wide range of c, which I will vary from 0 to 200. I suppose we can fix it at some arbitrary value like 100 for now; eventually, however, I want to plot the minimized e_c, using the optimal values of Q and R, for many different values of c from 0 to 200. e_c is basically `e(c)` (writing it in plain text kept turning into a copyright symbol), i.e. e is a function of c. I am working on understanding this project as I complete it, so I am sorry for my ignorance. If I had more time to complete it, I would be taking this more slowly and methodically.


#10

Assuming the above code finds e_c from Rdiag, Qdiag and L, which are the decision variables, turning it into an optimization is simple. First you need a function with a single input to optimize. This input will be all your decision variables combined, i.e. [Rdiag; Qdiag; L]. You can then slice portions of it inside the function to recover your Rdiag, Qdiag and L. The following is the function to optimize:

julia> function ec(x)
           # Let's find the length of Rdiag
           N = (length(x) - 1) ÷ 2

           # Let's extract Rdiag
           Rdiag = x[1:N]

           # Let's extract Qdiag
           Qdiag = x[N+1:2N]

           # Let's extract L
           L = x[2N+1]

           # The rest of the code is the same as constant input case
           R = flipdim(Matrix(Diagonal(Rdiag)), 2)
           Q = Diagonal(Qdiag)
           T = kron(Q, Diagonal(ones(2))) + kron(Diagonal(ones(2)), conj(Q)) + kron(R, conj(R))
           Exp1 = e.^(T)
           C(A,B) = A*B - B*A
           One = trace((Exp1) * (kron(C(Q, R), C(conj(Q), conj(R)))))
           e_c = trace(Exp1 * kron(C(Q, R), C(conj(Q), conj(R)))) + L * trace(Exp1 * kron(C(R, R), C(conj(R), conj(R))))
           return e_c # This is the output
       end
ec (generic function with 1 method)

julia> ec(rand(5))
0.005496995063934321

This function is then used with Optim following the examples on the docs.

Example:

julia> using Optim

julia> optimize(ec, rand(5))
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.7568649987307565,0.32965319650921954, ...]
 * Minimizer: [0.6271401117690574,0.6273412601710198, ...]
 * Minimum: 3.163538e-10
 * Iterations: 42
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 81

#11

Wow, thank you, that is very helpful. I do have one non-urgent question still, when you have time: does passing rand(5) into the algorithm generate 5 random numbers to fill the variables (R1, R2, Q1, Q2, as defined earlier) and L?

I still have to work more with the problem, but I will do that on my own and see what I can figure out. Thank you so much; I understand this completely, and besides helping me with my problem, you have helped me understand more about Julia and how Optim works at the same time.


#12

No worries. rand(5) will initialize the input to the function, x = [R1, R2, Q1, Q2, L], as if they were one combined vector. That’s why I extract R, Q and L inside the function.
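For example, the packing and unpacking look like this:

```julia
Rdiag, Qdiag, L = rand(2), rand(2), 100.0
x = [Rdiag; Qdiag; L]     # one combined 5-element decision vector
x[1:2] == Rdiag           # slicing recovers the original pieces
x[3:4] == Qdiag
x[5]   == L
```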


#13

Okay, that makes sense!


#14

Another specific question: I need L to range continuously from 0 to 200, and I need R1, R2, Q1 and Q2 to be complex numbers. I cannot figure out how to code them as complex and, having done that, how to handle L separately.


#15

Optim supports complex inputs, so as long as all the other methods you use within ec support complex numbers, you can just provide the initial input as a complex vector. We need to add an example to the docs.

A simple approach to handle the constraints on L is to:

  1. Penalize imag(L)^2
  2. Penalize real(L) going outside the bounds. One approach would be to use a barrier method

So you can define parameters penimL and penreL, and then change ec to return

return e_c + penimL*imag(L)^2 - penreL*(log(real(L)) + log(200 - real(L)))

The process is then

  1. Pick some start values for penimL and penreL
  2. Run res = optimize(...)
  3. Extract L from Optim.minimizer(res)
    • Decrease the value of penreL
    • If imag(L) is far from zero, increase penimL
  4. Run optimize again using the previous answer as your initial condition.
  5. Loop until the minimizer does not change much as you decrease penreL (and possibly increase penimL)
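A rough sketch of that setup (penimL and penreL as above; a trivial stand-in is used for ec so the sketch is self-contained, it is NOT the real objective):

```julia
# Trivial stand-in for the ec of post #10, just so this runs on its own.
ec(x) = abs2(sum(x))

# Penalized objective: quadratic penalty on imag(L), log-barrier keeping real(L) in (0, 200).
function ec_penalized(x, penimL, penreL)
    L = x[end]
    ec(x) + penimL*imag(L)^2 - penreL*(log(real(L)) + log(200 - real(L)))
end

# The outer loop then looks roughly like:
#   res = optimize(x -> ec_penalized(x, penimL, penreL), x0, LBFGS())
#   x0 = Optim.minimizer(res); penreL /= 10
#   if abs(imag(x0[end])) > 1e-6; penimL *= 10; end
# repeated until the minimizer stops changing.
ec_penalized([0.0im, 0.0im, 0.0im, 0.0im, 100.0 + 0.0im], 1.0, 1.0)
```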

I think you’ll be better off using a gradient-based optimizer with finite differences rather than NelderMead for this approach, possibly with the BackTracking linesearch. So, for example, do

using Optim, LineSearches

ls = LineSearches.BackTracking(order=3)
res = optimize(ec, x0, GradientDescent(linesearch=ls))
# or
res = optimize(ec, x0, LBFGS(linesearch=ls))

#16

You can define the barrier yourself, as anriseth says, or, if that is too much work, you can unpack each complex decision variable into two real decision variables and then use box-constrained minimization with real decisions. You will need the same packing-unpacking trick for R, Q and L.
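For example, the unpacking could look like this (names here are assumed, and `ec` refers to the function from post #10; Fminbox is Optim’s box-constrained wrapper):

```julia
# Unpack 2n real decisions into n complex ones, keeping the last entry (L) real.
function unpack(y)
    z = [Complex(y[2i-1], y[2i]) for i in 1:4]   # R1, R2, Q1, Q2
    L = y[end]
    z, L
end

unpack(collect(1.0:9.0))   # ([1+2im, 3+4im, 5+6im, 7+8im], 9.0)

# With box constraints on L via Optim's Fminbox (real-valued decisions only):
#   lower = [fill(-Inf, 8); 0.0]
#   upper = [fill( Inf, 8); 200.0]
#   res = optimize(y -> ec([unpack(y)[1]; unpack(y)[2]]), lower, upper, rand(9),
#                  Fminbox(GradientDescent()))
```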


#17

Thank you guys, that makes sense; I will tackle that. I actually learned that I need my matrices to be full D×D matrices, rather than diagonal ones, with each element varied in the optimization. I thought this would be a simple change, and it probably is, but unfortunately I ran into a problem with my new code, where I redefine R and Q, and I am hoping someone can clear it up:

function Ec(x)
    # set dimension
    D = 2

    # extract the R entries
    R1 = x[1:D]
    R2 = x[D+1:2D]

    # extract the Q entries
    Q1 = x[2D+1:3D]
    Q2 = x[3D+1:4D]

    # extract c
    c = x[4D+1]

    # Define matrices R and Q
    R = [R1; R2]
    Q = [Q1; Q2]

    # Define T as stated in the paper
    T = kron(Q, Diagonal(ones(D))) + kron(Diagonal(ones(D)), conj(Q)) + kron(R, conj(R))

    # Define the exponential, which I will eventually square until convergence
    Exp1 = e.^(T)

    # Define the commutator function used in e_c
    C(A,B) = A*B - B*A

    # State the energy function E(c)
    e_c = trace(Exp1 * kron(C(Q, R), C(conj(Q), conj(R)))) + c * trace(Exp1 * kron(C(R, R), C(conj(R), conj(R))))
    return e_c # This is the output
end

using Optim
D = 2
optimize(Ec, rand(4D+1)) # optimize(function, initial vector, algorithm, etc...)

DimensionMismatch("dimensions must match")


#18

You are adding matrices of incompatible sizes in the line T = kron(.... I suggest you go back to the constant-input script and try the code line by line before going back to the function, and then to the optimization. Debugging everything at once is annoying even for advanced users, and you seem to be taking your first steps in Julia. If you face other specific obstacles and the documentation doesn’t help you much, you can ask here again, and many people will be happy to help. Good luck!


#19

Thank you, that is good advice. After trying what you suggested: one step at a time works perfectly; it’s only when I define the function that it breaks down at T. I have discovered that my function creates Q and R as 4-element arrays, where I wanted them to be 2×2 arrays when D=2. This could be creating the problem in T, but I am unsure how to fix it, because I have already used the semicolon in my definitions of Q and R.


#20

x[1:n] always returns a column vector. If you want a row vector, you need to transpose it, that is x[1:n]'.

[x1; x2] is short for vcat(x1, x2) (see https://docs.julialang.org/en/stable/manual/arrays/#Concatenation-1). vcat-ing two column vectors gives a longer column vector; vcat-ing two row vectors gives a matrix.
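For example, with D = 2:

```julia
x = collect(1.0:4.0)
D = 2
v  = [x[1:D]; x[D+1:2D]]     # vcat of two column vectors: a 4-element column vector
M  = [x[1:D]'; x[D+1:2D]']   # vcat of two row vectors: a 2×2 matrix
M2 = reshape(x, D, D)        # reshape also gives 2×2 (filled column-major)
```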