# Quasi Maximum Likelihood implementation with Optim.jl

#1

Hi,

I am trying to implement a QML estimation problem with a system of censored equations. I am trying this with Julia’s Optim library. However, I have not been able to build a working model. I get an error message:

MethodError: no method matching zero(::Type{Array{Float64,2}}).

I am new to Julia and I could use some help on where to proceed next. I use Julia 0.6.2 with the most current packages.

Here is a working example with fake data of what I’ve got thus far:

using Distributions, Optim

nrows = 1000

X = rand(0:1, nrows, 3)
X = [ones(nrows) X]

Y = rand(Truncated(Normal(.5), 0, 1), nrows, 2)
nShares = size(Y,2)
nVars = size(X,2)

function censQML(Y, X)

ll = 0

for t in 1:nrows
for i in 1:nShares
for j in 1:nShares

if j != i

e_i(δ,σ) = (Y[t,i] - *(X[t,:], δ[:,i]))/σ[:,i]
e_j(δ,σ) = (Y[t,j] - *(X[t,:], δ[:,j]))/σ[:,j]

ρ(δ,σ) = tanh(cor(e_i(δ,σ), e_j(δ,σ)))

w(δ,σ) = e_i(δ,σ).^2 + e_j(δ,σ).^2 - 2.*ρ(δ,σ).*e_i(δ,σ).*e_j(δ,σ)

if (Y[t,i] == 0) & (Y[t,j] == 0)
l(δ,σ) = logpdf(MvNormal([0, 0], ρ(δ,σ)), [e_i(δ,σ), e_j(δ,σ)])
end

if (Y[t,i] == 1) & (Y[t,j] == 0)
l(δ,σ) = logpdf(MvNormal([1, 0], -ρ(δ,σ)), [-e_i(δ,σ), e_j(δ,σ)])
end

if (Y[t,i] == 0) & (Y[t,j] == 1)
l(δ,σ) = logpdf(MvNormal([0, 1], -ρ(δ,σ)), [e_i(δ,σ), -e_j(δ,σ)])
end

if (0 < Y[t,i] < 1) & (0 < Y[t,j] < 1)
l(δ,σ) = -log(2.*pi.*sqrt(1 - ρ(δ,σ).^2)) - σ[t,i] - σ[t,j] - w(δ,σ)./(2.*(1 - ρ(δ,σ).^2))
end

if (0 < Y[t,i] < 1) & (Y[t,j] == 0)
l(δ,σ) = log(1./exp(σ[t,i])*pdf(Normal(0, σ[t,i]), e_i(δ,σ)))
end

if (Y[t,i] == 0) & (0 < Y[t,j] < 1)
l(δ,σ) = log(1./exp(σ[t,j])*pdf(Normal(0, σ[t,j]), e_j(δ,σ)))
end

#ll = ll + l(δ,σ)
end
end
end
end
end

censQMLfn = censQML(Y, X)

δ0 = Array{Float64}(nVars,nShares)
σ0 = Array{Float64}(nVars,nShares)

res = optimize(censQMLfn, [δ0, σ0], BFGS())


Any help would be greatly appreciated. I’ve already benefited from two blog posts that have guided my attempt:

#2

Using should be using.

The function method censQML(Y, X) returns nothing. So censQMLfn has the value nothing.

The error you reported is often encountered when using a type, rather than an instance. There is a method for zero whose argument is an instance of Array{Float64,2}

Maybe you meant to pass the function censQML to optimize ?

#3

Thank you so much John! I understand your point. I had defined summation of looped values, which was commented out from the code I posted:

 ll = ll + l(δ,σ)


I wrongly interpreted that to be more fundamental error so I commented it out in order to make progress in other parts of the code. That returns me to other errors I encountered before.

Including that line again produces a new error of l not being defined. So I initialize that with zero before the loops, but now the problem is that δ is not defined. I don’t understand what is the fundamental difference to the first of the two examples that I used? That has a function with non-defined argument μσ that does not cause UndefVarError. Why does it appear with my code? Does it have something to do with the if statements?

I think my current code passes censQML to optimize with pre-determined argument values. However, I realize now that that is an unnecessary twist any way.

Antti

PS. I fixed using in my original post.

#4

Your code still have some issues. For example in that line

e_i(δ,σ) = (Y[t,i] - *(X[t,:], δ[:,i]))/σ[:,i]

You are using two arrays, δ and σ which are not defined anywhere. Note that non-constant global variables are bad for performance, you should try to pass everything you need to censQML as parameters (or define them as global const).

#5

There are still errors in you example. I suggest starting with something simpler and building up.

I did not get my previous point across: In Julia, a for loop returns the value nothing, I mean an object called nothing. The last statement in the function censQML is a for loop, so the function returns the value nothing. This means that censQMLfn = censQML(Y, X) assigns the value nothing to the name censQMLfn. You are passing nothing as the first parameter to optimize. This is certainly not what you want.

The lines starting with w(δ,σ) = etc. look like function definitions to me. Maybe you want w to be an Array ?

Remember that functions typically do not operate arraywise. If you want to map sqrt over an array, you need something like sqrt.(array).

#6

Thank you both! That helped me make some progress again.

Jonathan: You are right. My use of functions was misguided. I now made δ and σ the arguments of my function, which has no functions in itself (see example below).

John: So it appears after all that I only half understood your point. I think I got the idea now, but I am not sure how to implement it. I made l a three dimensional array, which is initialized with zeros. I need to loop over each of its dimensions, because each likelihood contribution is conditional on values of each cell of Y. What is unclear still is that whether I need to return l after each nested loop? I tried that although it feels like overdoing it. Nevertheless, the error message remains the same.

Inside each loop, I calculate with scalars only so I changed all the operators to elementwise.

Here is my revised example:

using Distributions, Optim

nObs = 1000

const X = [ones(nObs) rand(0:1, nObs, 3)]
nVars = size(X,2)

const Y = rand(Truncated(Normal(.5), 0, 1), nObs, 2)
nShares = size(Y,2)

function censQML(δ,σ)

l = zeros(nShares, nShares, nObs)

for i in 1:nShares

e = (Y[:,i] - *(X, δ[:,i]))./transpose(σ[:,i])

for j in 1:nShares

if j != i

for t in 1:nObs

e_i = e[t,i]
e_j = e[t,j]

ρ = tanh(cor(e_i, e_j))

w = e_i^2 + e_j^2 - 2*ρ*e_i*e_j

if (Y[t,i] == 0) & (Y[t,j] == 0)
l[i,j,t] = logpdf(MvNormal([0, 0], ρ), [e_i, e_j])
end

if (Y[t,i] == 1) & (Y[t,j] == 0)
l[i,j,t] = logpdf(MvNormal([0, 0], -ρ), [-e_i, e_j])
end

if (Y[t,i] == 0) & (Y[t,j] == 1)
l[i,j,t] = logpdf(MvNormal([0, 0], -ρ), [e_i, -e_j])
end

if (0 < Y[t,i] < 1) & (0 < Y[t,j] < 1)
l[i,j,t] = -log(2*pi*sqrt(1 - ρ^2)) - σ[t,i] - σ[t,j] - w/(2*(1 - ρ^2))
end

if (0 < Y[t,i] < 1) & (Y[t,j] == 0)
l[i,j,t] = log(1/exp(σ[t,i])*pdf(Normal(0, σ[t,i]), e_i))
end

if (Y[t,i] == 0) & (0 < Y[t,j] < 1)
l[i,j,t] = log(1/exp(σ[t,j])*pdf(Normal(0, σ[t,j]), e_j))
end
return(l)

end
return(l)
end
return(l)
end
return(l)
end
return(sum(l))
end

δ0 = zeros(nVars, nShares)
σ0 = zeros(nVars, nShares)

res = optimize(censQML, [δ0, σ0], BFGS())

#7

There are two problems remaining (and an additional one if you insist on keeping the setup like it is now).

First of all, you objective has to take one abstract array, you currently only define it for two inputs. If you want it to work like that, you have to use an ArrayPartition (or some similar construct) from RecursiveArrayTools.jl.

using RecursiveArrayTools
ap = ArrayPartition(δ0,σ0) # define a structure Optim can broadcast over
# Define a method that takes in the array partition
censQML(ap) = censQML(ap.x[1], ap.x[2]) # grab the internal arrays
optimize(censQML, ap, BFGS())


This will not work though! Because current we use Calculus.jl to construct gradients, and that doesn’t support ArrayPartitions. We will fix this soon, but for now you need to provide your own gradient in the case. The alternative is to use NelderMead, since it doesn’t require the gradient:

using RecursiveArrayTools
ap = ArrayPartition(δ0,σ0) # define a structure Optim can broadcast over
# Define a method that takes in the array partition
censQML(ap) = censQML(ap.x[1], ap.x[2]) # grab the internal arrays


edit: I was mistaken, we actually use DiffEqDiffTools for gradients. That should work with ArrayPartitions right @ChrisRackauckas ?

#8

Yes, you have too many returns. The function exits with the value l the first time a return statement is encountered. You want just one return statement as the last expression in the function.

#9

It might? But if it doesn’t we can just add a specialization.

#10

I made some further progress thanks to your suggestions. Also I spotted some errors of mine and simplified some parts of the function. I post the current code below. I apply ArrayPartition with Nelder-Mead method as suggested. Now optimize returns a different error: DimensionMismatch("array could not be broadcast to match destination"). Does that mean that the previous error got solved after all?

using Distributions, Optim, RecursiveArrayTools

nObs = 1000

const X = [ones(nObs) rand(0:1, nObs, 3)]
nVars = size(X,2)

const Y = rand(Truncated(Normal(.5), 0, 1), nObs, 2)
nShares = size(Y,2)

function censQML(δ,σ)

l = zeros(nShares, nShares, nObs)

for i in 1:nShares

e = (Y[:,i] - *(X, δ[:,i]))./σ[i]

for j in 1:nShares

ρ = tanh(cor(e[:,i], e[:,j]))
w = e[:,i].^2 + e[:,j].^2 - 2ρ.*e[:,i].*e[:,j]

if j != i

for t in 1:nObs

e_i = e[t,i]
e_j = e[t,j]

if (Y[t,i] == 0) & (Y[t,j] == 0)
l[i,j,t] = logpdf(MvNormal([0, 0], ρ), [e_i, e_j])
end

if (Y[t,i] == 1) & (Y[t,j] == 0)
l[i,j,t] = logpdf(MvNormal([0, 0], -ρ), [-e_i, e_j])
end

if (Y[t,i] == 0) & (Y[t,j] == 1)
l[i,j,t] = logpdf(MvNormal([0, 0], -ρ), [e_i, -e_j])
end

if (0 < Y[t,i] < 1) & (0 < Y[t,j] < 1)
l[i,j,t] = -log(2*pi*sqrt(1 - ρ^2)) - σ[i] - σ[j] - w[t]/(2*(1 - ρ^2))
end

if (0 < Y[t,i] < 1) & (Y[t,j] == 0)
l[i,j,t] = log(1/exp(σ[i])*pdf(Normal(0, σ[i]), e_i))
end

if (Y[t,i] == 0) & (0 < Y[t,j] < 1)
l[i,j,t] = log(1/exp(σ[j])*pdf(Normal(0, σ[j]), e_j))
end

end
end
end
end
return(sum(l))
end

δ0 = zeros(nVars, nShares)
σ0 = zeros(nShares)

ap = ArrayPartition(δ0, σ0)

censQML(ap) = censQML(ap.x[1], ap.x[2])

res = optimize(censQML, ap, NelderMead())

#11

Errors are usually associated with a line number in a file. You should include this information.

#12

Well, without even going for the ArrayPartition style, you objective doesn’t even run

julia> censQML(δ0, σ0)
ERROR: BoundsError: attempt to access 1000-element Array{Float64,1} at index [Base.Slice(Base.OneTo(1000)), 2]
Stacktrace:
[1] throw_boundserror(::Array{Float64,1}, ::Tuple{Base.Slice{Base.OneTo{Int64}},Int64}) at ./abstractarray.jl:433
[2] checkbounds at ./abstractarray.jl:362 [inlined]
[3] macro expansion at ./multidimensional.jl:441 [inlined]
[4] _getindex at ./multidimensional.jl:438 [inlined]
[5] getindex(::Array{Float64,1}, ::Colon, ::Int64) at ./abstractarray.jl:882
[6] censQML(::Array{Float64,2}, ::Array{Float64,1}) at ./REPL[7]:11


#13

Thanks for helping me to find an error again! I had defined the residuals wrongly, which caused that dimension mismatch. Below is the new version that gets past it. I at least have a running objective now.

However, optimize won’t still run. Here’s the error message:

DimensionMismatch("array could not be broadcast to match destination")

Stacktrace:
[4] copy!(::RecursiveArrayTools.ArrayPartition{Float64,Tuple{Array{Float64,2},Array{Float64,1}}}, ::RecursiveArrayTools.ArrayPartition{Float64,Tuple{Array{Float64,2},Array{Float64,1}}}) at C:\.julia\v0.6\RecursiveArrayTools\src\array_partition.jl:140
[5] value!!(::NLSolversBase.NonDifferentiable{Float64,RecursiveArrayTools.ArrayPartition{Float64,Tuple{Array{Float64,2},Array{Float64,1}}},Val{false}}, ::RecursiveArrayTools.ArrayPartition{Float64,Tuple{Array{Float64,2},Array{Float64,1}}}) at C:\.julia\v0.6\NLSolversBase\src\interface.jl:8


I am editing my source in a Jupyter notebook. I did not find a way to copy the line numbers. Tell me if that is possible somehow.

using Distributions, Optim, RecursiveArrayTools

nObs = 1000

const X = [ones(nObs) rand(0:1, nObs, 3)]
nVars = size(X,2)

const Y = rand(Truncated(Normal(.5), 0, 1), nObs, 2)
nShares = size(Y,2)

function censQML(δ,σ)

l = zeros(nObs, nShares, nShares)

for i in 1:nShares

e = Y .- *(X, δ)

for j in 1:nShares

if j != i

ρ = tanh(cor(e[:,i], e[:,j]))
w = e[:,i].^2 + e[:,j].^2 - 2ρ.*e[:,i].*e[:,j]

for t in 1:nObs

e_i = e[t,i]/σ[i]
e_j = e[t,j]/σ[j]

if (Y[t,i] == 0) & (Y[t,j] == 0)
l[t,i,j] = logpdf(MvNormal([0, 0], ρ), [e_i, e_j])
end

if (Y[t,i] == 1) & (Y[t,j] == 0)
l[t,i,j] = logpdf(MvNormal([0, 0], -ρ), [-e_i, e_j])
end

if (Y[t,i] == 0) & (Y[t,j] == 1)
l[t,i,j] = logpdf(MvNormal([0, 0], -ρ), [e_i, -e_j])
end

if (0 < Y[t,i] < 1) & (0 < Y[t,j] < 1)
l[t,i,j] = -log(2*pi*sqrt(1 - ρ^2)) - σ[i] - σ[j] - w[t]./(2*(1 - ρ^2))
end

if (0 < Y[t,i] < 1) & (Y[t,j] == 0)
l[t,i,j] = log(1/exp(σ[i])*pdf(Normal(0, σ[i]), e_i))
end

if (Y[t,i] == 0) & (0 < Y[t,j] < 1)
l[t,i,j] = log(1/exp(σ[j])*pdf(Normal(0, σ[j]), e_j))
end

end
end
end
end
return(sum(l))
end

δ0 = ones(nVars, nShares)
σ0 = ones(nShares)

ap = ArrayPartition(δ0, σ0)

censQML(ap) = censQML(ap.x[1], ap.x[2])
censQML(δ0, σ0)

#res = optimize(censQML, ap, NelderMead())

#14

This runs, I just used plain arrays, it’s a bit ugly but it’s simpler I think. I also added some variables to your function to avoid globals and a minus sign to the error function (if you want to maximize the likelihood, you have to minimize the negative log likelihood).

using Distributions, Optim

nObs = 1000

const X = [ones(nObs) rand(0:1, nObs, 3)]
nVars = size(X,2)

const Y = rand(Truncated(Normal(.5), 0, 1), nObs, 2)
nShares = size(Y,2)

function censQML(p, nObs, nShares)

δ = reshape(p[1:nVars*nShares],nVars, nShares)
σ = p[(nVars*nShares+1):end]

l = zeros(nObs, nShares, nShares)

for i in 1:nShares

e = Y .- *(X, δ)

for j in 1:nShares

if j != i

ρ = tanh(cor(e[:,i], e[:,j]))
w = e[:,i].^2 + e[:,j].^2 - 2ρ.*e[:,i].*e[:,j]

for t in 1:nObs

e_i = e[t,i]/σ[i]
e_j = e[t,j]/σ[j]

if (Y[t,i] == 0) & (Y[t,j] == 0)
l[t,i,j] = logpdf(MvNormal([0, 0], ρ), [e_i, e_j])
end

if (Y[t,i] == 1) & (Y[t,j] == 0)
l[t,i,j] = logpdf(MvNormal([0, 0], -ρ), [-e_i, e_j])
end

if (Y[t,i] == 0) & (Y[t,j] == 1)
l[t,i,j] = logpdf(MvNormal([0, 0], -ρ), [e_i, -e_j])
end

if (0 < Y[t,i] < 1) & (0 < Y[t,j] < 1)
l[t,i,j] = -log(2*pi*sqrt(1 - ρ^2)) - σ[i] - σ[j] - w[t]./(2*(1 - ρ^2))
end

if (0 < Y[t,i] < 1) & (Y[t,j] == 0)
l[t,i,j] = log(1/exp(σ[i])*pdf(Normal(0, σ[i]), e_i))
end

if (Y[t,i] == 0) & (0 < Y[t,j] < 1)
l[t,i,j] = log(1/exp(σ[j])*pdf(Normal(0, σ[j]), e_j))
end

end
end
end
end
return(sum(l))
end

δ0 = ones(nVars, nShares)
σ0 = ones(nShares)

pinit = [δ0[:]; σ0[:]]

errorfun = p -> -censQML(p, nObs, nShares)

@assert typeof(errorfun(pinit)) <: Real
@time errorfun(pinit)

pmin = Optim.minimizer(res)


#15

Thank you so much Jonathan! That indeed solved my problem. I started to run this with my real data (~500 000 observations and eight equations) and it remains to be seen how long that will take. I’m very excited altogether.

It is a bit unintuitive to have parameter dimensions as function arguments but as long as it works I’m happy. Do I interpret this correctly that optimize accepts only vectors as arguments and we can circumvent it this way? And alternatively you can apply ArrayPartition that could also achieve the same thing?

Cheers,

Antti

#16

Optimisation usually goes from R^n into R yes, I’m not familiar with ArrayPartition but if you just have two vectors to split/join it’s not that bad to do it by hand.

It is a bit unintuitive to have parameter dimensions as function arguments

You can get them inside your function from size instead of passing them as parameters. Just avoid non-constant globals (and personally I would add: avoid globals altogether).

it remains to be seen how long that will take

You’d better time your error function with @time and set a limit of function evaluations by passing Optim.Options(f_calls_limit=x) to optimize, otherwise you might end up waiting for days. It seems there’s also a time_limit option.

http://julianlsolvers.github.io/Optim.jl/stable/user/config/#general-options

#17

The time_limit is indeed present. It will check the time once every major iteration, so if one evaluation takes 70 days it won’t terminate until that point even if you set the time limit to 1 second - just so you’re aware.

But yes, I agree. Optimize the objective calls first, it will save you a lot of headaches. Unless of course you feel like it is already performant.

ArrayPartitions are really the way to supply the delta and sigma matrices, but it appears there’s a bug in DiffEqDiffTools for array partitions OR we’re calling the gradient code incorrectly in Optim. If you can provide the gradient yourself, then this should work. We actively try to support this kind of thing, so it is meant to work.

#18

I was finally able to make this work with Nelder-Mead only, but it solves fast enough (~ 1 hour). At this point I realized that I need to include constraints to parameters and that isn’t possible with Optim.jl. E.g. I want that

sum(δ[1,:]) = 1


I’ve thought about few options for constrained version - at least NLopt.jl and JuMP.jl could be possible. Unfortunately, both have some downsides. JuMP is perhaps not feasible because it does not accept function as an argument and NLopt does not look straightforward either because it takes a single vector as an argument for both the objective and constraints.

Do you have any suggestions of how to proceed? Are there any other options than JuMP and NLopt?

function censQML(p, nObs, nShares)

δ = reshape(p[1:nVars*nShares], nVars, nShares)
σ = p[(nVars*nShares + 1):end]

l = zeros(nObs, nShares, nShares)
e = Y .- *(X, δ)./transpose(exp.(σ))

for i in 1:nShares

for j in 1:nShares

if j < i

ρ = tanh(cor(e[:,i], e[:,j]))
w = e[:,i].^2 + e[:,j].^2 - 2ρ.*e[:,i].*e[:,j]

for t in 1:nObs

e_i = e[t,i]
e_j = e[t,j]

if (Y[t,i] == 0) & (Y[t,j] == 0)
l[t,i,j] = log(bvnuppercdf(e_i, e_j, ρ))
end

if (Y[t,i] == 1) & (Y[t,j] == 0)
l[t,i,j] = log(bvnuppercdf(-e_i, e_j, -ρ))
end

if (Y[t,i] == 0) & (Y[t,j] == 1)
l[t,i,j] = log(bvnuppercdf(e_i, -e_j, -ρ))
end

if (0 < Y[t,i] < 1) & (0 < Y[t,j] < 1)
l[t,i,j] = -log(2*pi*sqrt(1 - ρ^2)) - σ[i] - σ[j] - w[t]./(2*(1 - ρ^2))
end

if (0 < Y[t,i] < 1) & (Y[t,j] == 0)
l[t,i,j] = log(1/exp(σ[i])*pdf(Normal(), e_i)*
cdf(Normal(), (e_i - ρ*e_j)/sqrt(1 - ρ^2)))
end

if (Y[t,i] == 0) & (0 < Y[t,j] < 1)
l[t,i,j] = log(1/exp(σ[j])*pdf(Normal(), e_j) *
cdf(Normal(), (e_j - ρ*e_i)/sqrt(1 - ρ^2)))
end

end
end
end
end
return(-sum(l))
end

#19

If

\sum_{i = 1}^n \delta_i = 1

is a constraint, consider an unconstrained \{\delta_i\}_{i=1}^{n-1} and calculate the last element from the constraint.

If, in addition, you want \delta_i \ge 0 (ie a simplex), you need eg a stick-breaking method. Just open an issue at

and I am happy to add it (it is planned anyway).

Also, using derivatives (with automatic differentiation) is very likely to be helpful.

#20

Hi Tamas!

Thanks for your suggestion. I’ve been contemplating about your solution as it is in the heart of the methodology I’m trying to apply. The problem is that with a system of censored equation the solution is not invariant to which equation you omit. I’ve been searching for a method that allows the constraints or would be invariant in case of dropping an equation. I’ve found a solution in literature that uses Gibbs sampling and I’m now trying to get my head around it. So at the moment I’m not in immediate need of that extension but it could be something I will return to later on.

Cheers,

Antti