Someone asked for my help with this; however, it is outside my experience.

I am told the outer maximization is a quadratic in αᵢ and βᵢ, and that the inner minimization is solvable by, e.g., gradient descent.
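For the inner step, here is a minimal gradient-descent sketch in Python. It assumes the six values f1..f6 from the problem statement below have already been evaluated to plain numbers for fixed λₜ, ηₜ. The names `A`..`F`, the step size, the iteration count, and the sample coefficients are all hypothetical, chosen only for illustration.

```python
def grad(coef, alpha, beta):
    """Gradient of A*a^2 + B*b^2 + C*a*b + D*a + E*b + F with respect to (a, b)."""
    A, B, C, D, E, _ = coef
    return 2 * A * alpha + C * beta + D, C * alpha + 2 * B * beta + E


def gradient_descent(coef, lr=0.1, steps=1000):
    """Plain fixed-step gradient descent starting from (0, 0).

    Assumes the quadratic is strictly convex (A > 0 and 4*A*B - C^2 > 0)
    and that lr is small enough for the curvature; both are assumptions.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(steps):
        g_alpha, g_beta = grad(coef, alpha, beta)
        alpha -= lr * g_alpha
        beta -= lr * g_beta
    return alpha, beta


# Hypothetical coefficients (A, B, C, D, E, F) for one person.
coef = (2.0, 1.0, 1.0, -1.0, -3.0, 0.0)
alpha_hat, beta_hat = gradient_descent(coef)
```

If the quadratic is not strictly convex, the iteration has no finite minimum to converge to, so the convexity check matters before running it.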

```
maximize(
    mean(
        minimize(
            each Lᵢ by varying αᵢ, βᵢ (i in 1:N)
        )
    )
    by varying λₜ, ηₜ (t in 1:T)
)
N people are observed at T times.
Data and variables associated with a person 1:N use subscript **i**.
Data and variables associated with a time 1:T use subscript **t**.
For *each person* 1:N there are
two variables, each indexed by i in 1:N, to be fit or solved
αᵢ βᵢ
For *each time* 1:T there are
two sets of six known values (constant Float64s, available in array[s])
c1ₜ c2ₜ c3ₜ c4ₜ c5ₜ c6ₜ
d1ₜ d2ₜ d3ₜ d4ₜ d5ₜ d6ₜ
two variables, each indexed by t in 1:T, to be fit or solved
λₜ ηₜ
There are six functions, f1..f6 of the two 1:T-subscripted variables
f1(λₜ, ηₜ) = c1ₜ * λₜ + d1ₜ * ηₜ
f2(λₜ, ηₜ) = c2ₜ * λₜ + d2ₜ * ηₜ
f3(λₜ, ηₜ) = c3ₜ * λₜ + d3ₜ * ηₜ
f4(λₜ, ηₜ) = c4ₜ * λₜ + d4ₜ * ηₜ
f5(λₜ, ηₜ) = c5ₜ * λₜ + d5ₜ * ηₜ
f6(λₜ, ηₜ) = c6ₜ * λₜ + d6ₜ * ηₜ
There are six wrapper functions g1..g6 wrapping f1..f6
and taking as new arguments the two 1:N-subscripted variables
g1(λₜ, ηₜ, αᵢ, βᵢ) = f1(λₜ, ηₜ) * αᵢ^2 == (c1ₜ * λₜ + d1ₜ * ηₜ) * αᵢ^2
g2(λₜ, ηₜ, αᵢ, βᵢ) = f2(λₜ, ηₜ) * βᵢ^2 == (c2ₜ * λₜ + d2ₜ * ηₜ) * βᵢ^2
g3(λₜ, ηₜ, αᵢ, βᵢ) = f3(λₜ, ηₜ) * (αᵢ * βᵢ) == (c3ₜ * λₜ + d3ₜ * ηₜ) * (αᵢ * βᵢ)
g4(λₜ, ηₜ, αᵢ, βᵢ) = f4(λₜ, ηₜ) * αᵢ == (c4ₜ * λₜ + d4ₜ * ηₜ) * αᵢ
g5(λₜ, ηₜ, αᵢ, βᵢ) = f5(λₜ, ηₜ) * βᵢ == (c5ₜ * λₜ + d5ₜ * ηₜ) * βᵢ
g6(λₜ, ηₜ, αᵢ, βᵢ) = f6(λₜ, ηₜ) == (c6ₜ * λₜ + d6ₜ * ηₜ)
The g functions are combined into N sums, one per person
Lᵢ = L(λₜ, ηₜ, αᵢ, βᵢ) =
g1(λₜ, ηₜ, αᵢ, βᵢ) + g2(λₜ, ηₜ, αᵢ, βᵢ) + g3(λₜ, ηₜ, αᵢ, βᵢ) +
g4(λₜ, ηₜ, αᵢ, βᵢ) + g5(λₜ, ηₜ, αᵢ, βᵢ) + g6(λₜ, ηₜ, αᵢ, βᵢ)
Each Lᵢ is minimized, which determines the αᵢ, βᵢ.
The mean of the Lᵢ is then computed.
This mean is maximized, which determines the λₜ, ηₜ.
maximize(
    mean(
        minimize(
            each Lᵢ by varying αᵢ, βᵢ (i in 1:N)
        )
    )
    by varying λₜ, ηₜ (t in 1:T)
)
```
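Once λₜ and ηₜ are fixed, each Lᵢ is a quadratic in (αᵢ, βᵢ), so the inner minimization also has a closed form whenever that quadratic is strictly convex. The sketch below is in Python rather than the Julia-style notation above. It evaluates f1..f6 for a single time step and solves for the minimizer directly. The function names and sample numbers are hypothetical. It also assumes Lᵢ is evaluated at a single t, as the formula is written; if Lᵢ is meant to sum over t, the six coefficients would first be summed over t.

```python
def coefficients(c, d, lam, eta):
    """Evaluate f1..f6 at one time step: f_k = c_k * lam + d_k * eta."""
    return [ck * lam + dk * eta for ck, dk in zip(c, d)]


def loss(coef, alpha, beta):
    """L = g1 + ... + g6 for one person, given the six f values."""
    A, B, C, D, E, F = coef
    return (A * alpha**2 + B * beta**2 + C * alpha * beta
            + D * alpha + E * beta + F)


def inner_minimizer(coef):
    """Closed-form argmin over (alpha, beta) for a strictly convex quadratic.

    Solves the stationarity conditions
        2*A*alpha + C*beta + D = 0
        C*alpha + 2*B*beta + E = 0
    """
    A, B, C, D, E, _ = coef
    det = 4 * A * B - C * C  # determinant of the Hessian [[2A, C], [C, 2B]]
    if A <= 0 or det <= 0:
        raise ValueError("quadratic is not strictly convex; no finite minimum")
    alpha = (C * E - 2 * B * D) / det
    beta = (C * D - 2 * A * E) / det
    return alpha, beta


# Hypothetical sample data for one time step (c1..c6 and d1..d6).
c = [1.0, 1.0, 0.0, -2.0, -4.0, 0.0]
d = [1.0, 1.0, 0.0, 0.0, 0.0, 3.0]
coef = coefficients(c, d, lam=1.0, eta=1.0)
alpha_star, beta_star = inner_minimizer(coef)
min_value = loss(coef, alpha_star, beta_star)
```

The outer step would then maximize the mean of these minimum values over λₜ, ηₜ; that part is not shown here.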