I’m following this paper, which uses this method (the Butler-Reeds-Dawson 1981 algorithm) to greatly speed up a large Tikhonov regularization problem:
min ||Kf - g||^2 + α||f||^2
s.t. f ≥ 0
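For context, the way I understand the BRD trick (and the way the Optimization.jl code below implements it): instead of solving for f directly, you minimize an unconstrained functional of a vector c of length size(K, 1),

χ(c) = 0.5 c'(αI + G(c)K')c - c'g,   where G(c) is K with its columns zeroed wherever K'c ≤ 0,

and then recover f = max(0, K'c). Setting the gradient to zero gives αc = g - Kf, i.e. c = r/α, which is where the r ./ α[end] in the GCV part further down comes from.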
I wanted to reproduce this method in Julia so that I can move away from the inflexible MATLAB GUI our group has been using so far.
Performance matters, since this code is run routinely on large data sets.
There’s another thread where people kindly provided suggestions for solving this problem. Some of those approaches give effectively the same results as the Optimization.jl method suggested earlier in this thread. The fastest I’ve tested so far is this JuMP implementation:
JuMP/Gurobi method
using JuMP, Gurobi, LinearAlgebra

function fit_brd_jump(K, g, α)                  # wrapper name is arbitrary
    model = Model(() -> Gurobi.Optimizer(GRB_ENV))
    set_silent(model)
    @variable(model, f[1:size(K, 2)] >= 0)      # non-negative solution vector
    @variable(model, r[1:size(K, 1)])           # residual vector r = g - K*f
    @constraint(model, r .== g .- K * f)
    @objective(model, Min, LinearAlgebra.dot(r, r) + α * LinearAlgebra.dot(f, f))
    optimize!(model)
    @assert is_solved_and_feasible(model)
    return value.(f), value.(r)
end
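For completeness, this is roughly how I call it; the shared Gurobi environment GRB_ENV and the wrapper name fit_brd_jump are just my own conventions, with K, g and α assumed to be already defined:

const GRB_ENV = Gurobi.Env()    # reuse one Gurobi environment across repeated solves
f, r = fit_brd_jump(K, g, α)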
followed by the optimization method suggested above, modified as follows:
Optimization.jl method
using LinearAlgebra, Optimization, OptimizationOptimJL

# p = (α, g, K); G is K with its columns zeroed on the active set, i.e. wherever K'c ≤ 0
function minfun(c::AbstractVector, p = (α, g, K))
    G = copy(p[3])
    G[:, (p[3]' * c .<= 0)] .= 0
    return 0.5 * c' * ((p[1] * I + G * p[3]') * c) - c' * p[2]
end

function grad(gradient, c, p = (α, g, K))
    G = copy(p[3])
    G[:, (p[3]' * c .<= 0)] .= 0
    gradient .= (p[1] * I + G * p[3]') * c - p[2]
end

function hess(H, c, p = (α, g, K))
    G = copy(p[3])
    G[:, (p[3]' * c .<= 0)] .= 0
    H .= p[1] * I + G * p[3]'
end

optf = OptimizationFunction(minfun, grad = grad, hess = hess)
prob = OptimizationProblem(optf, ones(size(K, 1)), (α, g, K))
sol = solve(prob, NewtonTrustRegion(), x_tol = 1e-8, maxiters = 5000, maxtime = 100)
c = sol.u                          # dual vector
f = max.(0, K' * c)                # recover the non-negative solution from c
(although I reckon allocations in the latter could be greatly reduced by not copying G on every call, potentially making it the fastest; see the sketch below).
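To make that concrete, here is a possible low-allocation variant (my own untested sketch, not the code from the thread): since G*K'*c equals K * max.(0, K'*c), the objective and gradient never need a copy of K at all, and only the Hessian rewrites a preallocated buffer in place. The buffers are shared between the closures, so this is not thread-safe.

using LinearAlgebra

function make_brd_functions(K)
    n  = size(K, 2)
    Kc = zeros(eltype(K), n)     # buffer for K' * c
    f⁺ = zeros(eltype(K), n)     # buffer for max.(0, K' * c)
    G  = similar(K)              # buffer for the column-masked copy of K (Hessian only)

    function minfun(c, p)
        α, g, _ = p
        mul!(Kc, K', c)
        @. f⁺ = max(0, Kc)
        return 0.5 * (α * dot(c, c) + dot(Kc, f⁺)) - dot(c, g)
    end

    function grad!(gradient, c, p)
        α, g, _ = p
        mul!(Kc, K', c)
        @. f⁺ = max(0, Kc)
        mul!(gradient, K, f⁺)           # gradient = K * max.(0, K'c)
        @. gradient += α * c - g        # gradient = (αI + G K')c - g
        return gradient
    end

    function hess!(H, c, p)
        α, _, _ = p
        mul!(Kc, K', c)
        copyto!(G, K)
        G[:, Kc .<= 0] .= 0
        mul!(H, G, K')                  # H = G * K'
        H[diagind(H)] .+= α             # H = αI + G * K'
        return H
    end

    return minfun, grad!, hess!
end

The returned functions have the same signatures as above, so they can be passed straight to OptimizationFunction, e.g. mf, g!, h! = make_brd_functions(K); optf = OptimizationFunction(mf, grad = g!, hess = h!).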
I’d also like to optimize α at the end of the day.
Currently I’m following the method suggested here, which uses generalized cross-validation (GCV) for that: the idea is to minimize a “score” produced by the function below.
GCV score function
# Core method: c is the dual vector from the Optimization.jl method,
# s holds the singular values of K, and only the latest α (α[end]) is used.
function gcv_score(α, s, c)
    ñ = length(s)
    σ² = s .^ 2
    m̂ = sum(σ² ./ (σ² .+ α[end]))
    dm̂ = sum(σ² ./ ((σ² .+ α[end]) .^ 2))
    t̂ = sum((s .* c) .^ 2 ./ (σ² .+ α[end]))
    φₙ = α[end]^2 * (c' * c) * ñ / (ñ - m̂)^2         # GCV score to be minimized
    αₙ = α[end]^2 * (c' * c) * dm̂ / (t̂ * (ñ - m̂))    # α update value, test this one next
    return φₙ, αₙ
end

# For the JuMP/Gurobi method only the residuals r are returned; there c = r / α.
# (Two untyped 3-argument methods both named gcv_score would overwrite each other,
# hence the separate name for the residual version.)
gcv_score_residuals(α, s, r) = gcv_score(α, s, r ./ α[end])
This function returns the score to be minimized as well as the next α to try. The plan is to start from a large α and evaluate the score repeatedly until a minimum is found (if the new α is larger than the previous one, you stop); a sketch of that loop is below.
Frankly, I have no idea how this formula is derived or how sensitive the result is to the exact choice of α, but it’s fast and converges within 4-7 iterations in most cases.
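For reference, the loop I have in mind looks roughly like this (all names are mine; fit_brd_jump is the JuMP wrapper from earlier, gcv_score_residuals the residual wrapper above, and s holds the singular values of K):

function optimize_alpha(K, g, s; α₀ = 1e3, maxiter = 20)
    α = [α₀]                                    # history of tested α values
    φ_prev = Inf
    f = r = nothing
    for _ in 1:maxiter
        f, r = fit_brd_jump(K, g, α[end])
        φ, αₙ = gcv_score_residuals(α, s, r)
        (αₙ >= α[end] || φ >= φ_prev) && break  # stop once the score stops improving
        φ_prev = φ
        push!(α, αₙ)
    end
    return f, α[end]
end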
I’m aware you’ve previously recommended the following.
I did try to play around with it using OptimizationBBO, but I haven’t found a setup yet that converges as fast as the “α update” method above (a sketch of what I mean is below). I’m still working on that, and any suggestions would be welcome.
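By that I mean a direct search over log10(α), where the NNLS problem is re-solved at every evaluation and the GCV score φ is the objective. All names, bounds and iteration counts below are placeholders, not my actual script:

using Optimization, OptimizationBBO

function gcv_objective(x, p)
    K, g, s = p
    α = 10.0^x[1]                           # search in log10(α)
    f, r = fit_brd_jump(K, g, α)            # any of the solvers above works here
    φ, _ = gcv_score([α], s, r ./ α)        # c = r / α for the JuMP solution
    return φ
end

prob_α = OptimizationProblem(gcv_objective, [0.0], (K, g, s); lb = [-6.0], ub = [6.0])
sol_α = solve(prob_α, BBO_adaptive_de_rand_1_bin_radiuslimited(); maxiters = 50)
α_opt = 10.0^sol_α.u[1]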
The OneDrive link above has been updated to include the “s” vector required by the GCV function, in case someone wants to play around with it. Inside you will also find .pdf files of the papers cited above, in case someone does not have access.