The MWE below (run with Julia 0.6.2 and Optim.jl 0.14.1) errors at an non-reproducible k with the error ERROR: Initial value and slope must be finite. Am I doing something wrong?

using Optim
srand(83092)
n = 200
AtA = rand(n,n)
Atb = rand(n)
ff(x) = norm(AtA*x - Atb)^2
function gg!(storage,x)
storage = AtA.'*(AtA*x-Atb)
end
x_init = fill(0.1,n)
lower = zeros(n)
upper = fill(Inf,n)
algo = Fminbox{GradientDescent}()
for k = 1:1000
print("$k ")
results = optimize(ff, gg!, x_init, lower, upper, algo)
end

It should be storage .= AtA'*(AtA*x-Atb). Also consider using in place versions of the multiplication and transpose.

I think your choice of objective is also not ideal. If I understand correctly, you are trying to minimize ||Ax-b||_2 subject to constraints. When not subject to a constraint, then taking the gradient of that 2-norm and equating it to 0 is equivalent to solving the system of equations A^TAx - A^Tb = 0, which you can solve by minimizing ||A^TAx - A^Tb||_2.

Instead, I think you are after minimizing x^TA^TAx -2x^TA^Tb subject to x \geq 0. What you are doing now is minimizing x^TA^TAA^TAx -2x^TA^TAA^Tb. I believe the answer should may be the same, but you are doing unnecessary computations and squaring the condition number of the Hessian of the objective.

I want to solve Cx=b for x\ge0 given an ill-conditioned matrix C. To do this, I minimize ||Cx-b||^2 +\alpha^2||Lx||^2, where the second term is a Tikhonov regularization functional. This gives (C^TC+\alpha^2L^TL)x = C^Tb. The AtA in my code corresponds to the regularized matrix in parentheses, and the Atb corresponds to the right-hand side. I apologize if my notation was misleading.

If you want to minimize ||Cx-b||^2+\alpha||Lx||^2 then just use that as the objective function with the box constraints. No need to get the 0-gradient equations and then minimize the norm of the residual; I am not sure these 2 versions are even the same to be honest with the constraints coming to play.

Also I think Fminbox{ConjugateGradient} will be a better choice for your problem compared to Fminbox{GradientDescent}.

That worked - thanks for your help! ConjugateGradient is indeed significantly faster than GradientDescent (but still much slower than the NNLS solver I posted in this topic).

Does slower mean in CPU time or in number of iterations/function evaluations etc.?

Speed comparisons are also made more difficult due to different tolerance specifications (in Fminbox, you’ll both have to think of the tolerances of the inner optimizer and of the outer optimizer).