Including Gradient Term in Optimization.jl

fdekerme · March 14, 2025, 7:56pm

Hi everyone ,

I’m trying to solve an optimization problem using Optimization.jl, but I’m struggling with incorporating a gradient term \nabla in the process. My cost function to minimize is:

\text{Cost} \left(T^{1}, T^{2}\right) = \|r^{A}\|^2 + \|r^{B}\|^2 + \|\nabla(T^{1})\|^2 + \|\nabla(T^{2})\|^2

where r^{A} and r^{B} involve log-sum-exponential terms:

r^{i}(T^{1}, T^{2}) = \log \left(\sum_K K \exp \left(-T^{1} v_K^{1} - T^{2} v_K^{2} \right) \right) - \log \left(Y^{i}\right)

Y^{i}, T^{1} and T^{2} are matrix of size n x m

I first tried without the gradient term \nabla to solve the problem “pixel by pixel” (basically with 2 nested for loop), and it works. I then tried to vectorize my residual function r over the whole matrix, but then the optimization is extremely slow and doesn’t work anymore.

function f(K, v₁, v₂, T₁, T₂) # the theoretical value, compared to $Y^{i}$ the measurments
    x = sum(K[i] * exp.(-v₁[k] * T₁ - v₂[k] * T₂) for k in axes(K)[1])
    return -log.(x / length(axes(K)[1]))
end

function r(T, p)
    K, v₁, v₂, matrix_A, matrix_B, s = p
    T₁, T₂ = view(T, :, 1:s), view(T, :, (s+1):size(T, 2)) # I concatenated T₁, T₂  to be able to use Optimization.jl
    sum((f(K.A, v₁, v₂, T₁, T₂) .- matrix_A).^2 .+
    (f(K.B, v₁, v₂, T₁, T₂) .- matrix_B).^2)
end

I think (?) vectorizing like that this is the only way to integrate the gradient term \nabla in the optimization, since the “pixel by pixel” approach does not give access to neighboring values in the matrix.
I don’t know if all this is clear ( I tried haha ), but does anyone have experience with this kind of problem or any relevant example?

Thanks in advance!
fdekerm

gdalle · March 14, 2025, 9:28pm

Hi! Can you provide a complete example of your code (with all imports, function and variable definitions), and explain what you mean by “gradient” here?

fdekerme · March 18, 2025, 9:34pm

I can’t share the inputs for the functions, unfortunately.
For example in this article, they propose the following optimization problem:

with

and the “gradient”

They use images of N = 1024 * 768 = 786432 pixels, so the matrix A (size 2N *2N) is huge.
Likewise, the operation is extremely heavy to realize on the whole matrix.

I’m not sure how to implement/vectorize the code to solve this type of problem with Optimiation.jl.
Thanks in advance for your help,
fdekerm

stevengj · March 18, 2025, 11:28pm

It’s a convex quadratic problem. Just take the derivative, set it equal to zero, and you have a system of linear equations that you can solve in a variety of ways, even for huge problems. In particular, it looks like the matrices here are all very sparse, so you can use sparse (or iterative) methods.

fdekerme · April 3, 2025, 9:39pm

Thank you very much for your answer! That is indeed what they are proposing in the article. I tried to implement it with SparseMatrix.
The code (with an test example) is as follows:

using LinearAlgebra
using SparseArrays

function ∇R(nrows, ncols)
    N = nrows *ncols
    spdiagm(0 => 1.0*ones(N),
    -1 => -0.25*ones(N-1),
    1 => -0.25*ones(N-1),
    nrows => -0.25*ones(N-nrows),
    -nrows => -0.25*ones(N-nrows)
    )

end

function linear_solve_reg(images, μ₁, μ₂, λ)
    L = vcat(vec(images.top), vec(images.bottom))

    nrows, ncols = size(images.top)
    N = nrows * ncols
    R1 = ∇R(nrows, ncols)
    R = blockdiag(R1, R1)

    M = [μ₁.low μ₂.low; μ₁.high μ₂.high]
    A = kron(sparse(M), I(N)) + λ * R

    δ = A\L

    mat1 = reshape(δ[1:N], (nrows, ncols))
    mat2 = reshape(δ[N+1:end], (nrows, ncols))

    return mat1, mat2
end

# Test the function with fake data
nrows, ncols = 100, 100
images = (top = rand(nrows, ncols), bottom = rand(nrows, ncols))

μ₁ = (low = 0.3, high = 0.21)
μ₂ = (low = 0.19, high = 0.17)

λ = 0.1

mat1, mat2 = linear_solve_reg(images, μ₁, μ₂, λ)

I tried to do it with LinearSolve.jl, but the built-in Julia operator \ seems to work even for sparse matrix. For some images, however, I get NaN in output, I don’t know why.

In addition to that, I tried to implement the convex quadratic problem in JuMP.jl to train myself. I manage to do it without the regularization, but it does not work anymore when I add it (divergence).

using JuMP, Ipopt, Images
function R(δ, i, j)
    R1 = δ[i, j, 1] - (δ[i-1, j, 1] + δ[i+1, j, 1] + δ[i, j-1, 1] + δ[i, j+1, 1])
    R2 = δ[i, j, 2] - (δ[i-1, j, 2] + δ[i+1, j, 2] + δ[i, j-1, 2] + δ[i, j+1, 2])
    return R1 + R2
end

model = Model(Ipopt.Optimizer)
A  = [0.3 0.21; 0.19 0.17]
λ = 0.1
pad_top_image = collect(padarray(rand(100,150), Fill(0, (1, 1))))
pad_bottom_image = collect(padarray(rand(100,150), Fill(0, (1, 1))))
L = cat(pad_top_image, pad_bottom_image; dims=3)

x,y = size(pad_top_image)
@variable(model, δ[1:x, 1:y, 1:2])
@variable(model, z[1:x, 1:y, 1:2])

@constraint(model, [i=2:x-1, j=2:y-1], z[i,j,:] == A * δ[i, j, :] - L[i, j, :])
@objective(model, Min, sum(z.^2) + λ * sum(R(δ, i, j) for i=2:x-1, j=2:y-1))

optimize!(model)

δ_opt = value.(δ)
mat1 = δ_opt[:, :, 1]
mat2 = δ_opt[:, :, 2]

Thanks in advance for your help!
fdekerm

Topic		Replies	Views
Using gradient function with Optim New to Julia question	4	1716	September 18, 2017
Multivariable optimization of matrices with "no method matching zero(::Type{Matrix{Float64}})" Optimization (Mathematical) question , optim	4	62	November 4, 2024
User defined gradient in JuMP with Ipopt: gradient with user data Optimization (Mathematical) jump , optimization , ipopt	3	142	June 26, 2024
Nonlinear optimization in JuMP adding the analytic gradient expression Optimization (Mathematical) jump	4	313	April 9, 2023
Strange errors occurred when I optimized with Zygote and Optim General Usage optim , zygote	3	757	October 4, 2020

Including Gradient Term in Optimization.jl

Related topics