Including Gradient Term in Optimization.jl

Hi everyone :grinning:,

I’m trying to solve an optimization problem using Optimization.jl, but I’m struggling with incorporating a gradient term \nabla in the process. My cost function to minimize is:

\text{Cost} \left(T^{1}, T^{2}\right) = \|r^{A}\|^2 + \|r^{B}\|^2 + \|\nabla(T^{1})\|^2 + \|\nabla(T^{2})\|^2

where r^{A} and r^{B} involve log-sum-exponential terms:

r^{i}(T^{1}, T^{2}) = \log \left(\sum_K K \exp \left(-T^{1} v_K^{1} - T^{2} v_K^{2} \right) \right) - \log \left(Y^{i}\right)

Y^{i}, T^{1} and T^{2} are matrix of size n x m

I first tried without the gradient term \nabla to solve the problem “pixel by pixel” (basically with 2 nested for loop), and it works. I then tried to vectorize my residual function r over the whole matrix, but then the optimization is extremely slow and doesn’t work anymore.

function f(K, v₁, v₂, T₁, T₂) # the theoretical value, compared to $Y^{i}$ the measurments
    x = sum(K[i] * exp.(-v₁[k] * T₁ - v₂[k] * T₂) for k in axes(K)[1])
    return -log.(x / length(axes(K)[1]))
end

function r(T, p)
    K, v₁, v₂, matrix_A, matrix_B, s = p
    T₁, T₂ = view(T, :, 1:s), view(T, :, (s+1):size(T, 2)) # I concatenated T₁, T₂  to be able to use Optimization.jl
    sum((f(K.A, v₁, v₂, T₁, T₂) .- matrix_A).^2 .+
    (f(K.B, v₁, v₂, T₁, T₂) .- matrix_B).^2)
end 

I think (?) vectorizing like that this is the only way to integrate the gradient term \nabla in the optimization, since the “pixel by pixel” approach does not give access to neighboring values in the matrix.
I don’t know if all this is clear ( I tried haha :slight_smile: ), but does anyone have experience with this kind of problem or any relevant example?

Thanks in advance!
fdekerm

Hi! Can you provide a complete example of your code (with all imports, function and variable definitions), and explain what you mean by “gradient” here?

1 Like

I can’t share the inputs for the functions, unfortunately.
For example in this article, they propose the following optimization problem:


with

and the “gradient”

They use images of N = 1024 * 768 = 786432 pixels, so the matrix A (size 2N *2N) is huge.
Likewise, the operation Screenshot 2025-03-18 at 5.29.52 PM is extremely heavy to realize on the whole matrix.

I’m not sure how to implement/vectorize the code to solve this type of problem with Optimiation.jl.
Thanks in advance for your help,
fdekerm

It’s a convex quadratic problem. Just take the derivative, set it equal to zero, and you have a system of linear equations that you can solve in a variety of ways, even for huge problems. In particular, it looks like the matrices here are all very sparse, so you can use sparse (or iterative) methods.

3 Likes

Thank you very much for your answer! That is indeed what they are proposing in the article. I tried to implement it with SparseMatrix.
The code (with an test example) is as follows:

using LinearAlgebra
using SparseArrays

function ∇R(nrows, ncols)
    N = nrows *ncols
    spdiagm(0 => 1.0*ones(N),
    -1 => -0.25*ones(N-1),
    1 => -0.25*ones(N-1),
    nrows => -0.25*ones(N-nrows),
    -nrows => -0.25*ones(N-nrows)
    )

end

function linear_solve_reg(images, μ₁, μ₂, λ)
    L = vcat(vec(images.top), vec(images.bottom))

    nrows, ncols = size(images.top)
    N = nrows * ncols
    R1 = ∇R(nrows, ncols)
    R = blockdiag(R1, R1)

    M = [μ₁.low μ₂.low; μ₁.high μ₂.high]
    A = kron(sparse(M), I(N)) + λ * R

    δ = A\L

    mat1 = reshape(δ[1:N], (nrows, ncols))
    mat2 = reshape(δ[N+1:end], (nrows, ncols))

    return mat1, mat2
end

# Test the function with fake data
nrows, ncols = 100, 100
images = (top = rand(nrows, ncols), bottom = rand(nrows, ncols))

μ₁ = (low = 0.3, high = 0.21)
μ₂ = (low = 0.19, high = 0.17)

λ = 0.1

mat1, mat2 = linear_solve_reg(images, μ₁, μ₂, λ)

I tried to do it with LinearSolve.jl, but the built-in Julia operator \ seems to work even for sparse matrix. For some images, however, I get NaN in output, I don’t know why.

In addition to that, I tried to implement the convex quadratic problem in JuMP.jl to train myself. I manage to do it without the regularization, but it does not work anymore when I add it (divergence).

using JuMP, Ipopt, Images
function R(δ, i, j)
    R1 = δ[i, j, 1] - (δ[i-1, j, 1] + δ[i+1, j, 1] + δ[i, j-1, 1] + δ[i, j+1, 1])
    R2 = δ[i, j, 2] - (δ[i-1, j, 2] + δ[i+1, j, 2] + δ[i, j-1, 2] + δ[i, j+1, 2])
    return R1 + R2
end

model = Model(Ipopt.Optimizer)
A  = [0.3 0.21; 0.19 0.17]
λ = 0.1
pad_top_image = collect(padarray(rand(100,150), Fill(0, (1, 1))))
pad_bottom_image = collect(padarray(rand(100,150), Fill(0, (1, 1))))
L = cat(pad_top_image, pad_bottom_image; dims=3)

x,y = size(pad_top_image)
@variable(model, δ[1:x, 1:y, 1:2])
@variable(model, z[1:x, 1:y, 1:2])

@constraint(model, [i=2:x-1, j=2:y-1], z[i,j,:] == A * δ[i, j, :] - L[i, j, :])
@objective(model, Min, sum(z.^2) + λ * sum(R(δ, i, j) for i=2:x-1, j=2:y-1))

optimize!(model)

δ_opt = value.(δ)
mat1 = δ_opt[:, :, 1]
mat2 = δ_opt[:, :, 2]

Thanks in advance for your help!
fdekerm