Implementing the log-sum-exp function

Hi everyone!

I am working on stochastic model predictive control in Julia/JuMP. The objective essentially contains a log-sum-exp function, and I ran into overflow issues. Now I am trying to implement the reformulation of the log-sum-exp function from [1]:

y = x_k + \log\left(1 + \sum_{i \neq k} e^{x_i - x_k}\right), \qquad k = \operatorname{argmax}_i x_i
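In plain floating-point Julia I would write this reformulation roughly as follows (outside of JuMP; the function name is just mine for illustration):

function logsumexp_shifted(x::AbstractVector{<:Real})
    k = argmax(x)              # index of the largest entry
    a = x[k]
    # sum over all terms except k; the k-th term would be exp(0) = 1,
    # which is accounted for by log1p below
    s = sum((exp(x[i] - a) for i in eachindex(x) if i != k); init = 0.0)
    return a + log1p(s)
end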

I know JuMP can handle the max operation on x. But how do I implement the sum that excludes the index k = argmax(x)?

Any general pointers are greatly appreciated as well :slight_smile:

[1] Pierre Blanchard, Desmond J. Higham, and Nicholas J. Higham. Accurately computing the log-sum-exp and softmax functions. IMA Journal of Numerical Analysis, 41(4):2311–2330, August 2020.

3 Likes

It’s already implemented here: GitHub - JuliaStats/LogExpFunctions.jl: Julia package for various special functions based on `log` and `exp`.
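For ordinary numeric vectors the usage is just, e.g.:

using LogExpFunctions

logsumexp([1000.0, 1000.0, 0.0])  # ≈ 1000.6931, no overflow

(It operates on numeric arrays, so whether it can be called directly on JuMP expressions is a separate question.)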

I assume you know this, but your expression is analytically equivalent to \log\left(\sum_i e^{x_i}\right), i.e., just factoring out the largest term. You could also write it as y = x_\mathrm{max} + \log\left(\sum_i e^{x_i-x_\mathrm{max}}\right) and not worry about any index shenanigans.
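For example, with plain floating-point numbers (hypothetical values just to illustrate):

x = [1000.0, 0.0]

log(sum(exp.(x)))                             # Inf, because exp(1000.0) overflows
maximum(x) + log(sum(exp.(x .- maximum(x))))  # 1000.0, the correct value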

8 Likes

You might be interested in Convex.jl, whose disciplined convex programming reformulation can handle logsumexp provided you use an adequate solver.
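A rough sketch of what that could look like (assuming SCS as an exponential-cone-capable solver; the constraint is just a placeholder):

using Convex, SCS

x = Variable(10)
problem = minimize(logsumexp(x), [x >= 0])  # placeholder constraint
solve!(problem, SCS.Optimizer)
evaluate(x)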

Thank you for the answer! I tried that package, but since I work in the JuMP framework, it does not accept JuMP variable types as arguments. Therefore I am going about this manually…

The analytically equivalent form gives overflow issues, and my solver (Ipopt) also struggles when I drop the special handling of the max index in the sum.

I was actually eyeing Convex.jl for its explicit handling of the logsumexp.

However, I work with an SMPC that has nonlinear model constraints, so the problem is nonconvex. I still use a local interior-point solver (Ipopt), as we assume the initial state x_0 at the time of re-solving is close enough to the optimum. I guess that is just common practice for real-time optimization.
Would Convex.jl be applicable in this case?

I don’t understand why the equivalent form x_{\text{max}} + \log(\sum_i \exp(x_i - x_{\text{max}})) suggested by Tim is not suitable.

For example, the following fails:

using JuMP
using Ipopt

function logsumexp(x)
    # naive form: exp.(x) overflows to Inf for large x
    return log(sum(exp.(x)))
end

function main()
    n = 10
    model = Model(Ipopt.Optimizer)
    @variable(model, x[1:n])
    @constraint(model, x .≥ 0)
    @constraint(model, x[1] ≤ x[2] - 1000)  # forces x[2] ≥ 1000, so exp(x[2]) overflows
    @objective(model, Min, logsumexp(x))

    optimize!(model)
    xval = value.(x)
    @info xval
end

But it works when I change logsumexp as follows:

function logsumexp(x)
    xmax = maximum(x)
    # shifting by xmax keeps every exponent ≤ 0, so nothing overflows
    return xmax + log(sum(exp.(x .- xmax)))
end

I don’t think so. Convex.jl must certify that the problem is convex, which it may even fail to do for some convex problems.

Maybe because logsumexp is not one of the supported nonlinear operators? You may want to look into user-defined operators.
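A hypothetical sketch along those lines (names are mine, and it is untested; user-defined operators take scalar arguments, so the variables are splatted):

using JuMP, Ipopt

# shifted log-sum-exp over splatted scalar arguments
function lse(x...)
    xmax = maximum(x)
    return xmax + log(sum(exp(v - xmax) for v in x))
end

n = 10
model = Model(Ipopt.Optimizer)
@variable(model, x[1:n] >= 0)
@constraint(model, x[1] <= x[2] - 1000)
@operator(model, op_logsumexp, n, lse)      # derivatives via automatic differentiation
@objective(model, Min, op_logsumexp(x...))
optimize!(model)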

1 Like

What Convex.jl does is reformulate logsumexp as a conic programming problem, where you add a constraint that a particular expression belongs to an exponential cone. This is in some ways quite different to a rewriting of the formula, because with this approach, the solver does not execute the function you wrote, but rather receives an algebraic formulation of the problem (not e.g. a function pointer). To do a conic reformulation of logsumexp in JuMP, see: How to implment `logsumexp` function in JuMP? - #2 by odow
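Roughly, the epigraph reformulation from that post looks like this (a sketch, assuming an exponential-cone-capable solver such as SCS; t and u are auxiliary variables I introduce here):

using JuMP, SCS

n = 10
model = Model(SCS.Optimizer)
@variable(model, x[1:n])
@variable(model, t)        # epigraph variable: t >= logsumexp(x) at the optimum
@variable(model, u[1:n])   # u[i] >= exp(x[i] - t)
@constraint(model, sum(u) <= 1)
# (x[i] - t, 1, u[i]) in the exponential cone enforces exp(x[i] - t) <= u[i]
@constraint(model, [i = 1:n], [x[i] - t, 1, u[i]] in MOI.ExponentialCone())
@constraint(model, x .>= 0)  # example constraint; replace with your own
@objective(model, Min, t)
optimize!(model)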

Your solver needs to be able to handle exponential cones for this approach to work (both for Convex.jl and for JuMP.jl). I’m not sure if many nonlinear solvers can handle conic constraints, and I don’t think Ipopt can. So this approach unfortunately probably is not suitable.

4 Likes

I don’t know JuMP, but I would have presumed that you would want to avoid intermediate allocations for functions used repeatedly in an optimization scenario.

So, for example

function logsumexp(x)
    xmax = maximum(x) 
    exp_ = v -> exp(v - xmax) 
    return xmax + log(sum(exp_, x)) 
end

or

function logsumexp(x)
    xmax = maximum(x) 
    return xmax + log(sum(exp(v - xmax) for v in x)) 
end

AFAICT this does not matter to JuMP, because it does not actually call the function during optimization. It constructs a kind of symbolic representation internally, which is also why only a subset of operators are supported.
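For example, with recent JuMP versions (the new nonlinear interface), calling such a function on JuMP variables just builds an expression object rather than computing a number:

using JuMP

model = Model()
@variable(model, x[1:3])
expr = log(sum(exp.(x .- 1)))  # builds a nonlinear expression; nothing is evaluated numerically here
typeof(expr)                   # a JuMP nonlinear expression type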

1 Like

Your example works, thank you for clarifying! :slight_smile: I think the evaluation error I'm getting comes from somewhere else, since the solver still spits out warnings in my code.