Memory allocation problem with Optim + ApproxFun

dbennett1994 · July 5, 2021, 2:43pm

Hi all. I’m quite new to Julia and this is my first post.

I want to optimize a functional F[\phi] = \int_0^1 \left[\frac{1}{2}\left(\partial_x \phi\right)^2 + V(x,\phi(x))\right]\text{d}x. Of course, this can be done by using variational methods and solving ODEs, but I want to treat it as an optimization problem. I know \phi is periodic in the internal [0,1], so I write it as an expansion \phi = \sum_n \phi_n e^{i 2\pi n x} (or equivalently in terms of sines and cosines) using the ApproxFun package. The functional now becomes a function of the series coefficients: F[\phi] \to F(\phi_1,\ldots ,\phi_n), which can be minimized using optimization methods.

I have a code which is working, but slowly. It is not so bad in the example provided, but will be much slower when the number of coefficients needed becomes very large.

I suspect the problem is to do with memory allocations. Defining a series expansion using ApproxFun is fine and so is differentiating the function, but when I do other operations like squaring the function, it does a lot of memory allocations. This propagates quite poorly through the numerical integration and optimization functions.

I would really appreciate if somebody could help me find a way to get this method to work without the huge memory allocation (which will hopefully speed things up and make it scale better when going to 2D). Minimal working example below using V(x,\phi(x)) = k \cos{(2\pi(x+\phi(x)))} as an example:

using ApproxFun   # For defining series expansions
using QuadGK      # for numerical integration
using Optim       # for optimization

using Plots ; gr()
using LaTeXStrings

s=Fourier(0..1) # Specify the space.
                # CosSpace / SinSpace for just cos/sin
                # or exponentials with Laurent(0..1)

# Returns an expansion using entries of vec as coefficients
function fourier(space,vec::Vector{Float64})
    return Fun(space, vcat([0.0],vec)) # fix const term to 0.
end;

function V(space,vec::Vector{Float64},k::Float64) # Just the potential
    return x->k*cos.(2π*(x+fourier(space,vec).(x)))
end

# The integrand
function integrand(space, vec::Vector{Float64}, k::Float64)
    return x->0.5*(fourier(space,vec)'.(x))^2 + k*cos(2π*(x+fourier(space,vec)(x)))
end

# Do the integral numerically for a given vector of coefficients.
function integral(space,vec::Vector{Float64}, k::Float64)
        return quadgk(integrand(space,vec,k), 0, 1, atol=1e-12)
end

Now I can use optim to determine \phi(x) like this:

@time res=optimize(v->integral(s,v,10.0)[1], zeros(20));
sol = fourier(s,res.minimizer)
# setting the initial condition to zeros(20) also sets the length of v

37.462515 seconds (702.81 M allocations: 29.900 GiB, 10.96% gc time, 0.05% compilation time)

fig

This gives the correct answer, which I have also obtained by solving the ODE. But the number of memory allocations is alarming, and I suspect this would become problematic in situations where I need a large number of coefficients.

I checked the timing / memory allocations of some functions:

v1 = [0.0,1.0]

@time fourier(s,v1)  # sin(2πx)
@time fourier(s,v1)' # 2π*cos(2πx)
@time (fourier(s,v1)')^2 # 4π²cos²(2πx)

0.000005 seconds (3 allocations: 256 bytes)
0.000008 seconds (6 allocations: 528 bytes)
0.000238 seconds (378 allocations: 19.734 KiB)

So the problem comes when I do operations on the Fun() objects from ApproxFun, e.g. multiplying or taking cos(Fun()). Obviously this gets much worse when I go to do the numerical integration:

@time quadgk((fourier(s,v1)')^2, 0, 1, atol=1e-12)
@time quadgk(V(s,vector,1.0), 0, 1, atol=1e-12) # k = 1
@time integral(s,vector,1.0)

0.000676 seconds (9.06 k allocations: 392.062 KiB)
0.004499 seconds (85.10 k allocations: 3.697 MiB)
0.011349 seconds (190.62 k allocations: 8.405 MiB)

And worse again when I pass to the optimizer. So any help would be appreciated (as well as any other general advice on improving this code).

PS: I know that in this example, \phi(x) is odd, which cuts the number of coefficients in half and speeds up the program. But in the actual problem I want to solve it is not odd, so I don’t do that here.

stevengj · July 5, 2021, 5:10pm

I wouldn’t use QuadGK, but rather just use the sum method provided by ApproxFun itself — ApproxFun is optimized for computing its own integrals (it “knows” a lot more about the functions than QuadGK).

Moreover, you are constructing the fourier(space,vec) function for every point x that quadgk requests, which is horribly inefficient. You should just construct the Fun objects once.

In particular, I would suggest something like:

V(x, ϕ, k) = k * cos(2π * (x + ϕ))

function objective(space, vec, k)
    ϕ = Fun(space, [0; vec])
    dϕ = ϕ'
    return sum(Fun(x -> 0.5 * dϕ(x)^2 + V(x, ϕ(x), k), space))
end

stevengj · July 5, 2021, 5:59pm

Note, however that plugging into the default optimization algorithm is a fairly terrible way to solve such problems.

If V is a linear function of |\phi|^2, you have a Hermitian eigenproblem Hx=λx, but instead of applying equations that “know” how to solve eigenproblems, e.g. Lanczos or LOBPCG, you are applying generic minimization to x'Hx with a normalization constraint, i.e. you are effectively minimizing the Rayleigh quotient. Worse, you are not even using a gradient-based optimization method — optimize defaults to Nelder–Mead, which is a derivative-free algorithm and is truly awful in high dimensions. Even if you switched to a gradient-based method (by computing the Fréchet-derivative of your functional analytically), it still won’t be as good as a Krylov-based algorithm specialized for eigenproblems (e.g. LOBPCG is essentially a specialized Rayleigh–quotient minimization).

Even if V is an arbitrary nonlinear function as in your example, you will still want to use a gradient-based optimization algorithm to make this practical in high dimensions. (Fortunately, the Fréchet derivative (i.e. the functional derivative) is pretty easy to write down analytically in your case.)

dbennett1994 · July 5, 2021, 11:13pm

Thank you for the helpful replies! As I said in my post, I am fairly new to Julia, and I haven’t used optimization techniques before, so apologies for my terrible methods.

Rewriting the integral

Your first suggestion of defining the function in a more efficient way, and using sum instead of QuadGK already improves things quite a lot:

V(x, ϕ, k) = k * cos(2π * (x + ϕ))

function objective(space, vec, k)
    ϕ = Fun(space, [0.0; vec])
    dϕ = ϕ'
    return sum(Fun(x -> 0.5 * dϕ(x)^2 + V(x, ϕ(x), k), space))
end

@time optimize(v->objective(s, v, 1.0),zeros(20))

14.220752 seconds (206.55 M allocations: 8.488 GiB, 11.37% gc time, 0.05% compilation time)

Status: success
Candidate solution
Final objective value: -2.174133e-01
Found with
Algorithm: Nelder-Mead
Convergence measures
√(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08
Work counters
Seconds run: 14 (vs limit Inf)
Iterations: 996
f(x) calls: 1529

An speedup in time / reduction in memory allocation by a factor of ~ 3. It still seems like a lot of allocations, but this could be due to a poor method as you mentioned.

Just checking the indivual parts of the integral:

function objective1(space, vec, k) # just try integrating ½(∂ϕ)²
    ϕ = Fun(space, [0.0; vec])
    dϕ = ϕ'
    return sum(Fun(x -> 0.5 * dϕ(x)^2, space))
end

function objective2(space, vec, k) # just integrate the potential
    ϕ = Fun(space, [0.0; vec])
    return sum(Fun(x -> V(x, ϕ(x), k), space))
end

@time objective(s,[0.0,1.0],1.0)
@time objective1(s,[0.0,1.0],1.0)
@time objective2(s,[0.0,1.0],1.0)

0.001235 seconds (14.70 k allocations: 624.594 KiB)
0.000227 seconds (1.82 k allocations: 78.297 KiB)
0.000964 seconds (8.87 k allocations: 380.750 KiB)

It seems the potential is resulting in a lot of memory allocations.

Changing to a gradient-based method

I switched from the default method to LBFGS:

@time optimize(v->objective(s, v, 1.0),zeros(20),LBFGS())

8.262182 seconds (119.96 M allocations: 4.936 GiB, 11.28% gc time, 1.21% compilation time)

Status: success
Candidate solution
Final objective value: -2.174133e-01
Found with
Algorithm: L-BFGS
Convergence measures
|x - x’| = 9.39e-10 ≰ 0.0e+00
|x - x’|/|x’| = 7.33e-09 ≰ 0.0e+00
|f(x) - f(x’)| = 5.55e-17 ≰ 0.0e+00
|f(x) - f(x’)|/|f(x’)| = 2.55e-16 ≰ 0.0e+00
|g(x)| = 5.11e-10 ≤ 1.0e-08
Work counters
Seconds run: 8 (vs limit Inf)
Iterations: 13
f(x) calls: 34
∇f(x) calls: 34

It makes far fewer iterations and f(x) calls, but still seems like a lot of memory allocations.

For including the derivative analytically, the gradient of the function with respect to \{\phi_i\} is \nabla_{\{\phi_i\}} F = \int_0^1 \left[-\partial_x^2\phi(x) - 2\pi k \sin{2\pi (x + \phi(x))}\right]\nabla_{\{\phi_i\}} \phi(x) \text{d} x . The integral can be evaluated as before. \nabla_{\{\phi_n\} } \phi(x) would be a vector with entries

Fun(space, [0, .... , 1, ...., 0])
#ith position

I tried, following the Optim documentation for including analytic derivatives:

using LinearAlgebra

dV(x, ϕ, k) = -2π*k*sin(2π*(x + ϕ))

function gradientF!(G::Vector{Float64},vec::Vector{Float64})
    ϕ = Fun(s, [0; vec])
    d2ϕ = ϕ''
    mat = Matrix(1.0*I,length(vec),length(vec)) # identiy matrix for partial_i phi
#    G = zeros(length(vec))
    for i in 1:length(vec)
       G[i] = sum(Fun(x -> (-d2ϕ(x) + dV(x, ϕ(x), 1.0))*Fun(s,[[0.0];mat[i,:]]).(x), s))
    end
end

But when I run e.g.

@time optimize(v->objective(s, v, 1.0), gradientF!, zeros(20), LBFGS())

it just hangs.

Thanks again for your help

maxkapur · July 5, 2021, 11:38pm

Could Surrogates.jl be useful here?

stevengj · July 6, 2021, 12:12am

The gradient should just be the ApproxFun coefficients (Fourier components) of \left[-\partial_x^2\phi(x) - 2\pi k \sin{2\pi (x + \phi(x))}\right], I think — no loop of integrals required.

(Note that it is always a good idea to test the correctness of your gradient by comparing it to finite differences before running any optimization.)

dbennett1994 · July 6, 2021, 12:10pm

I was able to improve the objective function further by taking \text{d}\phi outside of the Fun() call:

V(x::Float64, ϕ::Float64, k::Float64) = k * cos(2π * (x + ϕ))

function objective(space::T where T <: ApproxFun.SumSpace, vec::Vector{Float64}, k::Float64)
    ϕ = Fun(space, [0.0;vec])
    dϕ = ϕ'
    return sum(0.5*dϕ^2 + Fun(x -> V(x, ϕ(x), k), space))
end

@time optimize(v->objective(s, v, 1.0),zeros(20), LBFGS())

4.835670 seconds (64.62 M allocations: 2.754 GiB, 9.39% gc time, 1.62% compilation time)

This is a factor of 10 faster than my original attempt. I think the expensive part is V(x,\phi(x)) being an anonymous function, but I don’t know if this can be avoided.

Analytic Gradient

Yes, I think you are right. I tried to implement this as follows:

dV(x::Float64, ϕ::Float64, k::Float64) = -2π*k*sin(2π*(x + ϕ))

function g!(G::Vector{Float64},vec::Vector{Float64})
    s=Fourier(0..1)
    k = 1.0
    ϕ = Fun(s, [0; vec])
    d2ϕ = ϕ''
    G = resize!(coefficients( -1.0*d2ϕ + Fun(x -> dV(x, ϕ(x), k),s)), length(vec))
    # need G and vec to have same length
end

I followed the Optim documentation and tried to define the gradient using only G to store the components of \nabla_{\{\phi_i\}}F and vec which contains \{\phi_i\}. I also used resize! to make sure that G and vec have the same length because Fun() will automatically select the number of coefficients it contains.

@time res=optimize(vec->objective(s, vec, 1.0), g!, zeros(20), LBFGS())

0.014414 seconds (15.99 k allocations: 761.576 KiB, 48.27% compilation time)

Status: success
Candidate solution
Final objective value: -5.622630e-17
Found with
Algorithm: L-BFGS
Convergence measures
|x - x’| = 0.00e+00 ≤ 0.0e+00
|x - x’|/|x’| = NaN ≰ 0.0e+00
|f(x) - f(x’)| = NaN ≰ 0.0e+00
|f(x) - f(x’)|/|f(x’)| = NaN ≰ 0.0e+00
|g(x)| = NaN ≰ 1.0e-08
Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 0
f(x) calls: 1
∇f(x) calls: 1

It stops after a single step…

stevengj · July 6, 2021, 12:43pm

Did you check your gradient against a finite-difference approximation (e.g. using FiniteDifferences.jl)?

dbennett1994 · July 6, 2021, 2:06pm

After checking with FiniteDifferences, it seems my gradient was off by a factor of 1/2. Also, I was including the zeroth coefficient, which was included when I called coefficient(...). I fixed both of those:

function g!(G::Vector{Float64},vec::Vector{Float64})
    s=Fourier(0..1)
    k = 1.0
    ϕ = Fun(s, [0; vec])
    d2ϕ = ϕ''
    G = 0.5*resize!(deleteat!(coefficients( -1.0*d2ϕ + Fun(x -> dV(x, ϕ(x), k),s)),1), length(vec)) # remove first entry with deleteat!
end

And the answer matches well with what I obtain by inserting objective into finite differences. However, when I try to use this in optimize() it still stops on the first iteration

Topic		Replies	Views
Speeding up combination of Funs Numerics question , approxfun	15	333	June 11, 2025
Optim.optimize() uses excessive memory with autodiff? Optimization (Mathematical) optim	11	1686	June 22, 2018
Evaluate integral on many points (Cubature.jl ?) Numerics	44	4672	July 26, 2017
Using ApproxFun for dynamic programming Numerics question	11	3719	March 11, 2017
Nonlinear Least Squares Millions of Allocations Optimization (Mathematical) jump , ipopt	9	500	June 15, 2024

Memory allocation problem with Optim + ApproxFun

Rewriting the integral

Changing to a gradient-based method

Analytic Gradient

Related topics