# Memory allocation problem with Optim + ApproxFun

Hi all. I’m quite new to Julia and this is my first post.

I want to optimize a functional F[\phi] = \int_0^1 \left[\frac{1}{2}\left(\partial_x \phi\right)^2 + V(x,\phi(x))\right]\text{d}x. Of course, this can be done by using variational methods and solving ODEs, but I want to treat it as an optimization problem. I know \phi is periodic in the internal [0,1], so I write it as an expansion \phi = \sum_n \phi_n e^{i 2\pi n x} (or equivalently in terms of sines and cosines) using the ApproxFun package. The functional now becomes a function of the series coefficients: F[\phi] \to F(\phi_1,\ldots ,\phi_n), which can be minimized using optimization methods.

I have a code which is working, but slowly. It is not so bad in the example provided, but will be much slower when the number of coefficients needed becomes very large.

I suspect the problem is to do with memory allocations. Defining a series expansion using ApproxFun is fine and so is differentiating the function, but when I do other operations like squaring the function, it does a lot of memory allocations. This propagates quite poorly through the numerical integration and optimization functions.

I would really appreciate if somebody could help me find a way to get this method to work without the huge memory allocation (which will hopefully speed things up and make it scale better when going to 2D). Minimal working example below using V(x,\phi(x)) = k \cos{(2\pi(x+\phi(x)))} as an example:

using ApproxFun   # For defining series expansions
using QuadGK      # for numerical integration
using Optim       # for optimization

using Plots ; gr()
using LaTeXStrings

s=Fourier(0..1) # Specify the space.
# CosSpace / SinSpace for just cos/sin
# or exponentials with Laurent(0..1)

# Returns an expansion using entries of vec as coefficients
function fourier(space,vec::Vector{Float64})
return Fun(space, vcat([0.0],vec)) # fix const term to 0.
end;

function V(space,vec::Vector{Float64},k::Float64) # Just the potential
return x->k*cos.(2π*(x+fourier(space,vec).(x)))
end

# The integrand
function integrand(space, vec::Vector{Float64}, k::Float64)
return x->0.5*(fourier(space,vec)'.(x))^2 + k*cos(2π*(x+fourier(space,vec)(x)))
end

# Do the integral numerically for a given vector of coefficients.
function integral(space,vec::Vector{Float64}, k::Float64)
return quadgk(integrand(space,vec,k), 0, 1, atol=1e-12)
end


Now I can use optim to determine \phi(x) like this:

@time res=optimize(v->integral(s,v,10.0)[1], zeros(20));
sol = fourier(s,res.minimizer)
# setting the initial condition to zeros(20) also sets the length of v


37.462515 seconds (702.81 M allocations: 29.900 GiB, 10.96% gc time, 0.05% compilation time)

This gives the correct answer, which I have also obtained by solving the ODE. But the number of memory allocations is alarming, and I suspect this would become problematic in situations where I need a large number of coefficients.

I checked the timing / memory allocations of some functions:

v1 = [0.0,1.0]

@time fourier(s,v1)  # sin(2πx)
@time fourier(s,v1)' # 2π*cos(2πx)
@time (fourier(s,v1)')^2 # 4π²cos²(2πx)


0.000005 seconds (3 allocations: 256 bytes)
0.000008 seconds (6 allocations: 528 bytes)
0.000238 seconds (378 allocations: 19.734 KiB)

So the problem comes when I do operations on the Fun() objects from ApproxFun, e.g. multiplying or taking cos(Fun()). Obviously this gets much worse when I go to do the numerical integration:

@time quadgk((fourier(s,v1)')^2, 0, 1, atol=1e-12)
@time quadgk(V(s,vector,1.0), 0, 1, atol=1e-12) # k = 1
@time integral(s,vector,1.0)


0.000676 seconds (9.06 k allocations: 392.062 KiB)
0.004499 seconds (85.10 k allocations: 3.697 MiB)
0.011349 seconds (190.62 k allocations: 8.405 MiB)

And worse again when I pass to the optimizer. So any help would be appreciated (as well as any other general advice on improving this code).

PS: I know that in this example, \phi(x) is odd, which cuts the number of coefficients in half and speeds up the program. But in the actual problem I want to solve it is not odd, so I don’t do that here.

I wouldn’t use QuadGK, but rather just use the sum method provided by ApproxFun itself — ApproxFun is optimized for computing its own integrals (it “knows” a lot more about the functions than QuadGK).

Moreover, you are constructing the fourier(space,vec) function for every point x that quadgk requests, which is horribly inefficient. You should just construct the Fun objects once.

In particular, I would suggest something like:

V(x, ϕ, k) = k * cos(2π * (x + ϕ))

function objective(space, vec, k)
ϕ = Fun(space, [0; vec])
dϕ = ϕ'
return sum(Fun(x -> 0.5 * dϕ(x)^2 + V(x, ϕ(x), k), space))
end

2 Likes

Note, however that plugging into the default optimization algorithm is a fairly terrible way to solve such problems.

If V is a linear function of |\phi|^2, you have a Hermitian eigenproblem Hx=λx, but instead of applying equations that “know” how to solve eigenproblems, e.g. Lanczos or LOBPCG, you are applying generic minimization to x'Hx with a normalization constraint, i.e. you are effectively minimizing the Rayleigh quotient. Worse, you are not even using a gradient-based optimization method — optimize defaults to Nelder–Mead, which is a derivative-free algorithm and is truly awful in high dimensions. Even if you switched to a gradient-based method (by computing the Fréchet-derivative of your functional analytically), it still won’t be as good as a Krylov-based algorithm specialized for eigenproblems (e.g. LOBPCG is essentially a specialized Rayleigh–quotient minimization).

Even if V is an arbitrary nonlinear function as in your example, you will still want to use a gradient-based optimization algorithm to make this practical in high dimensions. (Fortunately, the Fréchet derivative (i.e. the functional derivative) is pretty easy to write down analytically in your case.)

3 Likes

Thank you for the helpful replies! As I said in my post, I am fairly new to Julia, and I haven’t used optimization techniques before, so apologies for my terrible methods.

## Rewriting the integral

Your first suggestion of defining the function in a more efficient way, and using sum instead of QuadGK already improves things quite a lot:

V(x, ϕ, k) = k * cos(2π * (x + ϕ))

function objective(space, vec, k)
ϕ = Fun(space, [0.0; vec])
dϕ = ϕ'
return sum(Fun(x -> 0.5 * dϕ(x)^2 + V(x, ϕ(x), k), space))
end

@time optimize(v->objective(s, v, 1.0),zeros(20))


14.220752 seconds (206.55 M allocations: 8.488 GiB, 11.37% gc time, 0.05% compilation time)

• Status: success

• Candidate solution
Final objective value: -2.174133e-01

• Found with

• Convergence measures
√(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

• Work counters
Seconds run: 14 (vs limit Inf)
Iterations: 996
f(x) calls: 1529

An speedup in time / reduction in memory allocation by a factor of ~ 3. It still seems like a lot of allocations, but this could be due to a poor method as you mentioned.

Just checking the indivual parts of the integral:

function objective1(space, vec, k) # just try integrating ½(∂ϕ)²
ϕ = Fun(space, [0.0; vec])
dϕ = ϕ'
return sum(Fun(x -> 0.5 * dϕ(x)^2, space))
end

function objective2(space, vec, k) # just integrate the potential
ϕ = Fun(space, [0.0; vec])
return sum(Fun(x -> V(x, ϕ(x), k), space))
end

@time objective(s,[0.0,1.0],1.0)
@time objective1(s,[0.0,1.0],1.0)
@time objective2(s,[0.0,1.0],1.0)


0.001235 seconds (14.70 k allocations: 624.594 KiB)
0.000227 seconds (1.82 k allocations: 78.297 KiB)
0.000964 seconds (8.87 k allocations: 380.750 KiB)

It seems the potential is resulting in a lot of memory allocations.

## Changing to a gradient-based method

I switched from the default method to LBFGS:

@time optimize(v->objective(s, v, 1.0),zeros(20),LBFGS())


8.262182 seconds (119.96 M allocations: 4.936 GiB, 11.28% gc time, 1.21% compilation time)

• Status: success

• Candidate solution
Final objective value: -2.174133e-01

• Found with
Algorithm: L-BFGS

• Convergence measures
|x - x’| = 9.39e-10 ≰ 0.0e+00
|x - x’|/|x’| = 7.33e-09 ≰ 0.0e+00
|f(x) - f(x’)| = 5.55e-17 ≰ 0.0e+00
|f(x) - f(x’)|/|f(x’)| = 2.55e-16 ≰ 0.0e+00
|g(x)| = 5.11e-10 ≤ 1.0e-08

• Work counters
Seconds run: 8 (vs limit Inf)
Iterations: 13
f(x) calls: 34
∇f(x) calls: 34

It makes far fewer iterations and f(x) calls, but still seems like a lot of memory allocations.

For including the derivative analytically, the gradient of the function with respect to \{\phi_i\} is \nabla_{\{\phi_i\}} F = \int_0^1 \left[-\partial_x^2\phi(x) - 2\pi k \sin{2\pi (x + \phi(x))}\right]\nabla_{\{\phi_i\}} \phi(x) \text{d} x . The integral can be evaluated as before. \nabla_{\{\phi_n\} } \phi(x) would be a vector with entries

Fun(space, [0, .... , 1, ...., 0])
#ith position


I tried, following the Optim documentation for including analytic derivatives:

using LinearAlgebra

dV(x, ϕ, k) = -2π*k*sin(2π*(x + ϕ))

ϕ = Fun(s, [0; vec])
d2ϕ = ϕ''
mat = Matrix(1.0*I,length(vec),length(vec)) # identiy matrix for partial_i phi
#    G = zeros(length(vec))
for i in 1:length(vec)
G[i] = sum(Fun(x -> (-d2ϕ(x) + dV(x, ϕ(x), 1.0))*Fun(s,[[0.0];mat[i,:]]).(x), s))
end
end


But when I run e.g.

@time optimize(v->objective(s, v, 1.0), gradientF!, zeros(20), LBFGS())


it just hangs.

Thanks again for your help

Could Surrogates.jl be useful here?

The gradient should just be the ApproxFun coefficients (Fourier components) of \left[-\partial_x^2\phi(x) - 2\pi k \sin{2\pi (x + \phi(x))}\right], I think — no loop of integrals required.

(Note that it is always a good idea to test the correctness of your gradient by comparing it to finite differences before running any optimization.)

I was able to improve the objective function further by taking \text{d}\phi outside of the Fun() call:

V(x::Float64, ϕ::Float64, k::Float64) = k * cos(2π * (x + ϕ))

function objective(space::T where T <: ApproxFun.SumSpace, vec::Vector{Float64}, k::Float64)
ϕ = Fun(space, [0.0;vec])
dϕ = ϕ'
return sum(0.5*dϕ^2 + Fun(x -> V(x, ϕ(x), k), space))
end

@time optimize(v->objective(s, v, 1.0),zeros(20), LBFGS())


4.835670 seconds (64.62 M allocations: 2.754 GiB, 9.39% gc time, 1.62% compilation time)

This is a factor of 10 faster than my original attempt. I think the expensive part is V(x,\phi(x)) being an anonymous function, but I don’t know if this can be avoided.

Yes, I think you are right. I tried to implement this as follows:

dV(x::Float64, ϕ::Float64, k::Float64) = -2π*k*sin(2π*(x + ϕ))

function g!(G::Vector{Float64},vec::Vector{Float64})
s=Fourier(0..1)
k = 1.0
ϕ = Fun(s, [0; vec])
d2ϕ = ϕ''
G = resize!(coefficients( -1.0*d2ϕ + Fun(x -> dV(x, ϕ(x), k),s)), length(vec))
# need G and vec to have same length
end


I followed the Optim documentation and tried to define the gradient using only G to store the components of \nabla_{\{\phi_i\}}F and vec which contains \{\phi_i\}. I also used resize! to make sure that G and vec have the same length because Fun() will automatically select the number of coefficients it contains.

@time res=optimize(vec->objective(s, vec, 1.0), g!, zeros(20), LBFGS())


0.014414 seconds (15.99 k allocations: 761.576 KiB, 48.27% compilation time)

• Status: success

• Candidate solution
Final objective value: -5.622630e-17

• Found with
Algorithm: L-BFGS

• Convergence measures
|x - x’| = 0.00e+00 ≤ 0.0e+00
|x - x’|/|x’| = NaN ≰ 0.0e+00
|f(x) - f(x’)| = NaN ≰ 0.0e+00
|f(x) - f(x’)|/|f(x’)| = NaN ≰ 0.0e+00
|g(x)| = NaN ≰ 1.0e-08

• Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 0
f(x) calls: 1
∇f(x) calls: 1

It stops after a single step…

Did you check your gradient against a finite-difference approximation (e.g. using FiniteDifferences.jl)?

After checking with FiniteDifferences, it seems my gradient was off by a factor of 1/2. Also, I was including the zeroth coefficient, which was included when I called coefficient(...). I fixed both of those:

function g!(G::Vector{Float64},vec::Vector{Float64})
s=Fourier(0..1)
k = 1.0
ϕ = Fun(s, [0; vec])
d2ϕ = ϕ''
G = 0.5*resize!(deleteat!(coefficients( -1.0*d2ϕ + Fun(x -> dV(x, ϕ(x), k),s)),1), length(vec)) # remove first entry with deleteat!
end


And the answer matches well with what I obtain by inserting objective into finite differences. However, when I try to use this in optimize() it still stops on the first iteration