How to improve the performance of a function generating a Jacobian with "Toeplitz" structure

I have a single-variable function g(x), where x(t) is a periodic function. I am interested in obtaining the Jacobian of the Fourier harmonics of g with respect to the harmonics of x(t).

It is well known that, when working with complex variables, this Jacobian has a Toeplitz structure. However, I am interested in obtaining the Jacobian when working with the real Fourier coefficients. The following function exploits the structure of the Jacobian: only the first column is computed by finite differences, and the rest of the Jacobian follows from it.
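
For reference, the structure follows from the chain rule in Fourier space. A sketch in complex coefficients, writing $\hat g_m$ and $\hat x_n$ for the harmonics of $g(x(t))$ and $x(t)$, and $G(t) = \partial g / \partial x$ evaluated along $x(t)$:

$$\frac{\partial \hat g_m}{\partial \hat x_n} = \widehat{G}_{m-n},$$

so each entry depends only on $m - n$, i.e. the complex Jacobian is Toeplitz. With real cosine/sine coefficients the same data shows up as a Toeplitz part (offset $m - n$) plus a Hankel part (offset $m + n$), which is why the loop in the function below combines entries at index offsets j - k and j + k.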

Nevertheless, the performance gain from this function is very minor compared to computing the whole Jacobian entirely by finite differences. What could I improve to increase the speed?

function toeplitz_jacobian(f, x)
    n = length(x)
    h = 1e-8  # finite-difference step; 10^-8 would throw a DomainError (Int base, negative power)

    x_pert = x + vcat(h, zeros(n - 1))  # perturb only the first coefficient
    f_x = f(x)
    f_x_pert = f(x_pert)

    # Compute the first column of the Jacobian by a forward difference
    ∂f∂x⁰ = (f_x_pert - f_x) / h

    J = zeros(n, n)

    # Exploit the Toeplitz structure to generate the rest of the Jacobian
    @views begin
        J[:, 1] .= ∂f∂x⁰[1:n]
        J[1, 2:end] .= ∂f∂x⁰[2:n] .* 2
        ∂f∂x⁰ = vcat(∂f∂x⁰[1], 0, ∂f∂x⁰[2:end])  # splice a zero into position 2 for the loop indexing below
    end

    @views for k in 2:2:(n - 1)
        for j in k:2:(n - 1)
            J[k, j] = ∂f∂x⁰[j - k + 1] + ∂f∂x⁰[j + k + 1]
            J[k + 1, j + 1] = ∂f∂x⁰[j - k + 1] - ∂f∂x⁰[j + k + 1]
            J[k, j + 1] = ∂f∂x⁰[j + k + 2] + ∂f∂x⁰[j - k + 2]
            J[k + 1, j] = ∂f∂x⁰[j + k + 2] - ∂f∂x⁰[j - k + 2]

            if j > k
                J[j, k] = J[k, j]
                J[j + 1, k + 1] = J[k + 1, j + 1]
                J[j, k + 1] = J[k + 1, j]
                J[j + 1, k] = J[k, j + 1]
            end
        end
    end
    return J
end

My first step would be to run it using @profview so that I could see where the time is being spent; presumably, not on the f evaluations.
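
For reference, a minimal profiling setup could look like this (assuming ProfileView.jl is installed; the f and x below are placeholders for your real inputs):

using ProfileView

f(x) = abs2.(x)               # placeholder nonlinearity
x = randn(128)

toeplitz_jacobian(f, x)       # warm up so compilation doesn't dominate the profile
@profview for _ in 1:1000     # repeat the call to collect enough samples
    toeplitz_jacobian(f, x)
end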

If you are calling it many times, one suspect is the allocation of J.
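
A quick way to check is BenchmarkTools.jl (again with placeholder inputs):

using BenchmarkTools

f(x) = abs2.(x)               # placeholder
x = randn(128)

# @btime reports the minimum time and the allocations per call
@btime toeplitz_jacobian($f, $x);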


If you don’t want to bother with manual Jacobians, you can take a look at one of the Julia packages that compute them for you.

Without running/profiling your code, I think I can see some quick performance wins though. Most are related to array allocations, as @tobydriscoll alluded to: it can be a lot faster to pre-allocate memory and reuse it with in-place operations.
Some examples:

The line x_pert = x + vcat(h, zeros(n - 1)) allocates 3 arrays (one for the zeros, one for the vcat, and one for the +). If x_pert were already allocated, then

x_pert .= x
x_pert[1] += h

would be allocation-free.

The calls f_x = f(x) and f_x_pert = f(x_pert) allocate one array each. If you could rewrite f so that it overwrites a provided output array, i.e. f!(y, x), the function calls would be more efficient.
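
For instance, a sketch of that pattern (the f! body here is a hypothetical stand-in for your real function):

function f!(y, x)
    @. y = abs2(x)            # hypothetical: write the result into the preallocated y
    return y
end

f_x      = similar(x)         # allocate the outputs once, outside any hot loop
f_x_pert = similar(x)
f!(f_x, x)
f!(f_x_pert, x_pert)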

The line ∂f∂x⁰ = (f_x_pert - f_x) / h allocates two arrays (one for the subtraction, one for the division). If ∂f∂x⁰ were already allocated, then

∂f∂x⁰ .= (f_x_pert .- f_x) ./ h

would be allocation-free.

The allocation J = zeros(n, n) might be one of the biggest culprits indeed, because the previous remarks concern vector allocations while this is a full matrix. If you know it is sparse, you should probably consider building a sparse matrix.
Note that individual coefficient updates are very slow for sparse matrices. So if you go down that road, you should probably use the sparse constructor, as sketched below.
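
For completeness, the constructor pattern looks like this (toy indices and values):

using SparseArrays

# collect index and value vectors first, then construct in one call,
# instead of writing S[i, j] = v entry by entry
is = [1, 2, 3]
js = [1, 2, 3]
vs = [1.0, 2.0, 3.0]
S = sparse(is, js, vs, 3, 3)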

Despite the @views, the line ∂f∂x⁰ = vcat(∂f∂x⁰[1], 0, ∂f∂x⁰[2:end]) still allocates one array for the vcat.


Again, I’m not sure which one of those tweaks (if any) can improve performance significantly, but they are all good reflexes to keep in mind!
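
Putting these reflexes together, an allocation-free variant could look like the sketch below. It is untested: f! and all the preallocated buffers are assumptions, ∂f∂x⁰ is one element longer than f's output so the zero that the original spliced in with vcat is there from the start, and the fill loop is copied unchanged.

function toeplitz_jacobian!(J, ∂f∂x⁰, f_x, f_x_pert, x_pert, f!, x)
    n = length(x)
    m = length(f_x)                  # f may return more harmonics than length(x)
    h = 1e-8

    # perturb only the first coefficient, reusing a preallocated buffer
    x_pert .= x
    x_pert[1] += h

    f!(f_x, x)                       # in-place evaluations, no temporaries
    f!(f_x_pert, x_pert)

    # first column by forward differences, with the extra zero at index 2
    ∂f∂x⁰[1] = (f_x_pert[1] - f_x[1]) / h
    ∂f∂x⁰[2] = 0
    @views ∂f∂x⁰[3:(m + 1)] .= (f_x_pert[2:m] .- f_x[2:m]) ./ h

    J[1, 1] = ∂f∂x⁰[1]
    @views J[2:n, 1] .= ∂f∂x⁰[3:(n + 1)]
    @views J[1, 2:n] .= ∂f∂x⁰[3:(n + 1)] .* 2

    # structure-exploiting fill, unchanged from the original
    for k in 2:2:(n - 1), j in k:2:(n - 1)
        J[k, j] = ∂f∂x⁰[j - k + 1] + ∂f∂x⁰[j + k + 1]
        J[k + 1, j + 1] = ∂f∂x⁰[j - k + 1] - ∂f∂x⁰[j + k + 1]
        J[k, j + 1] = ∂f∂x⁰[j + k + 2] + ∂f∂x⁰[j - k + 2]
        J[k + 1, j] = ∂f∂x⁰[j + k + 2] - ∂f∂x⁰[j - k + 2]
        if j > k
            J[j, k] = J[k, j]
            J[j + 1, k + 1] = J[k + 1, j + 1]
            J[j, k + 1] = J[k + 1, j]
            J[j + 1, k] = J[k, j + 1]
        end
    end
    return J
end

The buffers (J, ∂f∂x⁰, f_x, f_x_pert, x_pert) would be created once and reused across calls.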

Thank you very much for the suggestions! I will keep them in mind to try to avoid some of the allocations.

However, note that the Jacobian is not sparse; it is completely dense. It is just that it can be easily reconstructed from knowledge of the first column alone, so the sparse constructors are not useful in this case.

My bad, in my mind Toeplitz sounded like tridiagonal and the like, suggesting sparsity, but I now remember what it is ^^

You should also check out the ToeplitzMatrices package, which stores only the non-redundant part.
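
A minimal sketch of its constructor (toy data; Toeplitz takes the first column and the first row, whose first elements must agree):

using ToeplitzMatrices

vc = [1.0, 2.0, 3.0]          # first column
vr = [1.0, 4.0, 5.0]          # first row, with vr[1] == vc[1]
T = Toeplitz(vc, vr)          # stores only the 2n - 1 defining entries
T[3, 1]                       # indexes like a dense matrix: returns 3.0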