How to improve the performance of a function generating a Jacobian with "Toeplitz" structure

I have a single variable function g(x) where x(t) is a periodic function. I am interested in obtaining the jacobian of the Fourier harmonics of g when differentiated with respect to the harmonics of x(t).

It is well known that, when working in complex variables, this jacobian has a Toeplitz structure. However, I am interested in obtaining the jacobian when working with the real Fourier coefficients. The following function exploits the structure of the jacobian, where only the first column is computed by finite differences and then the rest of the jacobian follows from it.

Nevertheless, the performance gain from this function is very minor when compared to computing the whole jacobian entirely by finite differences. What could I improve to increase the speed?

function toeplitz_jacobian(f, x)
    n = length(x)
    h = 10^-8

    x_pert = x + vcat(h, zeros(n - 1))
    f_x = f(x)
    f_x_pert = f(x_pert)

    # Compute first column of the jacobian
    ∂f∂x⁰ = (f_x_pert - f_x) / h

    J = zeros(n, n)

    # Exploit toeplitz structure to generate the rest of the jacobian
    @views begin
        J[:, 1] .= ∂f∂x⁰[1:n]
        J[1, 2:end] .= ∂f∂x⁰[2:n] .* 2
        ∂f∂x⁰ = vcat(∂f∂x⁰[1], 0, ∂f∂x⁰[2:end])

    @views for k in 2:2:(n - 1)
        for j in k:2:(n - 1)
            J[k, j] = ∂f∂x⁰[j - k + 1] + ∂f∂x⁰[j + k + 1]
            J[k + 1, j + 1] = ∂f∂x⁰[j - k + 1] - ∂f∂x⁰[j + k + 1]
            J[k, j + 1] = ∂f∂x⁰[j + k + 2] + ∂f∂x⁰[j - k + 2]
            J[k + 1, j] = ∂f∂x⁰[j + k + 2] - ∂f∂x⁰[j - k + 2]

            if j > k
                J[j, k] = J[k, j]
                J[j + 1, k + 1] = J[k + 1, j + 1]
                J[j, k + 1] = J[k + 1, j]
                J[j + 1, k] = J[k, j + 1]
    return J

My first step would be to run it using @profview so that I could see where the time is being spent—presumably, not on f evaluations.

If you are calling it many times, one suspect is the allocation of J.


If you don’t want to bother with manual jacobians, you can take a look at

Without running/profiling your code, I think I can see some quick performance wins though. Most are related to array allocations, as @tobydriscoll alluded to: it can be a lot faster to pre-allocate memory and reuse it with in-place operations.
Some examples:

This allocates 3 arrays (one for zeros, one for the vcat and one for the +). If x_pert was already allocated, then

x_pert[1:end-n+1] .= x .+ h
x_pert[end-n+1:end] .= x

would be allocation-free.

These allocate one array each. If you could rewrite f so that it overwrites a provided output array, i.e. f!(y, x), it would make function calls more efficient.

This allocates two arrays. If ∂f∂x⁰ was already allocated, then

∂f∂x⁰ .= (f_x_pert .- f_x) ./ h

would be allocation-free.

This might be one of the biggest culprits indeed, cause the previous remarks are related to vector allocations but this is a full matrix. If you know it is sparse, you should probably consider building a sparse matrix?
Note that individual coefficient updates are very slow for fast matrices. So if you go down that road, you should probably use the sparse constructor.

Despite the @views, this allocates one array for the vcat.

Again, I’m not sure which one of those tweaks (if any) can improve performance significantly, but they are all good reflexes to keep in mind!

Thank you very much for the suggestions! I will keep then in mind to try to avoid some allocations.

However, note that the jacobian is not sparse, is completely dense, is just it can be easily reconstructed just from the knowledge of the first column. So the sparse constructors are not useful in this case.

My bad, in my mind Toeplitz sounded like tridiagonal and the like, suggesting sparsity, but I now remember what it is ^^

You should also check out the ToeplitzMatrices package, which stores only the non redundant part.