Help with Automatic Differentiation of Stiffness Matrix Function

I am trying to compute the gradient of a finite element stiffness matrix with respect to nodal coordinates, Young’s modulus E, and thickness t. I am new to automatic differentiation (AD) and have tried using DifferentiationInterface.jl with ForwardDiff and Enzyme as backends.

I also attempted changing nodes_cord from a matrix to a vector, but I still encountered issues. When I compute only the area A of the triangle, differentiation works. However, once I include the computation of the strain-displacement matrix B, AD fails.

In the future, I will need to compute the gradient of the global assembled stiffness matrix (i.e., after assembling element stiffness matrices into a larger system). Given that, I want to ensure my approach scales properly.

Any advice on how to correctly differentiate this function would be greatly appreciated!

Here is the function I am working with:

function compute_CST_stiffness(nodes_cord::Matrix{Float64}, E::Number, ν::Number, t::Number)
    x1, y1 = nodes_cord[1,:]
    x2, y2 = nodes_cord[2,:]
    x3, y3 = nodes_cord[3,:]

    # Compute area of the triangle
    A = 0.5 * abs(x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2))

    # Shape function derivatives
    β = [y2 - y3, y3 - y1, y1 - y2] / (2A)
    γ = [x3 - x2, x1 - x3, x2 - x1] / (2A)

    # Plane stress elasticity matrix
    D = (E / (1 - ν^2)) * [1 ν 0; ν 1 0; 0 0 (1 - ν)/2]

    # Strain-displacement matrix B
    B = zeros(3, 6)
    for i in 1:3
        B[1, 2i-1] = β[i]
        B[2, 2i] = γ[i]
        B[3, 2i-1] = γ[i]
        B[3, 2i] = β[i]
    end

    # Compute stiffness matrix
    K = t * A * (B' * D * B)
    return K
end

Has anyone encountered similar issues when differentiating matrix-valued functions?
Would a different approach make differentiation more reliable, especially when working toward computing the gradient of the global assembled stiffness matrix?

Thanks in advance!

This seems to work:

function c(p::Matrix{T}) where T
    nodes_cord = p[1:3, 1:2]
    E = p[1, 3]
    ν = p[2, 3]
    t = p[3, 3]
    x1, y1 = nodes_cord[1,:]
    x2, y2 = nodes_cord[2,:]
    x3, y3 = nodes_cord[3,:]

    # Compute area of the triangle
    A = 0.5 * abs(x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2))

    # Shape function derivatives
    β = [y2 - y3, y3 - y1, y1 - y2] / (2A)
    γ = [x3 - x2, x1 - x3, x2 - x1] / (2A)
    
    # Plane stress elasticity matrix
    D = (E / (1 - ν^2)) * [1 ν 0; ν 1 0; 0 0 (1 - ν)/2]

    # Strain-displacement matrix B
    B = zeros(T, 3, 6)
    for i in 1:3
        B[1, 2i-1] = β[i]
        B[2, 2i] = γ[i]
        B[3, 2i-1] = γ[i]
        B[3, 2i] = β[i]
    end
    
    # Compute stiffness matrix
    K = t * A * (B' * D * B)
    return K
end

using ForwardDiff

p = rand(3, 3)
J = ForwardDiff.jacobian(c, p)
# 36×9 Matrix{Float64}:
reshape(J, 6, 6, 3, 3)  # output dimension × input dimension

So here we

  • use ForwardDiff.jacobian, since gradient only applies when the output of the function is a scalar,
  • pack all the function arguments into a single Array (you could also use a Vector if you want), to conform with the method ForwardDiff.jacobian(f, x::AbstractArray, cfg::JacobianConfig = JacobianConfig(f, x), check=Val{true}()),
  • make the types generic. As far as I understand, ForwardDiff works by re-running your function with a different number type (Dual), whose overloaded operations also carry the derivatives, so your function needs to be written sufficiently generically. For starters, don’t explicitly hardcode Float64 in the type of nodes_cord. Also, B = zeros(3, 6) implicitly means B = zeros(Float64, 3, 6), so that needed to be changed to B = zeros(T, 3, 6).
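To see that last point in isolation, here is a minimal sketch (toy function, hypothetical names f_bad / f_good, unrelated to the stiffness routine): assigning a Dual number into a Float64 buffer throws a conversion error, while a buffer whose element type follows the input works fine.

function f_bad(x)
    B = zeros(2)               # implicitly zeros(Float64, 2)
    B[1] = x[1]^2              # under ForwardDiff this tries to store a Dual in a Float64 array
    return sum(B)
end

function f_good(x)
    B = zeros(eltype(x), 2)    # element type follows x (Float64 or Dual)
    B[1] = x[1]^2
    return sum(B)
end

ForwardDiff.gradient(f_good, [3.0, 1.0])   # works: [6.0, 0.0]
# ForwardDiff.gradient(f_bad, [3.0, 1.0])  # errors: cannot convert a Dual to Float64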

I thought you wanted something that scales? Forward-mode differentiation doesn’t scale to lots of parameters (and differentiating with respect to “nodal coordinates” certainly qualifies as “lots”).

What is the gradient of a matrix? You can talk about the derivative of a matrix (the linear operator that gives the change in the matrix from a small change in the inputs, sometimes called the “Jacobian”, especially if you “flatten” the inputs and outputs into vectors / pick a basis), but in my mind “gradient” is reserved for scalar-valued functions: the gradient is the thing you take the inner product of a small change in the input with to get the small change in a scalar output. (This also makes the gradient the “uphill” direction in the corresponding norm, hence gradient ascent etc.)
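In symbols, restating that definition: for a scalar-valued f, the gradient \nabla f is the vector satisfying

f(x + dx) - f(x) \approx \langle \nabla f, dx \rangle

for small dx, whereas a matrix-valued K(\zeta) only has a derivative (Jacobian): the linear map d\zeta \mapsto \sum_i \frac{\partial K}{\partial \zeta_i} \, d\zeta_i.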

Moreover, if you want a method that scales to lots of input parameters, you should really be looking for gradients of scalar-valued functions. For example, if you take your stiffness matrix, solve a linear system, and then compute a scalar value from the solution (e.g. a “loss” function for optimization), then you can compute the gradient of that final loss function with respect to your inputs efficiently (via “reverse” mode or “adjoint” methods). But, crucially, you do so without ever explicitly forming the derivative (or “Jacobian”) of the matrix itself — avoiding this is essential to have a scalable method.
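For instance, here is a rough sketch of that pattern built on the c function from the post above. The load vector, the 1e-3*I regularization (a single unconstrained CST element stiffness is singular on its own), and the function name compliance are all made up just to keep the toy runnable:

using ForwardDiff, LinearAlgebra

function compliance(p)
    K = c(p)             # element stiffness from the post above
    K = K + 1e-3 * I     # crude regularization: one unconstrained element is singular
    f = ones(6)          # hypothetical load vector
    u = K \ f            # "solve the PDE"
    return dot(f, u)     # scalar loss (compliance)
end

ForwardDiff.gradient(compliance, rand(3, 3))   # 3×3: one entry per input parameter

With many parameters you would swap ForwardDiff for a reverse-mode backend; it is used here only because this toy has just 9 inputs. Either way, you differentiate the final scalar rather than K itself.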


Thank you very much for your help, and sorry for the slow response.

I see your point regarding the efficiency of reverse-mode differentiation for scalar-valued functions. However, I am following this paper, where they compute terms of the form:

\lambda^T \frac{dK}{d\zeta}u

where:

  • K is the stiffness matrix,
  • u is the displacement vector,
  • \lambda is an adjoint vector,
  • \zeta is the parameter we differentiate with respect to (e.g., thickness, material properties, or nodal coordinates).

In the paper, they use symbolic differentiation in MATLAB to compute analytical expressions for the derivatives of K with respect to each parameter separately (e.g., \frac{dK}{dx_1}, \frac{dK}{dy_1}, \dots).

I’d like to replace this symbolic differentiation process with automatic differentiation (AD) in Julia, hoping it will be more flexible when I later introduce additional parameters or constraints.

Does this approach make sense? Would I need to differentiate K separately for each parameter, or is there a way to obtain all derivatives in one go efficiently?

Thanks again for your insights!

This form is equivalent to a vector–Jacobian product, which is what backpropagation (reverse mode) does for you.

I’ve used expressions of the form \lambda^T \frac{\partial K}{\partial \zeta} u many times because they show up in adjoint methods for PDE solves (see also my course notes on adjoint methods and chapter 6 of our matrix calculus course notes), and my group does a lot of PDE-constrained shape and topology optimization. If you implement this explicitly, by separately computing \lambda and u yourself, it is absolutely critical to exploit the structure of \frac{\partial K}{\partial \zeta} — in particular, for finite-element methods, \frac{\partial K}{\partial \zeta} is usually extremely sparse (even more sparse than K: I’m talking O(1) nonzeros), since most parameters affect only a few elements — this lets it scale to a vast number of parameters. If you just throw AD at the problem blindly, you run the danger of computing a huge matrix (for each \zeta!) that is mostly zero, explicitly computing and storing all of those zeros.
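To make that sparsity concrete, here is a tiny sketch with a hypothetical two-element mesh (made-up DOF maps and numbers). From the element routine above, K_e = t \, A \, B^T D B, so \frac{\partial K_e}{\partial t} = A \, B^T D B = K_e / t, and in the global matrix the derivative with respect to one element's thickness touches only that element's 6×6 block:

using SparseArrays

dofs = [1:6, 5:10]                  # hypothetical DOF maps: two CST elements sharing one node
dKdt = spzeros(10, 10)              # ∂K_global/∂t₁, the derivative w.r.t. element 1's thickness
Ke_over_t = rand(6, 6)              # stands in for A₁ * B₁' * D * B₁ = K₁ / t₁
dKdt[dofs[1], dofs[1]] = Ke_over_t  # only element 1's block is nonzero
nnz(dKdt)                           # 36 stored entries out of 100, and still 36 as the mesh grows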

It is still possible to apply AD, but you need to do so carefully, e.g. apply forward-mode AD to the weak-form integrand of your FEM problem to get the derivative of the weak form, and then plug this into your matrix-assembly routines to do all of the \lambda^T K' u products at once (similar to this tutorial). Or apply reverse-mode to the whole problem at once rather than doing manual adjoint solves (this assumes that reverse-mode AD can differentiate effectively through your whole problem, which is unfortunately often still not possible for PDE solvers).
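As a concrete element-level version of doing all of the \lambda^T K' u products at once (hypothetical \lambda and u below; in practice they come from your forward and adjoint solves): since \lambda and u are held fixed, \lambda^T \frac{\partial K}{\partial \zeta_i} u is just the i-th component of the gradient of the scalar \lambda^T K(\zeta) u, so you can differentiate that scalar and never materialize \frac{\partial K}{\partial \zeta}:

using ForwardDiff

λ = rand(6); u = rand(6)    # placeholders for the adjoint and displacement vectors
p0 = rand(3, 3)             # packed coordinates, E, ν, t, as in the c function above
ForwardDiff.gradient(p -> λ' * c(p) * u, p0)   # 3×3: λᵀ (∂K/∂ζᵢ) u for every parameter ζᵢ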

Basically, you have to think about how AD works and think about scalings and structure (e.g. sparsity) if you want to use it effectively in a large problem, especially if you are doing part of the differentiation manually (e.g. manual adjoint solves).


Thank you very much for the help.