Help with Automatic Differentiation of Stiffness Matrix Function

I am trying to compute the gradient of a finite element stiffness matrix with respect to nodal coordinates, Young’s modulus E, and thickness t. I am new to automatic differentiation (AD) and have tried using DifferentiationInterface.jl with ForwardDiff and Enzyme as backends.

I also attempted changing nodes_cord from a matrix to a vector, but I still encountered issues. When I compute only the area A of the triangle, differentiation works. However, once I include the computation of the strain-displacement matrix B, AD fails.

In the future, I will need to compute the gradient of the global assembled stiffness matrix (i.e., after assembling element stiffness matrices into a larger system). Given that, I want to ensure my approach scales properly.

Any advice on how to correctly differentiate this function would be greatly appreciated!

Here is the function I am working with:

function compute_CST_stiffness(nodes_cord::Matrix{Float64}, E::Number, ν::Number, t::Number)
    x1, y1 = nodes_cord[1,:]
    x2, y2 = nodes_cord[2,:]
    x3, y3 = nodes_cord[3,:]

    # Compute area of the triangle
    A = 0.5 * abs(x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2))

    # Shape function derivatives
    β = [y2 - y3, y3 - y1, y1 - y2] / (2A)
    γ = [x3 - x2, x1 - x3, x2 - x1] / (2A)

    # Plane stress elasticity matrix
    D = (E / (1 - ν^2)) * [1 ν 0; ν 1 0; 0 0 (1 - ν)/2]

    # Strain-displacement matrix B
    B = zeros(3, 6)
    for i in 1:3
        B[1, 2i-1] = β[i]
        B[2, 2i] = γ[i]
        B[3, 2i-1] = γ[i]
        B[3, 2i] = β[i]
    end

    # Compute stiffness matrix
    K = t * A * (B' * D * B)
    return K
end
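
For reference, this is roughly how I am calling it (the coordinates and material values below are just placeholders, not my real model):

using DifferentiationInterface
import ForwardDiff

nodes_cord = [0.0 0.0; 1.0 0.0; 0.0 1.0]
k(x) = compute_CST_stiffness(x, 200e9, 0.3, 0.01)
J = jacobian(k, AutoForwardDiff(), nodes_cord)  # this is where AD fails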

Has anyone encountered similar issues when differentiating matrix-valued functions?
Would a different approach make differentiation more reliable, especially when working toward computing the gradient of the global assembled stiffness matrix?

Thanks in advance!

This seems to work:

function c(p::Matrix{T}) where T
    nodes_cord = p[1:3, 1:2]
    E = p[1, 3]
    ν = p[2, 3]
    t = p[3, 3]
    x1, y1 = nodes_cord[1,:]
    x2, y2 = nodes_cord[2,:]
    x3, y3 = nodes_cord[3,:]

    # Compute area of the triangle
    A = 0.5 * abs(x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2))

    # Shape function derivatives
    β = [y2 - y3, y3 - y1, y1 - y2] / (2A)
    γ = [x3 - x2, x1 - x3, x2 - x1] / (2A)
    
    # Plane stress elasticity matrix
    D = (E / (1 - ν^2)) * [1 ν 0; ν 1 0; 0 0 (1 - ν)/2]

    # Strain-displacement matrix B
    B = zeros(T, 3, 6)
    for i in 1:3
        B[1, 2i-1] = β[i]
        B[2, 2i] = γ[i]
        B[3, 2i-1] = γ[i]
        B[3, 2i] = β[i]
    end
    
    # Compute stiffness matrix
    K = t * A * (B' * D * B)
    return K
end

using ForwardDiff

p = rand(3, 3)
J = ForwardDiff.jacobian(c, p)
# 36×9 Matrix{Float64}:
reshape(J, 6, 6, 3, 3)  # output dimension × input dimension
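
As a quick sanity check on the indexing: E sits at p[1, 3] and K is linear in E, so the corresponding slice of the reshaped Jacobian should just be K / E:

reshape(J, 6, 6, 3, 3)[:, :, 1, 3] ≈ c(p) / p[1, 3]  # ∂K/∂E, true up to floating point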

So here we

  • use ForwardDiff.jacobian, since gradient only applies when the output of the function is a scalar,
  • pack all the function arguments into a single Array (you could also use a Vector if you want), to conform with the method ForwardDiff.jacobian(f, x::AbstractArray, cfg::JacobianConfig = JacobianConfig(f, x), check=Val{true}()),
  • make the types generic. As far as I understand, ForwardDiff works by running the function again, but with a different number type (Dual) whose overloaded operations also propagate derivatives. This means your function needs to be written sufficiently generically. For starters, don't explicitly hardcode Float64 in the type of nodes_cord. But also B = zeros(3, 6) (implicitly) means B = zeros(Float64, 3, 6), so that needed correcting as well; see the small sketch below for what goes wrong otherwise.
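
Here is a tiny illustration of that last point, with made-up toy functions rather than your stiffness code:

# generic allocation: the element type follows the input, so Dual numbers can be stored
f_ok(x) = sum(setindex!(zeros(eltype(x), 3), x[1]^2, 1))
ForwardDiff.gradient(f_ok, rand(3))   # works

# hardcoded Float64 allocation: writing a Dual into it throws a conversion error
f_bad(x) = sum(setindex!(zeros(3), x[1]^2, 1))
ForwardDiff.gradient(f_bad, rand(3))  # errors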

I thought you wanted something that scales? Forward-mode differentiation doesn’t scale to lots of parameters (and differentiating with respect to “nodal coordinates” certainly qualifies as “lots”).

What is the gradient of a matrix? You can talk about the derivative of a matrix (the linear operator that gives the change in the matrix from a small change in the inputs, sometimes called the “Jacobian”, especially if you “flatten” the inputs and outputs into vectors / pick a basis), but in my mind “gradient” is reserved for scalar-valued functions: the gradient is the thing you take the inner product of a small change in the input with to get the small change in a scalar output. (This also makes the gradient the “uphill” direction in the corresponding norm, hence gradient ascent etc.)
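
In symbols (just to pin down the convention): for a scalar output $g(x)$ versus a matrix output $K(p)$,

$$dg = \nabla g \cdot dx, \qquad dK = \sum_i \frac{\partial K}{\partial p_i}\, dp_i,$$

i.e. the derivative of $K$ is a linear map (a Jacobian, once you flatten), while $\nabla g$ is a single object living in the same space as $x$.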

Moreover, if you want a method that scales to lots of input parameters, you should really be looking for gradients of scalar-valued functions. For example, if you take your stiffness matrix, solve a linear system, and then compute a scalar value from the solution (e.g. a “loss” function for optimization), then you can compute the gradient of that final loss function with respect to your inputs efficiently (via “reverse” mode or “adjoint” methods). But, crucially, you do so without ever explicitly forming the derivative (or “Jacobian”) of the matrix itself — avoiding this is essential to have a scalable method.
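
To make that concrete, here is a minimal sketch of the adjoint idea using the single-element c(p) from above (the load vector and the loss are made up, and for a realistic number of parameters you would replace the final ForwardDiff.gradient call with a reverse-mode backend, but the structure is the same):

using LinearAlgebra
import ForwardDiff

p = rand(3, 3)
load = rand(6)                          # placeholder load vector
g(u) = sum(abs2, u)                     # placeholder scalar loss

K = c(p)                                # element stiffness
u = K \ load                            # forward solve   K u = load
λ = K' \ ForwardDiff.gradient(g, u)     # adjoint solve   Kᵀ λ = ∂g/∂u
# ∂g/∂p = -λᵀ (∂K/∂p) u, obtained as the gradient of the scalar p ↦ -λᵀ K(p) u,
# so the derivative of the matrix K itself is never formed explicitly
∂g∂p = ForwardDiff.gradient(q -> -dot(λ, c(q) * u), p)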
