Preserving ComponentArray structure with DifferentiationInterface

I’m building a DSGE modeling language (lol, what else is new) in which I want to use something like ComponentArrays.jl to construct first-order perturbations with DifferentiationInterface.jl. The idea is that I divide a model into blocks with unknowns and targets, à la sequence_jacobian.py.

The problem is that the ComponentArray loses its structure under DI.jacobian(), but not under DI.gradient(). Here is a (not so) minimal working example.

using ComponentArrays
using DifferentiationInterface
import ForwardDiff

# Convention: each time-varying quantity enters as a 3-vector [lag, contemporaneous, lead];
# plain scalars are treated as time-invariant.
lag(x::AbstractVector) = x[1]
lag(x::Real) = x

contemp(x::AbstractVector) = x[2]
contemp(x::Real) = x

lead(x::AbstractVector) = x[3]
lead(x::Real) = x

function production(x::ComponentArray)
    (; K, L, Z, α, δ) = x
    r = contemp(α) * contemp(Z) * (lag(K) / contemp(L)) ^ (contemp(α) - 1) - contemp(δ)
    w = (1 - contemp(α)) * contemp(Z) * (lag(K) / contemp(L)) ^ contemp(α)
    Y = contemp(Z) * lag(K) ^ contemp(α) * contemp(L) ^ (contemp(α) - 1)
    return ComponentArray(; r, w, Y)
end

Suppose we have already solved for the steady state with NonlinearSolve.jl, using x -> production(x) as the objective, which gave us

ss = ComponentArray(
    L = 1.0, δ = 0.025, α = 0.11, K = 3.142857142857058, Z = 0.881646097521458
)

Finding the first-order perturbation then reduces to the following calculation:

# Stack each steady-state value into a 3-vector [lag, contemp, lead]
stacked_ss = ComponentArray(NamedTuple{keys(ss)}([fill(i, 3) for i in ss]))
J = DifferentiationInterface.jacobian(
    x -> production(x), AutoForwardDiff(), stacked_ss
)

which yields the following illegible mess:

3×15 Matrix{Float64}:
 -0.00991  0.0  0.0  0.0   0.03115  0.0  0.0  0.03969  0.0  0.0  0.35826  0.0  0.0  -1.0  0.0
  0.03115  0.0  0.0  0.0  -0.09790  0.0  0.0  1.00948  0.0  0.0  0.01917  0.0  0.0   0.0  0.0
  0.03500  0.0  0.0  0.0  -0.89000  0.0  0.0  1.13424  0.0  0.0  1.14513  0.0  0.0   0.0  0.0

However, when I take the gradient of each target individually…

DifferentiationInterface.gradient(
    x -> production(x)[:r], AutoForwardDiff(), stacked_ss
)

the component array structure is preserved!

ComponentVector{Float64}(
  L = [0.0, 0.031150000000000788, 0.0],
  δ = [0.0, -1.0, 0.0],
  α = [0.0, 0.35826144883243144, 0.0],
  K = [-0.009911363636364154, 0.0, 0.0],
  Z = [0.0, 0.0396984686921376, 0.0]
)

As much as I enjoy the type stability and composability of ComponentArrays.jl, stepping away from the intuitiveness of plain named tuples already feels like extra work. And if I have to reshape and manipulate every output of the function call, then I may as well create my own struct specific to this problem.

There are a couple of hills I think are worth dying on here:

  1. DI.jacobian is the clear winner here, since we can prepare it once at block definition and store sparsity patterns (see the sketch after this list).
  2. I need to preserve the row/column names of the Jacobian for computing chain rules across multiple blocks, whether via ComponentArrays.jl, NamedTuples, or some custom object.
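
Concretely, I imagine the block-definition step looking something like this (an untested sketch: AutoSparse and the preparation API come from DI/ADTypes, TracerSparsityDetector from SparseConnectivityTracer.jl, and GreedyColoringAlgorithm from SparseMatrixColorings.jl):

using DifferentiationInterface
import ForwardDiff
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm

# Sparse forward-mode backend: the sparsity pattern is detected and colored during preparation
sparse_backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)

# Preparation happens once, at block definition, and can be stored with the block;
# later calls reuse the detected sparsity pattern and coloring
prep = prepare_jacobian(production, sparse_backend, stacked_ss)
J_sparse = jacobian(production, prep, sparse_backend, stacked_ss)  # SparseMatrixCSC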

For those who are curious, I copied the design philosophy of sequence_jacobian.py, and I generate the above model block with a macro:

@simple function production(K, L, Z, α, δ)
    r = α * Z * (lag(K) / L) ^ (α - 1) - δ
    w = (1 - α) * Z * (lag(K) / L) ^ α
    Y = Z * lag(K) ^ α * L ^ (α - 1)
    return r, w, Y
end

Hi, glad that you find DI useful! A few things are worth noting:

  • The behavior you describe is not due to DI; it comes from the underlying autodiff backend:
julia> using ComponentArrays, Enzyme, ForwardDiff, Zygote

julia> f(x::ComponentVector) = ComponentVector(c=2 .* x.a, d=3 .* x.b)
f (generic function with 1 method)

julia> x = ComponentVector(a=[1.0, 2.0], b=[3.0, 4.0])
ComponentVector{Float64}(a = [1.0, 2.0], b = [3.0, 4.0])

julia> y = f(x)
ComponentVector{Float64}(c = [2.0, 4.0], d = [9.0, 12.0])

julia> ForwardDiff.jacobian(f, x)
4×4 Matrix{Float64}:
 2.0  0.0  0.0  0.0
 0.0  2.0  0.0  0.0
 0.0  0.0  3.0  0.0
 0.0  0.0  0.0  3.0

julia> Zygote.jacobian(f, x)[1]
4×4 Matrix{Float64}:
 2.0  0.0  0.0  0.0
 0.0  2.0  0.0  0.0
 0.0  0.0  3.0  0.0
 0.0  0.0  0.0  3.0
  • In general, it is expected that gradient preserves the structure while jacobian forgets it. A gradient lives in the same space as the original input, whereas a Jacobian lives in a kind of product space. Even in the simple case of a matrix-to-matrix function, most autodiff backends (except Enzyme, I think) will flatten it and represent its Jacobian as a 2d matrix instead of a 4d tensor (a quick illustration follows after this list).

  • Theoretically we might want a ComponentMatrix as the result, but given how autodiff works, the Jacobian is usually either (1) a horizontal stacking of output-like column vectors (pushforwards) or (2) a vertical stacking of input-like row vectors (gradients). It is difficult to take both the input and the output structure into account at the same time. Also, ComponentMatrix was kind of buggy last time I checked.

  • If you want to use sparse autodiff with DI, then you probably don’t want a ComponentMatrix anyway, but rather a SparseMatrixCSC or some kind of block matrix, e.g. with BlockArrays.jl?

  • What do you mean by “preserve the names for computing chain rules”? What is the point of autodiff if you compute chain rules manually?
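
To make the flattening point concrete, here is a matrix-to-matrix example: the 4d derivative object comes back as a 2d matrix over the column-major vectorizations of input and output (values shown for X = [1.0 2.0; 3.0 4.0]).

julia> ForwardDiff.jacobian(X -> X * X, [1.0 2.0; 3.0 4.0])
4×4 Matrix{Float64}:
 2.0  2.0  3.0  0.0
 3.0  5.0  0.0  3.0
 2.0  0.0  5.0  2.0
 0.0  2.0  3.0  8.0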

Ahh yes, that makes quite a bit of sense in this case. I guess I may need to reshape the output since it at least follows a consistent structure.
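
For instance, I imagine something like this could put the labels back onto the flat Jacobian (an untested sketch, assuming the ComponentArray(data, ax_out, ax_in) constructor behaves as documented, and keeping the ComponentMatrix caveat above in mind):

y_ss = production(stacked_ss)  # ComponentVector with keys r, w, Y
J_named = ComponentArray(J, getaxes(y_ss)[1], getaxes(stacked_ss)[1])
J_named[:r, :K]  # derivatives of r with respect to the [lag, contemp, lead] entries of K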

Another paradigm for writing macro models is based on a DAG, where each block (node) represents a group of model equations. You can reduce the dimension of the problem by computing block Jacobians and chain-ruling them together (as opposed to putting everything in one block). For large models with heterogeneous agents, solving is quite a bit faster under this approach. This seminal macro paper illustrates the concept further.

This is definitely the proper approach, and one I’m likely to take in the future.
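
To spell out the chaining step, composing two blocks just multiplies their Jacobians in the right order (a minimal sketch, where block_a, block_b, and x_ss are hypothetical placeholders rather than anything from my actual code):

backend = AutoForwardDiff()
J_a = jacobian(block_a, backend, x_ss)           # ∂(intermediates)/∂(inputs)
J_b = jacobian(block_b, backend, block_a(x_ss))  # ∂(outputs)/∂(intermediates)
J_total = J_b * J_a                              # chain rule: ∂(outputs)/∂(inputs)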

Thanks for your work on DI btw, it has been an amazing addition to my everyday carry.