Some of this was touched on in the issues/related docs, but I’ll give some more detail here… and if anyone wants to help turn this into docs, that would be fantastic.
We actually talked about this sort of thing a few weeks ago in the Enzyme weekly open meetings (also, a shameless plug to attend EnzymeCon next week at MIT: Enzyme Conference 2024).
Essentially, the way Enzyme represents all data structures (arrays, linked lists, trees, and, as relevant to this discussion, sparse data structures) is to have the shadow (a.k.a. derivative) memory use the same memory layout as the primal. Suppose you have an input data structure x: the derivative of x at byte offset 12 will be stored in the shadow dx at byte offset 12, and so on.
This has the nice property that the storage for the derivative, including all intermediate computations, is the same as that of the primal (ignoring caching requirements for reverse mode).
It also means that any arbitrary data structure can be differentiated with respect to, and we don’t have any special handling required to register every data structure one could create.
This representation does have some caveats (e.g. see Why does `Duplicated(x, dx)` assume `x` and `dx` have the same type? · Issue #1329 · EnzymeAD/Enzyme.jl · GitHub and Supporting covariant derivatives · Issue #1334 · EnzymeAD/Enzyme.jl · GitHub and Home · Enzyme.jl for relevant deep dives), but I’ll discuss the relevant parts to sparsity below.
Sparse data structures are often (and I believe in Julia) represented with, say, a Vector{Float64} that holds the actual elements, and a Vector{Int} that specifies, for each index in the backing array, the true location in the overall vector.
We have no explicit special cases for sparse data structures, so the layout semantics mentioned above are indeed what Enzyme uses.
Thus the derivative of a sparse array is a second backing array of the same size, plus another Vector{Int} (with the same offsets).
As a concrete example, suppose we have a sparse data structure with two elements, one at index 3 and another at index 10:

```
x = { 3 : 2.7, 10 : 3.14 }
```

This could be represented with the backing array being `[2.7, 3.14]` and the index array being `[3, 10]`. A correctly zero-initialized shadow data structure would have a backing array of size 2 filled with zeros, and an index array again being `[3, 10]`.
In this form, the second element of the derivative backing array stores/represents the derivative of the second element of the original backing array, in other words the derivative at index 10.
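The layout above can be seen directly in Julia’s standard-library sparse types (this sketch just uses `SparseArrays`; the shadow here is constructed by hand for illustration):

```julia
using SparseArrays

# The example from above: a sparse vector of length 10 with entries at
# indices 3 and 10.
x = sparsevec([3, 10], [2.7, 3.14], 10)
# x.nzind (index array)   == [3, 10]
# x.nzval (backing array) == [2.7, 3.14]

# A correctly zero-initialized shadow has the same layout: the same index
# array, and a backing array of zeros of the same size.
dx = SparseVector(10, copy(x.nzind), zeros(length(x.nzval)))
```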
A caveat here (discussed in our FAQ here: Home · Enzyme.jl) is that this correctly zeroed initializer is not the default produced by sparse(0). Instead we provide, and generally recommend using, Enzyme.make_zero, which recursively walks your data structure to generate a shadow of the correct structure (and in this case would make a new backing array). Again, make_zero is not special-cased to sparsity at all; the right shadow just falls out of the general design.
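To make the recursion concrete, here is a hypothetical, heavily simplified sketch of the idea (this is *not* Enzyme’s implementation; `Enzyme.make_zero` handles arbitrary nested structures, aliasing, and more):

```julia
using SparseArrays

# Hypothetical illustration only: build a zeroed shadow with the same layout.
# For a dense float array, the shadow is a same-sized array of zeros.
make_zero_sketch(x::AbstractArray{<:AbstractFloat}) = zero(x)

# For a sparse vector, keep the index array and zero the backing array.
# (Note: naively calling a generic `zero` on the sparse vector itself could
# drop the stored entries, which is exactly the pitfall discussed above.)
make_zero_sketch(x::SparseVector) =
    SparseVector(x.n, copy(x.nzind), make_zero_sketch(x.nzval))

x  = sparsevec([3, 10], [2.7, 3.14], 10)
dx = make_zero_sketch(x)
# dx has the same index array [3, 10] and an all-zero backing array.
```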
Internally, when differentiating a function, this is the type of data structure that Enzyme builds and uses to represent variables. It is at the Julia level that there’s a bit of a sharp edge.
@stevengj, in your example f(A(x)), where A returns a sparse data structure: the differentiable version of A that Enzyme generates would create both the backing/index arrays for the original result of A, as well as equal-sized backing/index arrays for the derivative.
For any program which generates sparse data structures internally, this will always give you the answer you expect – and with the corresponding memory requirements outlined above.
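As a sketch of that case, here is a hypothetical A that builds a sparse vector internally (A, f, and g are made-up names for illustration; this assumes, per the discussion above, that Enzyme can differentiate through the sparse constructor):

```julia
using SparseArrays, Enzyme

# Hypothetical program that constructs a sparse data structure internally.
A(x) = sparsevec([3, 10], x, 10)  # place x's entries at indices 3 and 10
f(y) = sum(y)
g(x) = f(A(x))

x  = [2.7, 3.14]
dx = zeros(2)

# Enzyme creates the sparse result's shadow (backing + index arrays) itself.
autodiff(Reverse, g, Active, Duplicated(x, dx))
# dx should now be [1.0, 1.0]: each input entry contributes once to the sum.
```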
The added caveat, however, is when you want to differentiate a top-level function that takes a sparse array as input. For example, consider the sum function over all elements. While in one semantic sense it is meant to sum all elements of the virtual sparse array, in a more literal sense it will only add elements 3 and 10 of the input sparse array (the only two nonzero elements), or equivalently it sums the whole backing array. Correspondingly, Enzyme will update the sparse shadow data structure to mark both elements 3 and 10 as having a derivative of 1 (or, more literally, set all elements of the backing array to derivative 1). These are the only variables that Enzyme needs to update, since quite literally they are the only variables read (and thus the only ones with a nonzero derivative). This is why this memory-safe representation composes within Enzyme.
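Putting the pieces together, a minimal sketch of the sum example (assuming a recent Enzyme.jl; the exact `autodiff` signature shown is the common reverse-mode form):

```julia
using SparseArrays, Enzyme

x  = sparsevec([3, 10], [2.7, 3.14], 10)
dx = Enzyme.make_zero(x)  # correctly zeroed shadow: same nzind, zeroed nzval

# Reverse-mode sum: only the stored elements (indices 3 and 10) are read,
# so only the shadow's backing array is updated.
autodiff(Reverse, sum, Active, Duplicated(x, dx))

# dx.nzval should now be [1.0, 1.0], i.e. d(sum)/dx at indices 3 and 10.
```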
If the name we gave to this data structure wasn’t “SparseArray” but instead “MyStruct”, this is precisely the answer we would have desired. However, since the sparse-array printer would otherwise print zeros for elements outside of the sparse backing array, this isn’t what one would expect. What’s going on here is a semantic mismatch between what Julia intends a data structure to mean by name and what is actually implemented; making a nicer user-facing conversion from Enzyme’s form of differential data structure to the more natural “Julian” form is tracked in Supporting covariant derivatives · Issue #1334 · EnzymeAD/Enzyme.jl · GitHub.
The benefit of this representation, however, is that all of our rules compose correctly (e.g. how you got the correct answer for f(A(x))), without the need to special-case any sparse code, and with the same memory/performance expectations as the original code.
Another quick caveat: as Chris said, if Julia calls an external library like SuiteSparse that we don’t have a derivative for, Enzyme will throw a “no derivative found” error for that function. This is easily resolved by writing a custom rule or adding internal Enzyme support for whatever function arises. Support is admittedly limited for SuiteSparse at the moment, as we’ve been focusing on other areas of expanding the scope of Enzyme.jl’s Julia language support (more so on type-unstable code), but all contributions are very welcome.