Hi,

I will first describe a use-case I have, the issues I have with existing tools, propose a solution to the problem, and finally ask my questions - to people with a deep knowledge of Julia - whether my proposed solution might fly. Here goes.

**Use-case**

I am currently working on a finite element (FE) framework, specially for optimisation problems: I want to optimise some cost (or maximize a probability), under the constraint that the FE model be exactly verified. “Framework” here means that I need to think very carefully so that I make the work of adding elements to this system as simple as possible. After much consideration, I arrived at a simple solution: the element programmer must provide a function `lagrangian`

that takes in O(10)-O(100) arguments, and returns a scalar (an additive contribution to the Lagrangian of the constrained optimisation problem).

Hence my requirement: I need to compute the Hessian of a scalar-valued function of (typicaly) 10 to 100 inputs (degrees of freedom) , and this needs to be very fast.

**Issues with existing tools**

Both ForwardDiff and Zygote (reverse diff++) provide the functionality needed. However, forward automatic differentiation, nested to the second order (Hessian), operating on a scalar-valued function, will have low performance: it lugs around a large number of zeros. I have been testing Zygote, and there it seems that the overheads in the computations become large related to the relative small numbers of inputs (Zygote and StaticArrays). I suspect Zygote is designed for considerably larger numbers of input variables.

**Proposed solution**

I think in terms of number of flops, we could achieve very decent performance using forward differentiation, nested to the second order, with *sparse partials*. Forward differentiation, to be performant, must not allocate: the partials must be stored on the stack. In Julia, only [immutable, not a problem] variables of known size can be stored on the stack. As a consequence, a sparse partial would need to have its *sparsity structure as a part of the type*. As a further consequence, “each” number within `lagrangian`

would have a different type, so that arrays of such numbers would be heterogenous - and allocating on he stack is still forbidden.

**Questions**

I have long ago programmed my own nested forward differentiation, which does essentially the same as ForwardDiff.jl, so I am pretty sure I would know how to create sparse partials and fit then into forward diff.

`partial`

s need to be able to be multiply by scalars (I am deliberately not saying of what type!), and added with each other. My intention is that *the compiler must create a specialised method instance for each combination of sparsity structure in these operations*. Clearly this is going to result in large compilation times. Let’s say that I accept that.

- Will the compiler indeed create specialised method instances, or will it fall back on creating a smaller number of type-unstable instances (inacceptable)?
- Will the specialised method instances be efficient - or will the sparse operations sabotage things I know little about like pipelining/vectorisation/BLAS?
- Will the method instances be statically linked into the code or will there be some longish table lookup at run time to decide which instance to call?

But that’s just the beginning:

Now each differentiated number has its own data type (because its first and second order partials have their own sparsity pattern). For FE, my programming paradigm has been to use linear algebra operations on `StaticArrays`

(neat code, keeps you on he stack). Obviously, `StaticArrays`

does not handle arrays with elements of different types, so a new type of arrays must be created, with questions 1), 2), 3) just as relevant. At a minimum, I want to be able to do additions, and tensor operations on such arrays.

- Does this imply a huge job (rewriting the whole of
`StaticArrays`

, or is there some minimum interface to be implemented, which would give access to already developed functionality - and result in efficient compiled code?.

If this does not scare you, you do not understand what I am proposing to do .