Surrogate with derivatives

I am solving an implicit function

f(x, y(x)) = 0

with x, y \in \mathbb{R}^n (typical n is 50–200), using iterative methods (trust region), at many x’s. This question is about finding a good initial value y_0(x) for each x, good initial values speed up the solution.

Suppose I have solved the problem for x_i, i = 1, \dots, k, obtaining the corresponding y_i, and I also have computed all the information I need for Jacobian-vector products vJ_i and J_i v with J_i = \partial y/\partial x. Importantly, typically k < n: I have fewer points than dimensions.

I can explore a lot of nice models with the excellent Surrogates.jl, but AFAICT only GEK(PLS) support derivatives, and my understanding is that I would have to go elementwise (recall: y(x): \mathbb{R}^n \to \mathbb{R}^n).

Otherwise, all the models I am familiar with use just values, not derivatives. It feels like a waste not to use them since they are already computed.

I am kind of fishing suggestions here that I could start exploring. Again, the surrogate does not have to be very good per se, it is just a starting point for an iterative method.

A naive way I thought of is obtaining first-order approximations from all points,

y^{(i)}(x) = y_i + \partial y/\partial x \cdot (x - x_i)

and averaging them out somehow. But this looks very ad-hoc, and putting discipline on this would be nice, surely someone has invented a good method for this already.

Another choice is the Gradient-Enhanced Neural Network (GENN), GEKPLS is generally the preferred method since pure NNs are rarely a good surrogate.

It should do vector output? If not, just open an issue. IIRC there isn’t anything in the math that doesn’t apply to a vector output.

Or a Hermite interpolation in higher dimensions.

Dimensions that high seem to rule out any kind of tensor-product method like Hermite polynomials or splines, and make interpolation difficult in general. For low-accuracy interpolation in high dimensions, neural networks such as the GENN methods Chris mentioned are one option. Another option is radial basis functions (RBFs), and there are “Hermite RBF” variants that exploit derivative information, e.g. Vaillant (2013) or Fashamiha and Salac (2025).

(Although RBFs are often written down for scalar-valued functions, the same formulations are trivially extensible to vector-valued functions simply by computing separate RBF coefficients for each component of the function, i.e. vector-valued coefficients. The cost scales linearly with the number of components, and it is embarrassingly parallel.)

There seems to be an ancient (Julia 0.5) implementation of the Vaillant HRBF algorithm SpatialFields.jl. However, RBFs are pretty straightforward to implement yourself, especially with an AD package like ForwardDiff.jl to compute the RBF derivatives for you in the HRBF case. You’ll get a 2(n+1)k \times 2(n+1)k system of equations, or rather n of them (one for each component of y), which in your case should be small enough to solve directly in a minute or less (each system being a few thousand by a few thousand, solved in \lesssim 1 s). Not sure if that is fast enough for you?