Surrogate with derivatives

I am solving an implicit function

f(x, y(x)) = 0

with x, y \in \mathbb{R}^n (typical n is 50–200), using iterative methods (trust region), at many x’s. This question is about finding a good initial value y_0(x) for each x, good initial values speed up the solution.

Suppose I have solved the problem for x_i, i = 1, \dots, k, obtaining the corresponding y_i, and I also have computed all the information I need for Jacobian-vector products vJ_i and J_i v with J_i = \partial y/\partial x. Importantly, typically k < n: I have fewer points than dimensions.

I can explore a lot of nice models with the excellent Surrogates.jl, but AFAICT only GEK(PLS) support derivatives, and my understanding is that I would have to go elementwise (recall: y(x): \mathbb{R}^n \to \mathbb{R}^n).

Otherwise, all the models I am familiar with use just values, not derivatives. It feels like a waste not to use them since they are already computed.

I am kind of fishing suggestions here that I could start exploring. Again, the surrogate does not have to be very good per se, it is just a starting point for an iterative method.

A naive way I thought of is obtaining first-order approximations from all points,

y^{(i)}(x) = y_i + \partial y/\partial x \cdot (x - x_i)

and averaging them out somehow. But this looks very ad-hoc, and putting discipline on this would be nice, surely someone has invented a good method for this already.

Another choice is the Gradient-Enhanced Neural Network (GENN), GEKPLS is generally the preferred method since pure NNs are rarely a good surrogate.

It should do vector output? If not, just open an issue. IIRC there isn’t anything in the math that doesn’t apply to a vector output.

Or a Hermite interpolation in higher dimensions.

Dimensions that high seem to rule out any kind of tensor-product method like Hermite polynomials or splines, and make interpolation difficult in general. For low-accuracy interpolation in high dimensions, neural networks such as the GENN methods Chris mentioned are one option. Another option is radial basis functions (RBFs), and there are “Hermite RBF” variants that exploit derivative information, e.g. Vaillant (2013) or Fashamiha and Salac (2025).

(Although RBFs are often written down for scalar-valued functions, the same formulations are trivially extensible to vector-valued functions simply by computing separate RBF coefficients for each component of the function, i.e. vector-valued coefficients. The cost scales linearly with the number of components, and it is embarrassingly parallel.)

There seems to be an ancient (Julia 0.5) implementation of the Vaillant HRBF algorithm SpatialFields.jl. However, RBFs are pretty straightforward to implement yourself, especially with an AD package like ForwardDiff.jl to compute the RBF derivatives for you in the HRBF case. You’ll get a 2(n+1)k \times 2(n+1)k system of equations, or rather n of them (one for each component of y), which in your case should be small enough to solve directly in a minute or less (each system being a few thousand by a few thousand, solved in \lesssim 1 s). Not sure if that is fast enough for you?

GEKPLS is Gradient-Enhanced Kriging, i.e. a Hermite RBF.

Thanks for all the answers! After experimenting a bit with my problem I think that my main problem will be with extrapolation, not interpolation.

That is, often the point at which I want a surrogate will be outside (the convex hull of) the existing points.

What seems to work decently in practice (for the simplified problem I am using for testing) is

\hat{y}(x) = (1-\alpha) \cdot \bar{y} + \alpha \cdot (y_c + \frac{\partial}{\partial x}y(x_c) \cdot (x - x_c))

where \bar{x}, \bar{y} are the means of existing points, x_c is the closest point to x, and \alpha \in [0,1] is a function of x that decreases in \| x - \bar{x} \| / \| x_c - \bar{x} \|.

That is, use a linear approximation from the closest point, but to avoid giving totally crazy estimates just because of a wacky local derivative, dampen that to the mean.

Good point, although RBFs (unlike Kriging) are traditionally augmented with some low-order polynomials, and often use a wider variety of basis functions than Gaussians.