Surrogate with derivatives

Tamas_Papp · May 25, 2026, 1:59pm

I am solving an implicit function

f(x, y(x)) = 0

with x, y \in \mathbb{R}^n (typical n is 50–200), using iterative methods (trust region), at many x’s. This question is about finding a good initial value y_0(x) for each x, good initial values speed up the solution.

Suppose I have solved the problem for x_i, i = 1, \dots, k, obtaining the corresponding y_i, and I also have computed all the information I need for Jacobian-vector products vJ_i and J_i v with J_i = \partial y/\partial x. Importantly, typically k < n: I have fewer points than dimensions.

I can explore a lot of nice models with the excellent Surrogates.jl, but AFAICT only GEK(PLS) support derivatives, and my understanding is that I would have to go elementwise (recall: y(x): \mathbb{R}^n \to \mathbb{R}^n).

Otherwise, all the models I am familiar with use just values, not derivatives. It feels like a waste not to use them since they are already computed.

I am kind of fishing suggestions here that I could start exploring. Again, the surrogate does not have to be very good per se, it is just a starting point for an iterative method.

A naive way I thought of is obtaining first-order approximations from all points,

y^{(i)}(x) = y_i + \partial y/\partial x \cdot (x - x_i)

and averaging them out somehow. But this looks very ad-hoc, and putting discipline on this would be nice, surely someone has invented a good method for this already.

ChrisRackauckas · May 25, 2026, 2:55pm

Another choice is the Gradient-Enhanced Neural Network (GENN), GEKPLS is generally the preferred method since pure NNs are rarely a good surrogate.

It should do vector output? If not, just open an issue. IIRC there isn’t anything in the math that doesn’t apply to a vector output.

ChrisRackauckas · May 25, 2026, 2:55pm

Or a Hermite interpolation in higher dimensions.

stevengj · May 25, 2026, 7:04pm

Dimensions that high seem to rule out any kind of tensor-product method like Hermite polynomials or splines, and make interpolation difficult in general. For low-accuracy interpolation in high dimensions, neural networks such as the GENN methods Chris mentioned are one option. Another option is radial basis functions (RBFs), and there are “Hermite RBF” variants that exploit derivative information, e.g. Vaillant (2013) or Fashamiha and Salac (2025).

(Although RBFs are often written down for scalar-valued functions, the same formulations are trivially extensible to vector-valued functions simply by computing separate RBF coefficients for each component of the function, i.e. vector-valued coefficients. The cost scales linearly with the number of components, and it is embarrassingly parallel.)

There seems to be an ancient (Julia 0.5) implementation of the Vaillant HRBF algorithm SpatialFields.jl. However, RBFs are pretty straightforward to implement yourself, especially with an AD package like ForwardDiff.jl to compute the RBF derivatives for you in the HRBF case. You’ll get a 2(n+1)k \times 2(n+1)k system of equations, or rather n of them (one for each component of y), which in your case should be small enough to solve directly in a minute or less (each system being a few thousand by a few thousand, solved in \lesssim 1 s). Not sure if that is fast enough for you?

ChrisRackauckas · May 26, 2026, 8:41am

GEKPLS is Gradient-Enhanced Kriging, i.e. a Hermite RBF.

Tamas_Papp · May 26, 2026, 11:38am

Thanks for all the answers! After experimenting a bit with my problem I think that my main problem will be with extrapolation, not interpolation.

That is, often the point at which I want a surrogate will be outside (the convex hull of) the existing points.

What seems to work decently in practice (for the simplified problem I am using for testing) is

\hat{y}(x) = (1-\alpha) \cdot \bar{y} + \alpha \cdot (y_c + \frac{\partial}{\partial x}y(x_c) \cdot (x - x_c))

where \bar{x}, \bar{y} are the means of existing points, x_c is the closest point to x, and \alpha \in [0,1] is a function of x that decreases in \| x - \bar{x} \| / \| x_c - \bar{x} \|.

That is, use a linear approximation from the closest point, but to avoid giving totally crazy estimates just because of a wacky local derivative, dampen that to the mean.

stevengj · May 26, 2026, 11:42am

Good point, although RBFs (unlike Kriging) are traditionally augmented with some low-order polynomials, and often use a wider variety of basis functions than Gaussians.

Topic		Replies	Views
Surrogate creation from discrete data New to Julia	4	464	November 12, 2022
Learning Surrogates or Flux New to Julia machine-learning , simulations , modelling	7	865	September 4, 2022
Building a surrogate model of a complex physically-based model New to Julia question	4	987	May 28, 2023
Radial basis functions in surrogates.jl? Machine Learning	4	495	May 21, 2023
Parallel Optimization with Surrogates.jl Optimization (Mathematical) question	6	702	June 14, 2023

Surrogate with derivatives

Related topics