Smooth delta functions to compute Jacobian and Hessian from samples

mleprovost · August 20, 2020, 6:24pm

Hello,

Given an expensive function \boldsymbol{G} :\mathbb{R}^n \xrightarrow{} \mathbb{R}^m whose value is known at samples \boldsymbol{x}^1,\ldots, \boldsymbol{x}^N, I would like to compute the Jacobian and Hessian at each sample or at least at the mean of the samples \overline{\boldsymbol{x}}, without new evaluations of \boldsymbol{G}.

The package https://github.com/Mattriks/CoupledFields.jl can be a solution, @Mattriks can you give me more background on the techniques used to construct these gradients?

I would like to get your feedback on the following technique that relies on smooth delta functions:

Let’s start with the continuous form:

\boldsymbol{G}(\overline{\boldsymbol{x}}) = \int_{\boldsymbol{x} \in \mathbb{R}^n} \boldsymbol{G(x)} \delta(\overline{\boldsymbol{x}} - \boldsymbol{x}) \mathrm{d}\boldsymbol{x} \mbox{ with } \delta \mbox{ the Dirac delta function.}

Thus we can get the Jacobian at \overline{\boldsymbol{x}} as:

\partial_{\overline{\boldsymbol{x}}_i} G(\overline{\boldsymbol{x}}) = \int_{\boldsymbol{x} \in \mathbb{R}^n} \boldsymbol{G}(\boldsymbol{x}) \delta_i^{(1)}(\overline{\boldsymbol{x}} - \boldsymbol{x}) \mathrm{d}\boldsymbol{x} \mbox{ with }\delta^{(1)}_i(\boldsymbol{z}) = \frac{\partial \delta(\boldsymbol{z})}{\partial z_i}.

We can replace the Dirac delta function by a smoothed version \delta_h with a compact support.

The function \delta(\boldsymbol{z}) can be factorized as a product of one-dimensional delta functions \delta(\boldsymbol{z}) = \prod_{i=1}^{n} \delta(z_i), (also holds for \delta_{h}). Therefore we can easily construct the derivative kernels \delta^{(1)}_i(\boldsymbol{z}) = \partial \delta(\boldsymbol{z})/{\partial z_i} and \delta^{(2)}_{ij}(\boldsymbol{z}) = \partial^2 \delta(\boldsymbol{z})/\partial z_i\partial z_j.

For our discrete samples, we get:

\partial_{\overline{\boldsymbol{x}}_i} G(\overline{\boldsymbol{x}}) \approx \sum_{k=1}^N \boldsymbol{G}(\boldsymbol{x}^k) \delta_i^{(1)}(\overline{\boldsymbol{x}} - \boldsymbol{x}^k)

where h is set according to the spread of the samples about the \overline{\boldsymbol{x}}

stevengj · August 20, 2020, 6:48pm

The usual thing to do would be to construct an interpolant, e.g. with Chebyshev polynomials. Once you have a smooth interpolant from your sample points, you can compute any derivatives you want.

(There are several packages that can do Chebyshev interpolation for you in Julia, e.g. ApproxFun and ChebyshevApprox.)

Two key questions: what is the dimension n of your input, and can you choose the sample points? Interpolation is easiest if n is moderate (\lesssim 5) and you can choose the sample points (e.g. Chebyshev nodes).

Another possible interpolant, which is vaguely similar to your “smooth delta functions” suggestion, is to use radial basis functions. (Approximate delta functions by themselves won’t work well…that direction leads towards finite-difference approximation, which is what an approximate \partial\delta/\partial x essentially is.)

mleprovost · August 20, 2020, 7:07pm

Thank you for your answer,

The input size n can vary from 50-100, the output dimension m is 50, and we have 50-200 samples. Unfortunately, I can’t choose the sample points. They are samples from a prior distribution, a Gaussian approximation of the distribution is reasonable.

stevengj · August 20, 2020, 7:11pm

In that case, I would think about adjoint methods (automated ala Zygote or by hand) to compute derivatives of G directly. Accurate interpolation in 50 dimensions is hard unless the function is very nicely behaved (effectively low-dimensional), and taking derivatives tends to amplify interpolation errors…

mleprovost · August 20, 2020, 8:11pm

I forgot to mention that the samples have a relative small spread about the mean, so we are only interested in a very local behavior of the function.

Thank you for your notes. I am not familiar with adjoint methods. How can I reformulate my problem \boldsymbol{y} = \boldsymbol{G}(\boldsymbol{x}) to fit in the nonlinear equations section of your paper, i.e. \boldsymbol{f}(\boldsymbol{x}, \boldsymbol{p}) = 0

stevengj · August 20, 2020, 8:47pm

It’s hard to say more without knowing anything about the function G that you want to differentiate.

stevengj · August 20, 2020, 8:49pm

If the function G isn’t changing much among your samples, so that it is well approximated by a low-order polynomial, then I would just try a polynomial or radial-basis-function fit.

mleprovost · August 21, 2020, 12:21am

If I could afford more evaluations of the function, the best will probably be to use a Smolyak grid to construct an interpolant and then use automatic differentation?

stevengj · August 21, 2020, 3:04am

Smolyak/sparse grids aren’t a panacea. They don’t work well if the Hessian has lots of non-negligible off-diagonal elements.

In high dimensions, there is no free lunch for interpolation — there has to be something special about your function. That’s why I said that it may be easier just to figure out how to differentiate G directly.

Mattriks · August 21, 2020, 4:51am

Going back to the OP, CoupledFields is based on kernel regression.
If yᵢ = g(xᵢ), then it follows from kernel ridge regression that:

∇g(xᵢ) = ∇K(xᵢ, ⋅) (Gₓ + 10ᵇnI)⁻¹ Y

where Gₓ is the gram matrix of X, and ∇K() is the gradient function of a kernel function.

In CoupledFields, for gradvecfield([a b], ...), a is a smoothness parameter (that scales an auto-estimate of kernel width), and b is a ridge parameter (as in the equation above).
Using a=1.0 should typically work well.

Here’s an example. The black line is the true value, the gradient function is estimated using gradvecfield([a -7], X, Y, GaussianKP(X)) for 3 values of a.

CF_sinc

For high-dimensional problems, the average product of jacobians can be used to find an active subspace.

ctkelley · August 21, 2020, 10:16am

I use sparse grids (aka Smolyak) for reduced order models in computational chemistry. As @stevengj says, they don’t solve all your problems. Our function evaluations are very expensive and we can only populate grids for moderate orders and lowish (<=10) dimensions before the computations start taking weeks.

If you don’t care about boundary conditions, you may have better luck. We do and have to mix polynomial and trig functions bases. This is not easy. I have a student who is about to get a PhD for getting this into the ORNL Tasmanian code.

You could also get some Gibbs effects for your delta function approximation.

Topic		Replies	Views
ApproxFun for vector-valued function Numerics	11	799	August 19, 2020
Numerical Jacobian New to Julia	6	3478	October 30, 2019
Simultaneous Jacobian and Hessian computations Numerics question , zygote , finitediff	7	787	August 13, 2020
High-order derivatives of sampled data General Usage finitediff , splines	9	2306	September 1, 2021
Numerical differentiation and Integration General Usage	4	3049	October 3, 2018

Smooth delta functions to compute Jacobian and Hessian from samples

Related topics