Optimizing over an integral that is computed with simulation

If g(x)\equiv \int h(x,\tau)\, dF(\tau) is a multi-dimensional integral that I’d need Monte Carlo integration to compute, is there a good way to optimize the function g(\cdot)? Would standard optimization routines (from e.g. JuMP) work on this? Would autodifferentiation be possible/necessary when optimizing g?
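To make the setup concrete, here is a toy version of what I have in mind (the integrand h and the distribution F below are just placeholders, not my real problem):

```julia
using Statistics

# Placeholder integrand h(x, τ); the real one is more complicated.
h(x, τ) = exp(-sum(abs2, x .- τ))

# Monte Carlo estimate of g(x) = ∫ h(x, τ) dF(τ) ≈ (1/N) Σᵢ h(x, τᵢ),
# with τᵢ drawn from F (here a stand-in: a 5-dimensional standard normal).
g_mc(x; N = 10_000) = mean(h(x, randn(length(x))) for _ in 1:N)

g_mc(ones(5))   # a noisy estimate of g at x = (1, …, 1)
```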

This is a stripped-down version of my last question, which no one has answered yet. Sorry for the ~double post, but maybe a more minimal question will fare better.

What’s the dimensionality? In low dimensions, there are many potential alternatives to Monte Carlo methods.

If you really need to use Monte Carlo integration, then you probably can’t converge it to high enough accuracy that the noise is negligible. In that case, you have a stochastic optimization problem, and you should probably use an algorithm specialized for that sort of problem, such as Adam (which has several implementations in Julia, e.g. in Nonconvex.jl and Flux.jl).
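As a rough illustration, a stochastic-gradient loop of that sort might look like the following minimal sketch. It uses the Adam rule from Optimisers.jl rather than the Flux/Nonconvex wrappers, a placeholder integrand, and fresh Monte Carlo samples on every gradient evaluation, and it minimizes g:

```julia
using Optimisers, ForwardDiff, Statistics

# Placeholder integrand and a noisy Monte Carlo objective: each call draws
# fresh samples, so both the value and the gradient are stochastic estimates.
h(x, τ) = exp(-sum(abs2, x .- τ))
g_mc(x; N = 100) = mean(h(x, randn(length(x))) for _ in 1:N)

function minimize_with_adam(x0; iters = 2_000, stepsize = 0.05)
    x = copy(x0)
    state = Optimisers.setup(Optimisers.Adam(stepsize), x)
    for _ in 1:iters
        grad = ForwardDiff.gradient(g_mc, x)          # stochastic gradient estimate
        state, x = Optimisers.update(state, x, grad)  # Adam step (gradient descent, so this minimizes)
    end
    return x
end

x_opt = minimize_with_adam(zeros(5))
```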

Would autodifferentiation be possible/necessary when optimizing g?

Automatic differentiation is never “necessary”. You can always compute gradients manually. If there are a small number of optimization parameters, you can also use a gradient-free approach, but it is definitely better to compute gradients analytically (whether automatically or by hand) if possible.

And yes, automatic differentiation is probably applicable, depending on how your integrand is written. (If x only appears in the integrand, as opposed to the integral limits, then the derivative of g is just the integral of the derivative of the integrand.)
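For instance, with the samples held fixed, differentiating the sample average is the same as averaging the derivative of the integrand; a toy check with ForwardDiff and a placeholder integrand:

```julia
using ForwardDiff, Statistics

h(x, τ) = exp(-sum(abs2, x .- τ))        # placeholder integrand
τs = [randn(5) for _ in 1:10_000]        # samples τᵢ ~ F, held fixed

g_hat(x) = mean(h(x, τ) for τ in τs)     # fixed-sample estimate of g(x)

x = ones(5)
grad_of_mean = ForwardDiff.gradient(g_hat, x)                            # ∇ of the sample average
mean_of_grad = mean(ForwardDiff.gradient(z -> h(z, τ), x) for τ in τs)   # average of ∇ₓ h(x, τ)
grad_of_mean ≈ mean_of_grad   # true (up to floating-point roundoff)
```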


If you are already using JuMP, InfiniteOpt.jl (https://pulsipher.github.io/InfiniteOpt.jl/dev/) is a great extension of JuMP that supports formulations involving “expectations” with respect to random variables, using a range of algorithms for computing those expectations, including Monte Carlo (MC) and quadrature methods; see https://pulsipher.github.io/InfiniteOpt.jl/dev/guide/measure/. MC methods are only appropriate for low-variance functions, though, because the variance of the expectation estimate (the sample mean) is

Var\left(\tilde{E}_{\xi}[f(x; \xi)]\right) = Var\left(\frac{1}{N}\sum_{i=1}^N f(x; \xi_i)\right) = \frac{Var(f(x; \xi))}{N}

where N is the number of MC samples used to compute the expectation estimate. If the variance of the integrand f is small enough, MC can be used even in high-dimensional cases, because the noise can be driven down to values small enough that a deterministic optimization algorithm works without noticing that the objective changes between evaluations. You can also just fix the set of samples upfront and get a deterministic approximation of the original problem regardless of the variance of f (see the sketch below), but in this case the true quality of the solution you get will be worse with fewer samples or a higher-variance f.
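A minimal sketch of that fixed-sample (sample-average approximation) idea, using Optim.jl rather than InfiniteOpt and a placeholder integrand, might look like:

```julia
using Optim, Statistics

h(x, τ) = exp(-sum(abs2, x .- τ))        # placeholder integrand
τs = [randn(5) for _ in 1:5_000]         # samples drawn once, up front, then fixed

g_saa(x) = mean(h(x, τ) for τ in τs)     # deterministic sample-average objective

# Because the samples are fixed, an ordinary deterministic optimizer applies.
result = optimize(g_saa, zeros(5), LBFGS(); autodiff = :forward)
x_opt  = Optim.minimizer(result)
```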
