An update: I have programmed a version of SPSA and indeed it is working fine (after some tuning). Thanks, @dlakelan! Of course I will make it public soon, after including some more tests.
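For context, here is a minimal sketch of the standard SPSA iteration (Spall 1992) for a noisy scalar objective `f`; it is only an illustration, not the implementation mentioned above, and the gain constants `a`, `c`, `A` are exactly the kind of knobs that need tuning.

```julia
# Minimal sketch of the standard SPSA iteration (Spall 1992) for a noisy
# scalar objective f; not the implementation referred to above.
function spsa(f, θ0; iters = 1_000, a = 0.1, c = 0.1, A = 100, α = 0.602, γ = 0.101)
    θ = float.(θ0)                          # work on a float copy of the start point
    for k in 1:iters
        ak = a / (k + A)^α                  # step-size gain sequence
        ck = c / k^γ                        # perturbation-size gain sequence
        Δ  = rand([-1.0, 1.0], length(θ))   # simultaneous ±1 perturbation
        ĝ  = (f(θ .+ ck .* Δ) - f(θ .- ck .* Δ)) ./ (2ck .* Δ)  # two-evaluation gradient estimate
        θ .-= ak .* ĝ                       # gradient-descent style step
    end
    return θ
end
```

Usage would be e.g. `spsa(θ -> noisy_loss(θ), zeros(5))`, with `noisy_loss` a placeholder for the simulated objective.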
I am wondering about the following: suppose that for a given parameter \theta, I am simulating N individuals to obtain some average statistic \bar{h}(\theta), i.e. the mean of N noisy evaluations g(\theta).
Intuitively, it would make sense to use a low N for the initial phase of finding the right region, and a large N for the later phase, because in the beginning I care more about convergence than accuracy, and I can refine later. E.g. 10k vs 100k seems to work well. Is there any systematic, theoretically supported way of doing this, e.g. an N(k) schedule?
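Purely as an illustration of the kind of N(k) scheme I have in mind (arbitrary constants, no theoretical backing), something like a geometric ramp from 10k toward 100k:

```julia
# Illustration only: grow the number of simulated individuals geometrically
# from N0 toward Nmax as the iterations proceed; the constants are arbitrary.
N_schedule(k; N0 = 10_000, Nmax = 100_000, r = 1.05) =
    min(Nmax, round(Int, N0 * r^(k - 1)))
```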
Glad you got it working, it's a very clever algorithm. I don't know about the idea of N(k); it seems this would be very problem-specific. For example, suppose your simulation is just this side of having infinite variance… N will need to be very large. On the other hand, if your simulation is well behaved it could be much smaller.
The Central Limit Theorem should give some idea of what N should be in order to bound the error in \bar{h}(\theta) at a certain level, especially if the variance of g(\theta) does not change much with \theta.
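For example, under the CLT approximation se(\bar{h}) ≈ σ/√N, a pilot run can be used to size N for a target standard error. A rough sketch, where `pilot_draws` is a hypothetical vector of g(\theta) evaluations at a representative \theta:

```julia
using Statistics

# CLT-based sizing: se(h̄) ≈ σ/√N  ⇒  N ≈ (σ / target_se)^2,
# with σ estimated from a pilot sample of g(θ) draws.
# (Scale σ by a normal quantile, e.g. 1.96, to target a confidence bound instead.)
required_N(pilot_draws, target_se) = ceil(Int, (std(pilot_draws) / target_se)^2)
```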
Are you getting better results with your SPSA than nlopt's CRS2?
I've been working on some higher-dimensional problems where CRS2 currently has the best convergence, so I'm curious to try your SPSA when it's ready.
Personally, 3 years after asking the original question, my lesson in optimizing noisy objectives is to avoid it at all costs.
It is so inefficient in larger dimensions (compared to alternative methods, especially anything that uses at least first derivatives) that even spending a month or two reformulating a problem with all the tricks I can think of is usually worth it. Coupled with multiple local modes, it is usually a nightmare for anything above 10–20 dimensions (depending on how nasty the multimodality is).
I have a similar problem in economics, where some model-implied moments need to be simulated to compute the likelihood.
Did you end up using AD on the simulator?
As a side note, Ken Judd recommended the POUNDERS algorithm for derivative-free optimization of noisy and non-smooth objective functions when the objective is a non-linear least-squares problem. I haven't seen a Julia implementation though. I'm not sure how different this is from DFO-LS.jl, mentioned above.
Yes. I now try to code all simulations so that they are AD-able from the very beginning, and test for this in CI so as not to break it inadvertently even if I don't use it at the moment.
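A minimal sketch of such a CI check, assuming ForwardDiff as the AD backend and a hypothetical `simulate_moments(θ)` entry point:

```julia
using Test, ForwardDiff

# Guard against accidentally breaking AD-ability of the simulator.
# `simulate_moments` is a hypothetical entry point returning a vector of moments.
@testset "simulator is AD-able" begin
    θ0 = [0.5, 1.2]
    g  = ForwardDiff.gradient(θ -> sum(simulate_moments(θ)), θ0)
    @test all(isfinite, g)
end
```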
Making a simulation AD-able is very model-specific. Usually it requires rethinking the moments you are targeting, from the very beginning. My standard bag of tricks includes mapping a [0,1] number to durations for models with (competing) Poisson processes [pretty much all labor market models], and, for multiple discrete outcomes, using the cumulative probability of each (which is differentiable); see the sketch below.
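A hedged sketch of what these two tricks can look like, with an exponential duration (rate λ) standing in for the Poisson-process case; the names and details here are illustrative, not necessarily how the above is implemented:

```julia
# Trick 1: map a fixed uniform draw u ∈ [0,1] to a duration via the inverse CDF
# of an exponential with rate λ; with u held fixed across parameter values,
# the duration is smooth (AD-able) in λ.
duration(u, λ) = -log1p(-u) / λ

# Trick 2: instead of the non-differentiable indicator of which discrete outcome
# occurred, target the cumulative probabilities, which are smooth in the parameters.
cumulative_probs(p) = cumsum(p)
```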
I could discuss these in a series of blog posts if there is interest.
All of these tricks are known and have been around for a long time; I claim no originality. A standard reference is, e.g.,
@book{rubinstein1993discrete,
title={Discrete event systems: sensitivity analysis and stochastic optimization by the score function method},
author={Rubinstein, Reuven Y and Shapiro, Alexander},
volume={13},
year={1993},
publisher={Wiley}
}
Just skimmed through it; this should enable cool stuff like Hamiltonian Monte Carlo with discrete parameters without having to rely on marginalization, right?
Cool abstract, I will try to read the paper later today.
I will say that at one point I played with the SPSA method for unbiased numerical differentiation, and it was able to drive an HMC-type calculation, but it only stayed stable with extremely small timesteps in my test case. Still, it's evidence that this kind of thing is workable!