Hello everyone :)…
Based on my understanding of Turing-based inference (in the case of NUTS), the chain reaches convergence during warmup, and then we collect samples. Now there are two things:
- When the model is simple and inference runs fast, there is no issue with the current setup.
- But when the model is complex and inference is slow (say, 100 warmup iterations and 100 samples take 2 days), a problem emerges because you are essentially working with a black box. In NUTS you must experiment with the warmup length to eventually see convergence, and with big models that is difficult.
A solution, I think, could be for Turing to give information while running. For example, after finishing warmup (before collecting samples), it could report whether convergence has been achieved or not, because then a person could just stop and rerun with a bigger warmup.
Does such a way already exist?
Do you mean whether a convergence diagnostic exists? The Rhat column in the chain summary is exactly that, if that's what you're asking. We normally aim for an Rhat as close to 1 as possible; below 1.1 or 1.01 is recommended.
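For reference, here is a minimal sketch of where Rhat shows up. The `demo` model and the data are made up purely for illustration; the point is just that `summarystats` on the resulting chain prints an `rhat` column.

```julia
using Turing, MCMCChains

# Toy model, purely for illustration.
@model function demo(x)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1); lower=0)
    x .~ Normal(μ, σ)
end

chain = sample(demo(randn(100)), NUTS(), 1_000)

# The summary table has an `rhat` column; values close to 1
# (e.g. below 1.01) are what we aim for.
summarystats(chain)
```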
Yes, but this comes at the end of the run, and we are essentially blind during the run itself. As I said, that is fine for simple models whose inference is fast, but for complex models whose inference can take hours it is not very productive, because you wait a long time before you can judge your run.
It would be nice if we could get some information after a certain number of samples has been drawn, say after 100 or 1000, depending on how one defines it.
Yeah, unfortunately, there is currently no way to do this. But you can always subsample the data a little and see how the chain does. In practice, having less data makes the posterior more challenging for the sampler, so it should give you a conservative idea of how long one should run the chain.
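To make that concrete, here is one way the subsampling could look, reusing the toy `demo` model from above (the dataset and the 10% subset size are arbitrary choices):

```julia
using Random

x_full = randn(10_000)                      # stand-in for the full dataset
idx    = randperm(length(x_full))[1:1_000]  # keep roughly 10% of the observations

# Fit a cheap pilot chain on the subset to get a feel for how much
# warmup the full, slow problem is likely to need.
pilot = sample(demo(x_full[idx]), NUTS(), 500)
summarystats(pilot)
```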
The issue is that we need to give a large warmup even if we run for, let's say, just 100 samples.
Example: if a given inference problem needs at least 500 warmup iterations to reach a good point from which sampling can start, then we have no choice but to give those 500 warmup iterations. As the model becomes more complex, it requires even more warmup. I feel it would be very nice to get some information even during this phase about how things are going, rather than sitting blind.
Okay, here is the thing. The truth is, warmup is not just about convergence; it also performs adaptation of the MCMC algorithm, which is about asymptotic variance, not convergence. That's why we call this stage warmup rather than just burn-in. To properly judge adaptation, you actually need to run the chain longer after warmup, because adaptation is judged by the ESS, which assumes stationarity. Therefore, to really tell whether warmup is sufficient, you have to run the chain long enough anyway! Monitoring just the Rhat online does not provide the full picture.
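As a rough sketch of what that looks like in practice (again reusing the hypothetical `demo` model; the 500 adaptation steps and 2,000 post-warmup samples are arbitrary choices):

```julia
# NUTS(n_adapts, target_accept): give the sampler an explicit warmup budget,
# then draw enough post-warmup samples for the ESS estimate to mean something.
chain = sample(demo(randn(1_000)), NUTS(500, 0.65), 2_000)

# Judge adaptation by the ess column and convergence by rhat, together.
summarystats(chain)
```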