I frequently run MCMC for Bayesian estimation which
- takes a long time (weeks),
- has a large dimension (10^5–10^6).
It would be great to have a means to
- save the results to disk while in progress,
- “peek into” the progress of the chain while it is running (e.g. calculate ESS and $\hat{R}$).
I am wondering if there is an existing solution. If not, I am opening this discussion to brainstorm one, so that various MCMC implementations could standardize on a common format.
The ideal solution would
- be economical with disk space (e.g. allow the user to store `Float32` when that is enough, or automatically thin samples),
- be failure tolerant (e.g. if the computation is shut down in the middle for whatever reason, the partial results would still be available),
- have a “core” API just for retrieving the posterior results (analogous to an `AbstractMatrix`),
- but also provide a means to save occasional simple metadata (adaptation info, etc.),
- not be tied to a specific Julia version or data structure (so serialization and JLD2 are not ideal).
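
To make the wish list above concrete, here is a minimal sketch of what I have in mind. It is not an existing package or an agreed standard. The file layout, the magic string, and the function names `create_draws`, `append_draw!`, and `read_draws` are all made up for illustration. It assumes native byte order and a single chain per file.

```julia
using Mmap

# Hypothetical layout (an assumption, not a standard):
#   8 bytes  magic string "MCMCDRAW"
#   8 bytes  parameter dimension as Int64 (native byte order)
#   then     draws appended back to back, each `dim` Float32 values
const MAGIC = Vector{UInt8}("MCMCDRAW")
const HEADER_BYTES = length(MAGIC) + sizeof(Int64)

# Create an empty draws file for a chain of dimension `dim`.
function create_draws(path::AbstractString, dim::Integer)
    open(path, "w") do io
        write(io, MAGIC)
        write(io, Int64(dim))
    end
    return path
end

# Append one draw, stored as Float32 to save disk space.
# The file is closed (and hence flushed) after every draw, so an interrupted
# run loses at most the draw being written. A real implementation would
# probably keep the handle open, flush periodically, and check the length
# of `draw` against the dimension in the header.
function append_draw!(path::AbstractString, draw::AbstractVector{<:Real})
    open(path, "a") do io
        write(io, Float32.(draw))
    end
    return nothing
end

# Memory-map all complete draws written so far as a dim × n matrix
# (one draw per column). A trailing partial draw is ignored.
function read_draws(path::AbstractString)
    open(path, "r") do io
        read(io, length(MAGIC)) == MAGIC || error("not a draws file")
        dim = Int(read(io, Int64))
        n = (filesize(path) - HEADER_BYTES) ÷ (dim * sizeof(Float32))
        return Mmap.mmap(io, Matrix{Float32}, (dim, n), HEADER_BYTES)
    end
end

# Usage: the sampler appends, a second process can peek at any time.
path = create_draws(tempname(), 2)
append_draw!(path, [1.0, 2.5])
append_draw!(path, [3.0, 4.5])
draws = read_draws(path)  # behaves like any other AbstractMatrix
```

Since `read_draws` returns an ordinary matrix, ESS or $\hat{R}$ could be computed on it with whatever diagnostics code one already uses. Metadata such as adaptation info is not covered here. One option would be a plain-text sidecar file next to the draws, but that is just one possibility.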
Suggestions welcome.