On-line storage of MCMC output

I frequently run MCMC for Bayesian estimation which

  1. takes a long time (weeks),

  2. has a large dimension (10^5 to 10^6).

It would be great to have a means to

  1. save the results to disk while in progress,

  2. “peek into” the progress of the chain while it is running (e.g. calculate ESS and \hat{R}).

I am wondering whether there is an existing solution. If not, I am opening this discussion to brainstorm one, so that various MCMC implementations could standardize on a common format.

The ideal solution would

  1. be economical with disk space (e.g. allow the user to store Float32 when that is enough, or automatically thin samples),
  2. be failure tolerant (e.g. a computation shut down in the middle for whatever reason would still leave partial results),
  3. have a “core” API just for retrieving the posterior results (analogous to an AbstractMatrix),
  4. also provide a means to save occasional simple metadata (adaptation info, etc.),
  5. not be tied to a specific Julia version or data structure (so serialization and JLD2 are not ideal).
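
To make the requirements concrete, here is a minimal sketch of an append-only flat binary layout using only Base Julia. It is not an existing format or package: the function names `open_chain`, `append_draw!`, and `read_draws` and the layout itself (an Int64 dimension header followed by Float32 draws in native byte order) are assumptions made up for illustration. Metadata storage and thinning are left out.

```julia
# Hypothetical layout: one Int64 header (number of parameters),
# followed by draws stored back to back as Float32 in native byte order.

# Create the file and write the header.
function open_chain(path::AbstractString, dim::Integer)
    io = open(path, "w")
    write(io, Int64(dim))
    flush(io)
    return io
end

# Append one draw, downcast to Float32, and flush so that a crash
# loses at most the draw currently being written.
function append_draw!(io::IO, draw::AbstractVector{<:Real})
    write(io, Float32.(draw))
    flush(io)
    return nothing
end

# Read every complete draw currently on disk as a dim × ndraws matrix.
# A truncated trailing record (e.g. after a crash) is ignored.
function read_draws(path::AbstractString)
    open(path, "r") do io
        dim = Int(read(io, Int64))
        ndraws = (filesize(path) - sizeof(Int64)) ÷ (dim * sizeof(Float32))
        draws = Matrix{Float32}(undef, dim, ndraws)
        read!(io, draws)
        return draws
    end
end
```

A second process could call `read_draws` on the same path while the sampler is still running and pass the resulting matrix to whatever ESS or \hat{R} routine it prefers. Because the layout is just a header plus raw floats, it could also be read from other languages, assuming the same byte order.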

Suggestions welcome.

Could a NetCDF file (using NCDatasets.jl, for instance) do the job? The format is mainly used for climate data, but it ticks all the boxes and can easily be used outside of Julia.
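
A rough sketch of what that could look like with NCDatasets.jl follows. It has not been tested against a real sampler. The file name, the dimension and variable names (`param`, `draw`, `posterior`), the attribute, and the `randn` stand-in for the sampler are all made up for illustration. It assumes that an unlimited dimension grows when you write past its current end, and that `sync` pushes pending changes to disk. Whether another process can safely read the file while it is being written depends on the underlying NetCDF/HDF5 setup and is not checked here.

```julia
using NCDatasets  # third-party package, assumed to be installed

path = tempname() * ".nc"   # hypothetical output file
nparams = 5                 # hypothetical number of parameters

ds = NCDataset(path, "c")                  # "c" creates a new file
defDim(ds, "param", nparams)
defDim(ds, "draw", Inf)                    # unlimited dimension, grows with the chain
ds.attrib["sampler"] = "example sampler"   # simple metadata as a global attribute
v = defVar(ds, "posterior", Float32, ("param", "draw"))

for i in 1:100
    x = randn(nparams)            # stand-in for one MCMC draw
    v[:, i] = Float32.(x)         # write draw i, extending the "draw" dimension
    i % 10 == 0 && sync(ds)       # periodically write pending changes to disk
end
close(ds)

# Later (or from another tool that reads NetCDF): load the stored draws.
ds = NCDataset(path, "r")
draws = ds["posterior"][:, :]     # nparams × ndraws array
close(ds)
```

Storing the variable as Float32 covers the disk-space point, and the global attributes give a place for adaptation info and similar metadata.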