On-line storage of MCMC output

I frequently run MCMC for Bayesian estimation which

  1. takes a long time (weeks),

  2. has a large dimension (10^5–10^6).

It would be great to have a means to

  1. save the results to disk while in progress,

  2. “peek into” the progress of the chain while it is running (e.g. calculate ESS & \hat{R}).
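For the second point, “peeking” could be as simple as loading whatever draws are on disk so far and running diagnostics, e.g. with MCMCDiagnosticTools.jl. A rough sketch (the 3-D draws × chains × params layout is that package's convention; the `partial` array here is a placeholder for draws read back from storage):

```julia
# Sketch: compute diagnostics on a partial chain with MCMCDiagnosticTools.jl.
using MCMCDiagnosticTools

partial = randn(500, 4, 10)   # placeholder: draws × chains × params read from disk
ess(partial)                  # effective sample size per parameter
rhat(partial)                 # split-R̂ per parameter
```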

I am wondering if there is an existing solution, and if not, opening this discussion to brainstorm about it, so that various MCMC implementations could standardize on a common format.

The ideal solution would be

  1. economical with disk space (e.g. let the user store draws as Float32 when that is enough, or thin samples automatically),
  2. failure tolerant (e.g. a computation shut down in the middle for whatever reason would still leave partial results),
  3. have a “core” API just for retrieving the posterior draws (analogous to an AbstractMatrix),
  4. but also provide a means to save occasional simple metadata (adaptation info, etc.),
  5. not be tied to a specific Julia version or data structure (so serialization and JLD2 are not ideal).
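To make point 3 concrete, here is a minimal sketch of what such a “core” retrieval API could look like. All names (`AbstractChainStore`, `ndraws`, `nparams`, etc.) are hypothetical, just to anchor the discussion:

```julia
# Hypothetical sketch of a "core" retrieval API — not an existing package.

abstract type AbstractChainStore end

# A toy in-memory implementation backed by a matrix of draws
# (rows = iterations, columns = parameters).
struct MatrixChainStore{T} <: AbstractChainStore
    draws::Matrix{T}
    meta::Dict{String,Any}   # occasional simple metadata (adaptation info, etc.)
end

# The AbstractMatrix-like part: sizes and indexing into the draws.
ndraws(s::MatrixChainStore) = size(s.draws, 1)
nparams(s::MatrixChainStore) = size(s.draws, 2)
Base.getindex(s::MatrixChainStore, i, j) = s.draws[i, j]

# usage
s = MatrixChainStore(randn(Float32, 1000, 5), Dict("thinning" => 1))
ndraws(s)      # 1000
s[1:10, 2]     # first ten draws of the second parameter
```

A disk-backed implementation would provide the same small surface, so downstream tools (diagnostics, plotting) would not care where the draws live.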

Suggestions welcome.

Could NetCDF (using NCDatasets.jl, for instance) do the job? It’s mainly used for climate data, but it ticks all the boxes and can easily be used outside of Julia.
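If I read the NCDatasets.jl docs right, an unlimited dimension gives exactly the grow-as-you-go behavior. A rough sketch (file and variable names are mine):

```julia
# Sketch: append draws to a NetCDF file with NCDatasets.jl, using an
# unlimited "draw" dimension so the file grows as sampling proceeds.
using NCDatasets

ds = NCDataset("chain.nc", "c")
defDim(ds, "param", 5)
defDim(ds, "draw", Inf)          # Inf marks the dimension as unlimited
v = defVar(ds, "theta", Float32, ("param", "draw"))

for i in 1:100
    draw = randn(Float32, 5)     # placeholder for one MCMC draw
    v[:, i] = draw
    sync(ds)                     # flush to disk so other processes can peek
end
close(ds)
```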


Thanks. My understanding is that NetCDF is practically HDF5 though, so I guess I could just use that.

Hey man, so funny you say this. I have had the same thought 🙂

I’ve been working on a personal project for fun, a combination of wanting to move outside the bounds of the existing ecosystem in Julia and wanting to build a Bayesian “IDE”. It’s a TUI – so not quite what you want, but I think this is a clear demonstration that callbacks are fine with NUTS sampling.

I shared your frustration immensely that there was no out of the box solution for looking at chains in real time.

Here’s a clip of my Tachikoma live sampling viewer. Give it a moment for when Enzyme is compiling the gradient!

https://asciinema.org/a/ZSBs3oqsjZntOHUx

edit: The traces/hists/stats don’t start showing until after warmup is completed


I have my own ad-hoc solution (dump to a CSV line by line, then have a script that loads each and calculates what I want), just thought I would invest in something less hacky.
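For concreteness, the line-by-line CSV dump can be as simple as this (paths and sizes are placeholders):

```julia
# Sketch: append one draw per CSV line and flush immediately,
# so a crash loses at most the line currently being written.
open("chain.csv", "a") do io
    for i in 1:100
        draw = randn(5)              # placeholder for one MCMC draw
        println(io, join(draw, ','))
        flush(io)                    # make the line visible on disk right away
    end
end
```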

A GUI/TUI for monitoring is one application, but I think it would make sense to figure out the storage format separately and build on that.

I am now drafting an API and will post it here.


Some of the tooling for machine learning is pretty good for this. TensorBoardLogger.jl works if you integrate it into your solver to run every X updates. It just saves to disk, and you can launch a web interface with the tensorboard Python package to view the results in a browser. It also works if you run on a remote machine such as an HPC cluster, provided you port forward (very easy with the VS Code SSH extension).
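The integration is only a few lines; a sketch with TensorBoardLogger.jl (the quantity being logged is a placeholder):

```julia
# Sketch: log a sampler diagnostic every 10 updates with TensorBoardLogger.jl.
# View afterwards with `tensorboard --logdir tb_logs` in a browser.
using TensorBoardLogger

lg = TBLogger("tb_logs")
for step in 1:1000
    logp = randn()                          # placeholder for the current log density
    if step % 10 == 0                       # log every 10 updates
        log_value(lg, "logp", logp; step=step)
    end
end
close(lg)
```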

At work, I’ve set up an MLflow server that’s publicly available, so any machine with the right credentials can save “experiments” (similar to TensorBoard, with scalar, picture, or matrix logging). This is a much more Python-centric solution, but there are client libraries in Julia such as MLFlowClient.jl. It is really useful for monitoring experiments that take forever and for sharing results with team members.

Note that the above tools are best for monitoring. For long-running jobs I tend to use something like HDF5 and save snapshots that can be restarted and run to completion. You can add extra storage to MLflow to attach files to a particular “experiment”, but it’s not the most performant solution.
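For the snapshot approach, a minimal sketch with HDF5.jl (the file layout and dataset names here are just illustrative):

```julia
# Sketch: snapshot-and-restart with HDF5.jl. Each checkpoint rewrites the
# file with the draws so far plus whatever state is needed to resume.
using HDF5

function checkpoint(path, draws, iter)
    h5open(path, "w") do f
        f["draws"] = draws      # draws accumulated so far
        f["iter"]  = iter       # iteration to resume from
    end
end

draws = randn(100, 5)           # placeholder for accumulated draws
checkpoint("snapshot.h5", draws, 100)

# restarting:
iter  = h5read("snapshot.h5", "iter")
draws = h5read("snapshot.h5", "draws")
```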


I believe CmdStan writes draws online to a CSV file in its own format, which has become a bit of a standard across PPLs, e.g.

I get that CSV files are way more accessible, but I find the reliance on them in this case a bit perplexing. I imagine most users needing to read their chains in real time are probably users for whom reading an HDF5 file is no big deal.

HDF5 offers way more flexibility, efficiency, compression, etc.


IIUC, HDF5 by itself does not satisfy goal property 2 (failure tolerance).

HDF “files” must be explicitly closed to flush data (and metadata) to the filesystem, and glitches before closing may leave them much less useful than a truncated CSV file. So you might want to augment an HDF5 sink with some checkpointing and output-swapping for a long run.
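One standard way to do that output-swapping in plain Julia is write-to-temp-then-rename, which is atomic on POSIX filesystems, so a glitch mid-write never corrupts the last good checkpoint. A sketch (the function name and payload are made up):

```julia
# Sketch: crash-safe checkpointing via write-to-temp-then-atomic-rename.
function atomic_checkpoint(write_fn, path)
    tmp = path * ".tmp"
    open(write_fn, tmp, "w")     # write the full snapshot to the temp file
    mv(tmp, path; force=true)    # atomically replace the previous snapshot
end

atomic_checkpoint("chain.bin") do io
    write(io, rand(Float64, 1000))   # placeholder payload
end
```

Readers only ever see either the old complete snapshot or the new complete one, never a half-written HDF5 file.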
