Best Practice for Logging MCMC Results?

I will be deploying a custom MCMC algorithm to an external computer as part of my thesis. I’m making heavy use of the Gen PPL, and was wondering how best to return the results of the MCMC. My idea so far was to write each state of the chain (type Gen.DynamicDSLTrace) as a string to a txt file, but I’m not sure how to convert the strings back to trace objects.

Is there a better way to log each trace so that I can recover the chain even if the program stops early?

Solved: Serialization · The Julia Language

Given an array “traces” of type Gen.DynamicDSLTrace:

using Serialization  # standard library; provides serialize/deserialize

# Save the traces
serialize("filename.jld", traces)
# Re-load the traces
recovered_traces_array = deserialize("filename.jld")
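To address the "recover the chain even if the program stops early" part of the question, one option is to append each state to the file as it is produced instead of saving the whole array at the end. The following is a minimal sketch under stated assumptions, not a tested recipe for Gen: `ChainState`, `log_state`, and `read_states` are hypothetical names, and `ChainState` is a stand-in for a Gen.DynamicDSLTrace.

```julia
using Serialization

# Hypothetical stand-in for one chain state; a Gen trace would take its place.
struct ChainState
    iteration::Int
    value::Float64
end

# Append one state to the log file. The do-block closes the file on exit,
# which flushes the record to disk.
function log_state(path::AbstractString, state)
    open(path, "a") do io
        serialize(io, state)
    end
end

# Read back every complete record. If the final record was cut off
# (for example, the program died mid-write), stop at the EOFError.
function read_states(path::AbstractString)
    states = Any[]
    open(path, "r") do io
        while !eof(io)
            try
                push!(states, deserialize(io))
            catch err
                err isa EOFError || rethrow()
                break
            end
        end
    end
    return states
end

# Usage: log five states one at a time, then recover them.
path = tempname()
for i in 1:5
    log_state(path, ChainState(i, i / 10))
end
recovered = read_states(path)
```

When reading the file in a fresh Julia session, the definitions of the stored types (here `ChainState`; for Gen, presumably the model code as well) need to be loaded before calling `deserialize`.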

Be aware, though, that the serialization format/details might change between Julia versions. It’s not a long-term storage solution.

Thanks for pointing this out. Do you have any recommendations for learning more about good I/O practices for custom objects in Julia?

The thing is that “custom objects” is an infinitely large space, which is hard to support. My general take is that there is an inevitable tradeoff between flexibility in what you can store and long-term suitability. If you are considering short time scales, then JLD2.jl or perhaps BSON.jl are great because they support storing arbitrary Julia objects in a binary format (except for parameters and such, you generally want to avoid text formats). However, over longer time scales it is likely that these packages break in some way (this happened to JLD/JLD2) and that their format stops being backward compatible. You are also pretty much bound to Julia, that is, you can’t read your data with other programming languages. For these reasons, I would advise you to go with a well-established data format, my favourite being HDF5 (via HDF5.jl). While it has its own minor disadvantages, it is an established format that you can read with pretty much every programming language and that can hold all essential data objects (such as matrices).
In my experience (running large-scale MCMC simulations on the order of more than 10 million CPU hours), it is a great choice for storing MCMC results (including simulation parameters etc.).
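As a rough illustration of the HDF5 route, here is a minimal sketch that assumes the HDF5.jl package is installed and that the quantities of interest have already been extracted from each trace into plain arrays. The dataset names, attribute names, and the random data are hypothetical placeholders.

```julia
using HDF5  # third-party package, assumed to be installed

# Hypothetical chain: 1000 samples of a 3-dimensional parameter vector.
n_samples, n_dims = 1000, 3
samples = randn(n_samples, n_dims)
log_scores = randn(n_samples)

path = tempname() * ".h5"

# Write the samples as datasets and the simulation parameters as attributes.
h5open(path, "w") do file
    file["samples"] = samples
    file["log_scores"] = log_scores
    attributes(file)["step_size"] = 0.05
    attributes(file)["n_samples"] = n_samples
end

# Read everything back; any HDF5-capable language could read this file too.
loaded_samples = h5read(path, "samples")
step_size = h5open(path, "r") do file
    read(attributes(file)["step_size"])
end
```

Unlike the Serialization approach, this stores only plain numbers and arrays, so the trace objects themselves are not recoverable from the file; the tradeoff is a format that does not depend on the Julia version.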
