How do I save my results so far to disk?

Manoj_Gopalkrishnan · May 7, 2020, 9:47am

I am using Turing from IJulia in a Jupyter notebook. I am running some MCMC simulations that take some time to converge. How do I save the chains to disk so that I can reload them the next time I start the kernel, instead of having to run the computation again? I’m not sure if this is a question specific to Turing and Julia, or a more general Jupyter Notebook question.

trappmartin · May 7, 2020, 10:00am

cc: @cpfiffer

cpfiffer · May 7, 2020, 1:42pm

You should be able to load and save a chain with this:

using MCMCChains  # defines the Chains type (Turing loads it as well)

# Save a chain.
write("chain-file.jls", chn)

# Read a chain.
chn2 = read("chain-file.jls", Chains)

Let me know if this doesn’t work for you. I have long wanted to improve the saving functionality, so if this doesn’t work I might get around to moving us over to JLD storage for good.
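Another option that does not depend on MCMCChains’ own write/read methods is Julia’s built-in Serialization standard library, which can store almost any Julia object, including a Chains. This is a minimal sketch: the NamedTuple of random draws is only a stand-in so the example runs without packages; in a real session you would pass the chain returned by sample. Keep in mind that a .jls file written this way is only guaranteed to load with the same Julia version and package versions, and the relevant packages (e.g. Turing) must be loaded before calling deserialize, so it is best treated as a short-lived cache rather than an archive.

```julia
using Serialization  # Julia standard library: serialize / deserialize

# Stand-in for an expensive result. In a Turing session this would be the
# `Chains` object returned by `sample`; a NamedTuple of draws is used here
# only so the example runs without any packages.
chn = (mu = randn(1_000), sigma = abs.(randn(1_000)))

path = joinpath(tempdir(), "chain-file.jls")

serialize(path, chn)        # write the object to disk
chn2 = deserialize(path)    # read it back, e.g. after restarting the kernel

@assert chn2 == chn
```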

Manoj_Gopalkrishnan · May 7, 2020, 2:37pm

Yes, this worked fine! I am able to write to disk and read it back, and describe(chn2) also returns all the stats.

I have a follow-up question. I tried

resume(chn2,40)

and got the error message

AssertionError: [Turing] cannot resume from a chain without state info

I tried chn2.info and I get

NamedTuple()

So where is this going wrong?

Manoj_Gopalkrishnan · May 7, 2020, 2:40pm

To be clear, even the original chain refuses to resume, so this is not an issue with write and read. Probably I just haven’t figured out how to resume.

cpfiffer · May 7, 2020, 3:19pm

Ah, I see. We made a change a little while ago that requires you to run sampling with the keyword save_state=true, i.e.

chain = sample(model, sampler, n_samples; save_state=true)

After that, resume should work. One issue, though, is that write/read don’t work very well with the saved state, so you might also get an error there.

I think the biggest pain point is that we serialize the model to disk. The signature for resume should probably be resume(model, chain, n_samples), since users already have their model definitions and saving the model on the Turing side is difficult.

If using save_state=true doesn’t work for you, could you open an issue at MCMCChains.jl to remind me?
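The reason resume needs “state info” is that continuing a Markov chain takes more than the stored draws: it needs the chain’s last position and, for exact reproducibility, the random-number generator’s state. The sketch below illustrates that idea with a hypothetical, hand-written random-walk Metropolis sampler; it is not Turing’s implementation or API, and the names Run, mh_step, start_run and resume_run are made up for illustration. It also follows the resume(model, chain, n_samples) shape suggested above: the caller passes the log density (the “model”) in, and only plain data is kept in the run.

```julia
using Random

# Hypothetical container for a run: draws so far plus the sampler state.
# This is an illustration only, NOT Turing's implementation or API.
struct Run
    draws::Vector{Float64}   # samples collected so far
    x::Float64               # last position of the chain (sampler state)
    rng::MersenneTwister     # RNG state, so a resumed run continues the stream
end

# One random-walk Metropolis step targeting the log density `logp`.
function mh_step(logp, x, rng; scale = 0.5)
    y = x + scale * randn(rng)                           # propose a move
    return log(rand(rng)) < logp(y) - logp(x) ? y : x    # accept or reject
end

# Continue an existing run for n more draws. The log density (the "model")
# is supplied by the caller instead of being stored inside the run.
function resume_run(logp, run::Run, n)
    rng = copy(run.rng)      # leave the saved run's RNG untouched
    x = run.x
    draws = copy(run.draws)
    for _ in 1:n
        x = mh_step(logp, x, rng)
        push!(draws, x)
    end
    return Run(draws, x, rng)
end

# Start a fresh run of n draws from the initial point x0.
start_run(logp, n; x0 = 0.0, seed = 1) =
    resume_run(logp, Run(Float64[], x0, MersenneTwister(seed)), n)

logp(x) = -x^2 / 2           # standard normal, up to a constant

r1 = start_run(logp, 500)
r2 = resume_run(logp, r1, 500)   # 1000 draws in total
```

Because Run holds only plain data, it could be saved with serialize between sessions while the log density is simply redefined after a restart. With the same seed, resuming reproduces exactly the draws of one uninterrupted run of the same total length, which is the property a saved state is meant to guarantee.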

trappmartin · May 7, 2020, 3:26pm

Feels like we should really add some tests into Turing for this.

cpfiffer · May 7, 2020, 3:34pm

I mean, the problem is not even testing (though it would help). A lot of the chain I/O code is very old (basically still the original Mamba code), and has not been seriously re-evaluated since. I have long wanted to modernize it but haven’t yet gotten to it.

Manoj_Gopalkrishnan · May 8, 2020, 1:00pm

So I managed to add the save_state=true flag, and this means I can now run my chain for a small number of steps, plot some graphs, and then resume if needed. This is great!

Writing the chain to disk and reading it back also succeeded, so I can turn my kernel off and on again without having to rerun all this sampling! So it’s working, at least for now…
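For the notebook workflow described in the original question (restart the kernel, skip the expensive sampling), a small load-or-compute helper can wrap the save/read step. This is a sketch under stated assumptions: cached is a hypothetical helper built on the Serialization standard library, not part of Turing or MCMCChains, and the cheap slow function below only stands in for a real sample call.

```julia
using Serialization

# Hypothetical helper: return the result stored at `path` if the file exists,
# otherwise run `compute()`, save its result to `path`, and return it.
function cached(compute, path::AbstractString)
    isfile(path) && return deserialize(path)
    result = compute()
    serialize(path, result)
    return result
end

path = joinpath(tempdir(), "expensive-chain.jls")
rm(path; force = true)   # start clean for this demo

calls = Ref(0)           # counts how often the expensive step actually runs
slow() = (calls[] += 1; cumsum(randn(10_000)))   # stand-in for `sample(...)`

a = cached(slow, path)   # first call: computes and writes the file
b = cached(slow, path)   # later call (e.g. after a kernel restart): reads it

@assert a == b && calls[] == 1
```

With do-block syntax this could read chn = cached("chain-file.jls") do; sample(model, sampler, n_samples); end. Note that the helper does not notice when the model or sampler settings change, so delete the file whenever they do, or it will keep returning the stale result.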

By the way, I have been getting a very weird error. It only happens when I type HMC into a Jupyter cell. When I start typing the parameters into HMC (e.g. HMC(.05,10)), as soon as I reach the point where I have typed

HMC(0.)

the cell freezes. No key I press, not even delete, gets a response, or else the cell copies the previous line of the notebook many times, repeating the same line number. This is really weird, and I have been able to reproduce it many times.

I know this is not a minimal example, but I am working against a deadline and got around this by not using HMC for now. NUTS works great, so I’m just using that. But I thought I should let folks know in case this is a known bug.

trappmartin · May 8, 2020, 1:43pm

That sounds very strange. Not sure this is Turing-related, though. You could try a different IDE and see if the problem still occurs.

To list a few:
