I am using Turing from IJulia in a Jupyter notebook. I am running some MCMC simulations that take a long time to converge. How do I save the chains to disk so that I can reload them the next time I start the kernel, instead of having to run the computation again? I’m not sure whether this question is specific to Turing and Julia or a more general Jupyter Notebook question.
cc: @cpfiffer
You should be able to load and save a chain with this:
# Save a chain.
write("chain-file.jls", chn)
# Read a chain.
chn2 = read("chain-file.jls", Chains)
Let me know if this doesn’t work for you. I have long wanted to improve the saving functionality, so if this doesn’t work I might get around to moving us over to JLD storage for good.
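For the "reload instead of recompute" part of the question, here is a minimal sketch of a caching pattern using Julia's Serialization standard library. The helper name `load_or_compute` and the stand-in `expensive` function are hypothetical, not part of Turing or MCMCChains; in a notebook the computation would be your own `sample(...)` call, and this is an alternative to the write/read calls above rather than a replacement for them.

```julia
using Serialization  # standard library, no extra packages needed

# Hypothetical helper (not part of Turing or MCMCChains): return the object
# stored at `path` if the file exists; otherwise call `compute`, store its
# result at `path`, and return it.
function load_or_compute(compute, path)
    if isfile(path)
        return deserialize(path)
    end
    result = compute()
    serialize(path, result)
    return result
end

# Stand-in for an expensive sampling call; in practice this would be
# something like `sample(model, sampler, n_samples)`.
expensive() = (draws = randn(1000), n = 1000)

path = joinpath(mktempdir(), "chain-cache.jls")
first_run = load_or_compute(expensive, path)   # computes and writes the file
second_run = load_or_compute(expensive, path)  # reads the file instead
```

As far as I know, files written this way are only reliably readable by the same Julia version (and package versions) that wrote them, so treat them as a cache rather than long-term storage.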
Yes, this worked fine! I am able to write the chain to disk and read it back, and describe(chn2) also returns all the stats.
I have a follow-up question. I tried
resume(chn2,40)
and got the error message
AssertionError: [Turing] cannot resume from a chain without state info
I tried chn2.info and I get
NamedTuple()
So where is this going wrong?
To be clear, even the original chain refuses to resume, so this is not an issue with the write and read. Probably just me not able to figure out how to resume.
Ah, I see. We made a change a little while ago that requires you to run sampling with the keyword save_state=true, i.e.
chain = sample(model, sampler, n_samples; save_state=true)
After which, resume should work. But one issue with this is that write/read don’t work very well with the saved state, and so you might also get an error here.
I think the biggest pain point is that we serialize the model to disk. The signature for resume should probably be resume(model, chain, n_samples), since users already have their model definitions and saving them Turing-side is difficult.
If using save_state=true doesn’t work for you, could you open an issue at MCMCChains.jl to remind me?
Feels like we should really add some tests into Turing for this.
I mean, the problem is not even testing (though it would help). A lot of the chain I/O code is very old (basically still the original Mamba code), and has not been seriously re-evaluated since. I have long wanted to modernize it but haven’t yet gotten to it.
So I managed to add the save_state=true flag, and this means I can now run my chain for a short number of steps, plot some graphs, and then resume if needed. This is great!
Writing the chain to disk and reading it back also succeeded, so I can turn my kernel off and on again without having to run all this sampling once more! So it’s working, at least for now…
By the way, I have been getting a very weird error. It only happens when I type HMC into a Jupyter cell. When I start typing the parameters, as in HMC(.05,10), and reach the point where I have typed
HMC(0.)
the cell freezes. No key I press, including delete, gets a response, or else the cell copies the previous line of the notebook many times, with the same line number repeated. This is really weird, and I have been able to reproduce it many times.
I know this is not a minimal example, but I am working to a deadline, and I got around this by not using HMC for now. NUTS works great, so I’m just using that. But I thought I should let folks know in case this is a known bug.
That sounds very strange. I’m not sure this is Turing-related, though. You could try a different IDE and see if the problem still occurs.
To list a few: