In this GitHub comment, @StefanKarpinski wrote:
I’ve often wanted the ability to snapshot a running program and be able to restart the snapshot – multi-shot continuations but for programs, if you will – but it seems to be technically nearly impossible to do this because of the large amount of program state which cannot be persisted or restored across processes.
I am interested in learning more about this. Is it possible to list all the components that make up the state of a Julia program? Especially with an eye toward which of these can be serialized and (exactly or nearly) restored?
It’s less about Julia programs specifically and more about processes in general.
If I make a program that’s a GUI with an on/off switch, it’s easy to serialize the state of the program: I can do it in one bit. Julia needs more, in order to track all of the objects defined. Suppose I’m in a REPL and I want to save it to disk, reboot, and restore. What are the data structures I need to save to avoid recompiling or recomputing anything and be able to proceed?
Or, how much of the REPL state is it possible to serialize and reload?
Julia objects aren’t the issue; they’re fairly easy to serialize and deserialize. The problem is stuff like open file handles, mmapped files, network sockets, etc. What do you do if you suspend a program that has all of those and then you restore it? Should you reopen all the files? Reopen network connections? What if the file is gone? What if it has different contents than before? Do you seek to the same location? I believe there have been some projects to try to make this sort of thing work, but it’s a lot to work through and the way operating systems work is almost designed to make it hard.
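To make the contrast concrete, here is a minimal sketch, not a real checkpointing mechanism. Plain values round-trip through the `Serialization` stdlib, while for an open file the best we can do is record a path and an offset ourselves and hope the file is still there later. `FileSnapshot`, `snapshot`, and `restore` are hypothetical names made up for this illustration.

```julia
using Serialization

# Plain Julia values round-trip through the Serialization stdlib.
state = Dict("counter" => 42, "history" => [1.0, 2.5, 4.0])
buf = IOBuffer()
serialize(buf, state)
seekstart(buf)
restored_state = deserialize(buf)

# Hypothetical record of what we would need in order to reopen a file later.
struct FileSnapshot
    path::String
    offset::Int64
end

# Capture the path and current read position of an open stream.
snapshot(path::AbstractString, io::IO) = FileSnapshot(String(path), position(io))

# Reopen the file and seek back; this fails if the file has disappeared.
function restore(snap::FileSnapshot)
    isfile(snap.path) || error("file is gone: $(snap.path)")
    io = open(snap.path, "r")
    seek(io, snap.offset)
    return io
end

# Demo: read part of a temporary file, snapshot, close, restore, keep reading.
path, io = mktemp()
write(io, "hello world")
close(io)

io = open(path, "r")
read(io, 6)                 # consume "hello "
snap = snapshot(path, io)
close(io)                   # stands in for the process going away

io2 = restore(snap)
rest = read(io2, String)    # continues from the saved offset
close(io2)
rm(path)
```

Note that `restore` only checks that the file exists. It has no way to tell whether the contents changed in the meantime, which is exactly the kind of question the operating system leaves unanswered.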
For the comprehensive version, see rr, the record-and-replay debugger.
A GUI program is actually a good example. Let’s say you’re using Qt to write a GUI app. When you use the Qt API to create a window with some widgets in it, it interacts with the windowing system to paint a bunch of stuff on the screen and set up handlers for events. If you stop that program and start it up again, it won’t just work: you would just have a program sitting listening for events that are never going to happen because there’s no window for the user to interact with and no event handlers registered anymore. To restore the program to working state, you’d have to replay all or some of the past interaction with the windowing system to get things back into the state that it was most recently in and recreate the windows, graphics, widgets, event handlers, etc. It would be possible to create a GUI API that allows saving and restoring GUI state, but none of them do, as far as I’m aware.
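A toy sketch of the replay idea, assuming a made-up windowing layer rather than Qt or any real toolkit: every call that changes GUI state is recorded as plain data, and a fresh display is brought back to the same state by replaying the log. `MockDisplay`, `GuiCall`, `apply!`, and `record!` are all hypothetical names.

```julia
# Hypothetical stand-in for a windowing system: it only remembers what was created.
mutable struct MockDisplay
    widgets::Vector{String}
    handlers::Dict{String,Function}
end
MockDisplay() = MockDisplay(String[], Dict{String,Function}())

# Each GUI call is recorded as plain data, so the log itself is easy to serialize.
struct GuiCall
    op::Symbol
    target::String
end

# Perform one call against a display.
function apply!(d::MockDisplay, c::GuiCall)
    if c.op === :create_widget
        push!(d.widgets, c.target)
    elseif c.op === :register_click
        d.handlers[c.target] = () -> "clicked $(c.target)"
    else
        error("unknown op $(c.op)")
    end
    return d
end

# Record the call in the log, then perform it.
record!(calls::Vector{GuiCall}, d::MockDisplay, c::GuiCall) = (push!(calls, c); apply!(d, c))

# The "running program" builds its GUI through record!.
calls = GuiCall[]
live = MockDisplay()
record!(calls, live, GuiCall(:create_widget, "window"))
record!(calls, live, GuiCall(:create_widget, "toggle"))
record!(calls, live, GuiCall(:register_click, "toggle"))

# After a "restart" there is a fresh display with nothing on it; replay the log.
fresh = MockDisplay()
foreach(c -> apply!(fresh, c), calls)
```

The handlers in `fresh` are new closures rebuilt from the log, not the original ones. A real toolkit would also have to deal with state the program never set explicitly, such as window positions the user changed.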