Save/load REPL state

Tarny_GG_Channie · August 3, 2023, 4:07am

Julia has a strong point in being a JIT-compiled language: incremental compilation works (it has to). Still, to take full advantage of that, saving state is required.
Imagine you have a codebase which takes 20 minutes to compile, plus the part you want to fix. Ideally, you would run the code for 20 minutes, save the state, then just re-load the REPL state several times to test the new part.
How would you do that in practice?

albheim · August 3, 2023, 6:17am

What does “the part you want to fix” consist of? Can it not just be redefined in the running session? Maybe Revise.jl can make the workflow more convenient?

Or if the “codebase that takes 20 minutes to compile” is only about precompilation, maybe put that code in a package and put some good precompile instructions for it, then loading it should be fast in 1.9+?
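As a rough sketch of what "precompile instructions" can look like, using only Base's `precompile` (the module name `HeavyCode` and the function `simulate` are made up for illustration; in a real package the module would live in its own `src/` file):

```julia
module HeavyCode   # hypothetical package module

# Stand-in for a function that is expensive to compile.
function simulate(xs::Vector{Float64})
    return sum(abs2, xs) / length(xs)
end

# Ask Julia to compile `simulate` for this argument signature up front.
# Inside a real package this runs during precompilation, so the compiled
# code can be cached with the package instead of being rebuilt per session.
precompile(simulate, (Vector{Float64},))

end # module

HeavyCode.simulate([1.0, 2.0, 3.0])
```

How much of the 20 minutes this recovers depends on how much of the work is compilation of known signatures rather than actual computation.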

With that said, a way to save the state could probably be convenient in many cases. Though I have no clue how hard it would be to implement, and I’m pretty sure it doesn’t currently exist.

gdalle · August 3, 2023, 7:13am

It also depends what you mean by saving the REPL state:

If you mean precompiled code, the easiest way is to use PrecompileTools.jl (JuliaLang/PrecompileTools.jl on GitHub), which reduces time-to-first-execution of Julia code in your package.
If you mean actual REPL results, you might want to enable the numbered prompt introduced in Julia 1.9.
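For the numbered prompt, a `startup.jl` fragment along these lines should do it (this follows the pattern in the REPL manual as I remember it, so check it against the docs for your Julia version):

```julia
# ~/.julia/config/startup.jl
atreplinit() do repl
    @eval import REPL
    # Make sure the REPL interface exists before modifying its prompt.
    if !isdefined(repl, :interface)
        repl.interface = REPL.setup_interface(repl)
    end
    REPL.numbered_prompt!(repl)
end
```

Results are then kept in `Out`, so an earlier value can be recalled as, for example, `Out[3]`. This only lasts for the running session; nothing is written to disk.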

Benny · August 3, 2023, 7:50am

Old thread of mine on this topic.

mrufsvold · August 3, 2023, 9:06am

Thanks for the link. Interesting read!

Since that conversation happened, 1.9 introduced changing the active module in the REPL. I wonder if this mitigates @anon56330260’s point that you can’t serialize Main because it is in constant flux.

A user could start by dropping into Main.Workspace, import packages, define some functions, run a long-running process that constructs a DataFrame of results and assign it to df, then jump back up to Main and serialize Main.Workspace and exit the REPL.

When they open the REPL to work the next day, they deserialize Main.Workspace and get all of their functions and df back and don’t need to start from scratch.
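A minimal sketch of that idea with the Serialization standard library, restricted to plain data bindings (`snapshot` and `restore!` are hypothetical helpers I made up, not an existing API; functions, types, and submodules are skipped on purpose and would have to be re-created by re-running their definitions):

```julia
using Serialization

# Hypothetical helper: collect the plain data bindings of a module.
function snapshot(mod::Module)
    state = Dict{Symbol,Any}()
    for name in names(mod; all=true)
        startswith(string(name), "#") && continue   # compiler-generated names
        isdefined(mod, name) || continue
        val = getfield(mod, name)
        # Skip functions, types, and modules (including the module's own name).
        val isa Union{Function,Type,Module} && continue
        state[name] = val
    end
    return state
end

# Hypothetical helper: assign each saved value as a global in `mod`.
function restore!(mod::Module, state::AbstractDict{Symbol})
    for (name, val) in state
        # QuoteNode keeps the value from being evaluated as code.
        Core.eval(mod, Expr(:(=), name, QuoteNode(val)))
    end
    return mod
end

module Workspace
results = [1.0, 2.0, 3.0]   # stand-in for an expensive result such as df
label = "run A"
helper(x) = 2x              # not saved: functions are skipped
end

path = tempname()
serialize(path, snapshot(Workspace))

# "Next day": load the values into a fresh module.
module Restored end
restore!(Restored, deserialize(path))
Restored.results
```

Note that the Serialization format is not guaranteed to be readable across Julia versions, and values of package-defined types need those packages loaded before `deserialize` is called.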

This is a feature that R supports. You can save all the variables defined in the current session so you can come back tomorrow without rerunning your script. It would be awesome if, when I open a Pluto notebook, it were just ready for me to code instead of running the whole thing from the top again, potentially running very expensive computations.

stevengj · August 3, 2023, 12:11pm

See also Checkpointing with Julia

anon56330260 · August 3, 2023, 10:29pm

That thread was framed in the context of serialization. My point was about whether it’s possible to serialize Main in a stable, automatic, and safe way, with a focus on compilation latency. Notice that these two concepts (compilation latency and serialization) are largely orthogonal: you can serialize Julia values without saving binary code, and vice versa.

I think my conclusion is still no. Workspace is simply a module without package organization, constructed by top-level include instead of import/using. The mechanism is still the same as for a precompiled module. Programmers are required to take care of non-serializable values (handles, Tasks) manually and prevent them from being saved. This is exactly what I was talking about in that thread (it just works as long as you don’t do strange things):

What is harder about saving a Julia session via the REPL whenever?

So I guess what you want is like this:

Execute some Julia code in the REPL, which is some function calls to other libraries, for example, Plots.plot.

The compiled binary code and other necessary serializable runtime metadata are cached.

The next time you open the REPL, you re-execute your script to set up the runtime state you don’t want to cache, and then the compiled code is loaded. You don’t need to recompile it, so a lot of time is saved.

If this is what you want, then it can be achieved (as long as you don’t save the globals and definitions; you can always re-execute them, since they take less time than compilation). I have a private fork of Julia’s compiler for this kind of binary cache, which utilizes LLVM’s new linking architecture. It indeed works much like what you just said in this thread, and it has all the limitations I mentioned above: it just works as long as you don’t do strange things.

If you are interested, you can read my slides on this topic, Build System, especially slides 14-15, where I talk about this problem in depth: https://docs.google.com/presentation/d/1wTP_nnQYiRLHYnrItrUnYsp7zX7q4vWMmhpONd8Ne_s/edit?usp=sharing

mkitti · August 3, 2023, 11:35pm

A major difference is that R retains its functions in source code form, while Julia does not. While you might be able to recursively serialize the data structures in the current workspace, serializing JIT compiled functions does not make sense. JIT compiled code contains optimizations relevant only to JIT, such as runtime pointer references, which would be invalid when deserialized.

To get serializable native code, you need to get Julia into a mode where it is caching native code to disk. See julia --help-hidden for options.

 --compile={yes*|no|all|min}
                          Enable or disable JIT compiler, or request exhaustive or minimal compilation

 --output-o <name>        Generate an object file (including system image data)
 --output-ji <name>       Generate a system image data file (.ji)
 --strip-metadata         Remove docstrings and source location info from system image
 --strip-ir               Remove IR (intermediate representation) of compiled functions

 --output-unopt-bc <name> Generate unoptimized LLVM bitcode (.bc)
 --output-bc <name>       Generate LLVM bitcode (.bc)
 --output-asm <name>      Generate an assembly file (.s)
 --output-incremental={yes|no*}
                          Generate an incremental output file (rather than complete)
 --trace-compile={stderr,name}
                          Print precompile statements for methods compiled during execution or save to a path
 --image-codegen          Force generate code in imaging mode
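
As an illustration of one of these flags, here is a small sketch that starts a child Julia process with `--trace-compile` and reads back what it had to compile. The one-line script and the function `f` in it are made up for the example:

```julia
# Hypothetical one-line script: defines a new function and calls it once.
script = "f(x) = x^2 + 1; f(3)"
trace = tempname()

# Launch a child Julia process that logs every method it has to compile.
run(`$(Base.julia_cmd()) --startup-file=no --trace-compile=$trace -e $script`)

# Each line is a `precompile(Tuple{...})` statement for one compiled method.
statements = readlines(trace)
foreach(println, statements)
```

Statements collected this way can be fed into a package's precompilation or a custom system image, so that the corresponding native code is cached rather than regenerated in every session.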

anon56330260 · August 4, 2023, 12:25am

Technically R and Matlab can’t save arbitrary values. The same limitations apply to all languages.

You can actually do this by recursively caching all the functions and all the data that the functions refer to. Most of the time, this simply dumps everything in the environment. Note that to cache a data structure you must cache its type, but every type in Julia is a pointer already. You have such an impression because of the current way Julia implements its JIT compiler. But replacing a pointer by a mangled symbol name is totally possible.

I believe the main difference here is that in those languages interactive users seldom define their own types and functions. So many users never notice such things.

pxshen · July 22, 2024, 7:51pm

What about Linux “hibernation” tools like CRIU, which dump memory to disk so that entire processes can be restored?
