One of Julia's strong points is that, as a JIT-compiled language, incremental compilation works (it has to). Still, to take full advantage of that, you need a way to save state.
Imagine a codebase that takes 20 minutes to compile, plus the part you want to fix. Ideally, you would run the code once for 20 minutes, save the state, then just reload that REPL state as many times as needed while testing the new part.
How would you do that in practice?
What does “the part you want to fix” consist of? Can it not just be redefined in the running session? Maybe Revise.jl can make the workflow more convenient?
Or if the “codebase that takes 20 minutes to compile” is only about precompilation, maybe put that code in a package and add good precompile directives for it? Then loading it should be fast on 1.9+.
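For the precompile-directives route, a minimal sketch with PrecompileTools.jl might look like this (the package name `MyHeavyPkg` and the function in it are made up for illustration):

```julia
# MyHeavyPkg/src/MyHeavyPkg.jl -- hypothetical package wrapping the slow code
module MyHeavyPkg

using PrecompileTools

# Stand-in for the code that takes 20 minutes to compile.
expensive_transform(xs) = sum(abs2, xs)

@setup_workload begin
    data = rand(100)                  # setup runs at precompile time, is not cached
    @compile_workload begin
        expensive_transform(data)     # calls here get compiled and cached
    end
end

end # module
```

On 1.9+ the native code for these calls is stored in the package image, so `using MyHeavyPkg` in a fresh session skips the recompilation.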
With that said, a way to save the state could probably be convenient in many cases. Though I have no clue how hard it would be to implement, and I’m pretty sure it doesn’t currently exist.
It also depends what you mean by saving the REPL state:
- If you mean precompiled code, the easiest way is to use PrecompileTools.jl (JuliaLang/PrecompileTools.jl: reduce time-to-first-execution of Julia code in your package).
- If you mean actual REPL results, you might want to enable the numbered prompt introduced in Julia 1.9
Old thread of mine on this topic.
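On the second point, the numbered prompt is opt-in; one way to enable it from `startup.jl` is roughly the following (a sketch based on the 1.9 REPL docs; the `import REPL` and the `sleep` to wait for REPL initialization are defensive assumptions):

```julia
# In ~/.julia/config/startup.jl
atreplinit() do repl
    @async begin
        sleep(0.1)                     # crude wait for the REPL to finish initializing
        import REPL
        REPL.numbered_prompt!(repl)    # switch to In[n]/Out[n]-style prompts
    end
end
```

With this enabled, earlier results stay accessible through the `Out` history, e.g. `Out[3]`.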
Thanks for the link. Interesting read!
Since that conversation happened, 1.9 introduced changing the active module in the REPL. I wonder if this mitigates @anon56330260’s point that you can’t serialize Main because it is in constant flux.
A user could start by dropping into `Main.Workspace`, import packages, define some functions, run a long-running process that constructs a DataFrame of results and assigns it to `df`, then jump back up to `Main`, serialize `Main.Workspace`, and exit the REPL. When they open the REPL the next day, they deserialize `Main.Workspace` and get all of their functions and `df` back, without needing to start from scratch.
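Short of whole-module serialization, you can already approximate that workflow by hand with the `Serialization` stdlib, picking out the bindings you care about (the variable names here are illustrative, and this saves values only, not compiled code; Tasks, open files, and the like won't survive):

```julia
using Serialization

# End of day: gather the results you want to keep and write them out.
df = (rows = 1_000, cols = 4)               # stand-in for an expensive DataFrame
params = Dict(:seed => 42, :iters => 10^6)
serialize("workspace.jls", Dict(:df => df, :params => params))

# Next day: read them back and rebind in the fresh session.
saved  = deserialize("workspace.jls")
df     = saved[:df]
params = saved[:params]
```

One caveat: serialized data is in general only guaranteed to be readable by the same Julia version that wrote it.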
This is a feature that R supports. You can save all the variables defined in the current session so you can come back tomorrow without rerunning your script. It would be awesome if, when I open a Pluto notebook, it were just ready for me to code, instead of running the whole thing from the top again and potentially repeating very expensive computations.
See also Checkpointing with Julia
That thread was carried out in the context of serialization. My point concerned whether it's possible to serialize Main in a stable, automatic, and safe way, with a focus on compilation latency. Notice that these two concepts (compilation latency and serialization) are largely orthogonal: you can serialize Julia values without saving binary code, and vice versa.
I think my conclusion is still no. A Workspace is simply a module without package organization. It is constructed by toplevel `include` instead of `import`/`using`. The mechanism is still the same as for a precompiled module. Programmers are required to manually take care of non-serializable values (handles/Tasks) and prevent them from being saved. This is exactly what I was talking about in that thread (it just works as long as you don't do strange things):
If you are interested, you can read my slides on this topic, Build System (https://docs.google.com/presentation/d/1wTP_nnQYiRLHYnrItrUnYsp7zX7q4vWMmhpONd8Ne_s/edit?usp=sharing), especially slides 14-15, where I talk about this problem in depth.
A major difference is that R retains its functions in source-code form, while Julia does not. While you might be able to recursively serialize the data structures in the current workspace, serializing JIT-compiled functions does not make sense: JIT-compiled code contains optimizations valid only at runtime, such as raw pointer references, which would be invalid when deserialized.
To get serializable native code, you need to get Julia into a mode where it caches native code to disk. See `julia --help-hidden` for the relevant options:
```
--compile={yes*|no|all|min}     Enable or disable JIT compiler, or request
                                exhaustive or minimal compilation
--output-o <name>               Generate an object file (including system image data)
--output-ji <name>              Generate a system image data file (.ji)
--strip-metadata                Remove docstrings and source location info from system image
--strip-ir                      Remove IR (intermediate representation) of compiled functions
--output-unopt-bc <name>        Generate unoptimized LLVM bitcode (.bc)
--output-bc <name>              Generate LLVM bitcode (.bc)
--output-asm <name>             Generate an assembly file (.s)
--output-incremental={yes|no*}  Generate an incremental output file (rather than complete)
--trace-compile={stderr,name}   Print precompile statements for methods compiled during
                                execution or save to a path
--image-codegen                 Force generate code in imaging mode
```
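One plausible way to combine these (the script name is a placeholder, and PackageCompiler.jl drives `--output-o` and the linker for you, so this is a sketch rather than the only route):

```shell
# Record precompile statements for every method compiled while the script runs
julia --trace-compile=precompiles.jl myscript.jl

# Bake those statements into a custom system image via PackageCompiler.jl
julia -e 'using PackageCompiler;
          create_sysimage(; sysimage_path="custom_sys.so",
                            precompile_statements_file="precompiles.jl")'

# Future sessions start with that native code already cached
julia --sysimage custom_sys.so
```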
Technically, R and Matlab can't save arbitrary values either; the same limitations apply to all languages.
You can actually do this by recursively caching all the functions and all the data the functions refer to. Most of the time, this simply dumps everything in the environment. Note that to cache a data structure you must cache its type, but every type in Julia is already a pointer. You have that impression because of the way Julia currently implements its JIT compiler, but replacing pointers with mangled symbol names is entirely possible.
I believe the main difference here is that in those languages interactive users seldom define their own types and functions, so many users never notice such things.
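To illustrate the distinction: the `Serialization` stdlib already round-trips a closure together with the data it captures; only the compiled native code (and, across sessions, the closure's type) is left out. A small sketch:

```julia
using Serialization

# A closure capturing `factor`; the captured value travels with it.
make_scaler(factor) = x -> factor * x
scale = make_scaler(3)

io = IOBuffer()
serialize(io, scale)      # writes the closure's type name plus captured data
seekstart(io)
restored = deserialize(io)

restored(7)               # works here because the defining code is loaded
```

In a fresh session the deserialization only succeeds if `make_scaler` has been defined again, which is exactly the retain-the-source requirement mentioned above.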
What about Linux “hibernation” tools like criu, which dump a process's memory to disk so the entire process can be restored later?
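For simple, self-contained processes that can work (the PID and directory below are placeholders; CRIU needs root or suitable capabilities, and known pain points include open GPU, X11, and network connections):

```shell
# Checkpoint a running Julia process; --shell-job handles the attached terminal
sudo criu dump --tree 12345 --images-dir ./ckpt --shell-job --leave-running

# Restore the whole process later -- heap, JIT-compiled code and all
sudo criu restore --images-dir ./ckpt --shell-job
```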