Workflow question - how to guarantee no dependence on global state without long load times?

This is a question about workflow. I like to code in an exploratory, iterative way, but I generally prefer not to use REPL or notebook environments, because I like to keep everything in a state where even if I get interrupted tomorrow and have to come back to it 6 months later, I can get it running again straight away, without having to remember what I did. (This happens to me regularly.)

In Python, I do this by putting everything in a file test.py, and then just calling python3 test.py at a shell prompt. The code in test.py typically imports matplotlib, does some calculations, then pops up a plot window showing the results. When doing things this way, it’s absolutely guaranteed that my code is not accidentally referring to any global state and will run in exactly the same way next time, as long as I use the same version of Python.

I’d like to achieve a similar workflow in julia. However, it’s made complicated by the fact that starting julia and importing Plots takes a good 30 seconds, and Plots doesn’t seem designed to support this kind of workflow. So I am wondering if it’s possible to import/run my file in the REPL, in such a way that it’s guaranteed to be unable to access any global state, without having to wait for Plots to be imported every time it’s run.

I know that I can put my code in a module, and that when I re-include it the module’s namespace will be reinitialised. However, this isn’t enough, because if I’ve understood correctly, this only applies to global variables in my module’s namespace and not other modules’ namespaces. So if some previous iteration of my code has changed some global state stored in Plots or some other module, the current version might not run the same way if the REPL is restarted.
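To make the concern concrete, here is a minimal sketch of what I mean. FakeLib is a hypothetical stand-in for a library like Plots that keeps some module-level state, and MyCode stands for the contents of test.jl; neither name comes from a real package, and the two MyCode definitions imitate two successive include calls in one session.

```julia
# Hypothetical stand-in for a library that keeps module-level state.
module FakeLib
const SETTINGS = Dict{Symbol,Any}(:theme => :default)
end

# First iteration of test.jl: defines a global and changes library state.
module MyCode
import ..FakeLib
counter = 1
FakeLib.SETTINGS[:theme] = :dark
end

# Second iteration of test.jl, re-included later in the same session.
# Julia replaces the old MyCode module (with a warning).
module MyCode
import ..FakeLib
has_counter = isdefined(@__MODULE__, :counter)  # is my old global still around?
theme_seen = FakeLib.SETTINGS[:theme]           # what does the library remember?
end

@show MyCode.has_counter  # false: my own namespace was reinitialised
@show MyCode.theme_seen   # :dark: the library's state survived the re-include
```

In a fresh process the second iteration would see :default instead of :dark, which is exactly the kind of difference I want to rule out.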

Is there a way to achieve what I’m asking for? Or am I asking the wrong question somehow? How do julia users handle this kind of repeatability issue, in general?

include("test.jl")

You can then change the file and include it again.


And building on this, you can use Revise.jl so that the running session picks up your changes each time you save the file. After a while, just set up the code as a package; then using MyPackage and editing the package (with Revise) is a nice way to keep the code.


Right, but my question is, does this guarantee that my code can’t accidentally rely on global state? I know that my module namespace will be re-initialised when I call include, but can I absolutely guarantee that no other global state will be accessed? (Such as, for example, variables that are stored in the namespaces of other modules.)

I guess the question is kind of a subtle one, but I really care that my code is guaranteed to work in exactly the same way if I run it again in a new REPL session. That means there can be absolutely no possibility of accidentally referring to any state of any kind left by previous runs. My question is about whether the “put everything in a module and call include from the REPL” workflow can achieve that, and if not, whether there’s another one that can.

(The ideal way is just to run my code as a script, starting a new julia process every time. However, the 30-second import time for Plots makes this impractical, so I’m asking if there is a workaround.)

If it’s a package, then set up some test scripts. Those are guaranteed to be run in a clean Julia session.


I would personally recommend using PackageCompiler to generate yourself a sysimg which adds the functions from Plots et al. that you need, and then just keep doing the same sort of approach that you use with Python (except you run the script with this new Julia sysimg, so load times are cut drastically).
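For later readers, a minimal sketch of what that can look like. It assumes a PackageCompiler release of 1.0 or later, where the documented entry point is create_sysimage, and it assumes Plots is installed in the active project. The file names sys_plots.so and precompile_plots.jl are made up for illustration, and the library extension depends on the platform (.so on Linux, .dylib on macOS, .dll on Windows).

```julia
# Assumption: PackageCompiler (1.0 or later) and Plots are installed
# in the active project.
using PackageCompiler

# precompile_plots.jl is a hypothetical script that exercises the calls
# you want compiled into the image, for example:
#     using Plots
#     plot(rand(10))
create_sysimage([:Plots];
                sysimage_path = "sys_plots.so",
                precompile_execution_file = "precompile_plots.jl")
```

The script is then run in a fresh process each time with julia --sysimage sys_plots.so test.jl, which keeps the one-process-per-run workflow while skipping most of the load time.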

julia> @time using Plots
  7.595079 seconds (11.91 M allocations: 648.866 MiB, 8.30% gc time)

That’s just the import time; the first actual plot will take longer to show up due to re-compilation.

So, no, using Revise.jl or including your module over and over cannot guarantee this. On the other hand, nothing can: a guarantee that strict is impossible.

Your computer is full of mutable state (e.g. the file system, the internet, etc.). The Python example doesn’t achieve this kind of guarantee either, since any piece of code could potentially read or write files on disk or send them over the network. It is easier to get closer to what you’re asking for in Python, since the startup times are often faster. The PackageCompiler solution should help you resolve the Plots startup time issue and is probably the closest to what you’re asking for.


I often hear people recommending this, but every single time I’ve tried to use PackageCompiler it has failed at some point. Currently, PackageCompiler does not pass its own tests and has a long list of issues where people report that it just doesn’t work. Does anyone actually succeed in building their sysimage with it?

Fair point.

I should say that I think an important root for this question is the discussion here: https://github.com/JuliaPlots/Plots.jl/issues/1209 . I’m also very happy to have it pointed out there if people think I’m wrong.

I think it’s a little unclear what a guarantee means. To have your code not depend on global state is easy - don’t define any global variables. Of course your methods are global, so - if you’ve defined methods you don’t want to be used anymore you may have to restart. But that doesn’t necessarily require a workflow where you restart julia every time you run a script, where the load time of Plots becomes an issue.
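A small sketch of both halves of that point. The names helper and main are hypothetical, and the first definition stands for something an earlier version of test.jl left behind in the session.

```julia
# Left over from an earlier version of test.jl that was included
# into this session at some point:
helper(x) = 2x

# Current version of test.jl: `helper` has been deleted from the file,
# but `main` still calls it by mistake.
function main()
    total = 0                # local variable, so no global state of our own
    for i in 1:3
        total += helper(i)   # resolves to the stale method still in the session
    end
    return total
end

@show main()  # 12 here, because the old method is still defined
```

A fresh process that only ever sees the current version of the file would throw an UndefVarError for helper instead, which is the case where a restart is needed.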

I’ll say - the absolute number 1 most commonly asked question by newcomers to Julia is “how do I use Julia to have exactly the workflow I use in language X?” (number 2 is, “what plotting package should I use and wth is up with the time-to-first-plot?”). Which is natural, as language-X-users will tell each other that theirs is the best of all possible workflows. Julia supports many different workflows, but I still think it’s useful to realise that a workflow is actually an integral part of a language. And it can be useful to give the standard workflow a chance when trying a new language.


I’ve made a system image with Plots several times with PackageCompiler, without any issues.


Personally, I think it is quite valid to ask how to accomplish a particular workflow, especially when transitioning from another language. In fact, the very reason I’m asking about it is because you were dismissive in the other thread, saying it was a “workflow issue” and not a problem with Plots. If working in this way can’t be done then fine, but there’s no harm in asking, nor in letting developers know that users exist who want to work that way. In interacting with you I feel that I’m just being attacked for asking the question, which I have to say, doesn’t make me feel welcome.

PackageCompiler sounds like a perfect solution if I can get it to work - I will either try that next or give the REPL workflow a try, I’m not sure which.


I’m sorry you feel that way. I think I may be a little extra prickly on that issue I linked, given that people downvote everything I say. And if you read the issue again you will see that it’s mostly presented as a critique of Plots, and people tend to be sensitive to critique on work they do for free. Anyway I do think both of those questions are completely valid, I was just trying to offer a different perspective.


I have reread this topic and don’t think that your impression is warranted (but I recognize that these things are subjective). I think you had a lot of helpful replies.

As you noted, the kind of workflow you outline in the first post is currently not the most convenient one in Julia, and it is unlikely to ever be the recommended one: even if compilation times improve, they will still be there, because Julia compiles code at run time, just before it first executes. This is a trade-off inherent in the design of the language, and it is the same feature that gets you fast execution times (after things are compiled).

Improving compilation times is WIP, but the long time to first plot is unfortunately not something that can be eliminated now. You can try using a backend, e.g. GR or Gadfly, directly, which will load faster.


In terms of the script-running workflow, you can easily do that by saving PNGs of the plots from your script (though that will not solve the long time-to-first-plot issue, unfortunately). It’s the specific workflow of running a script that opens plot windows and then lets them stay open after the script finishes that is not supported by Plots. I can see how that may be useful to some, and there is now an open issue requesting that. Anyone is very welcome to implement it and open a PR.


My apologies also, I was prickly too, and certainly didn’t mean to say your replies weren’t helpful! I learned a lot from this discussion.


Yes:

That sysimg is the result of PackageCompiler.compile_incremental(:Plots). The only important point is that you’ve run pkg> dev PackageCompiler beforehand.


If your most important point is “guaranteed runnability 6 months later”, you should check out Nextjournal!

You can work in it like in an IJulia notebook, and once you are done, it will freeze your complete state into a docker image. So it will be exactly the same when you come back to it 6 months later.

As it happens, I also prepared a few statically compiled system images with fast startup times for e.g. Plots.jl:
https://nextjournal.com/julia (you can just remix those, and then you’ll get the exact same setup).

I’ve been saved by Nextjournal multiple times, because I could go back to some of my old articles for packages that got into a broken state locally :wink:

It’s also a great way to find performance regressions. Sometimes I think “hey, this was much faster a few months ago”, and then I can check it with some old notebook, to be sure that I didn’t just imagine things.


Thanks, that actually works here as well if I use Julia v1.1. Unfortunately not for Julia v1.2, but I guess I can’t really count on that yet :stuck_out_tongue: