Nice workflows for using and developing in Julia 1.9+

I wrote a short step-by-step guide to help my students with a good workflow for Julia here. I'm sharing it because it might help new users. Updated (and properly formatted) versions will be available at that link when necessary.


This is a brief description of some nice development workflows for Julia, particularly for versions 1.9 or greater. These workflows are usually fairly personal, so many other useful ones may exist.

juliaup

The juliaup tool makes it easy to manage Julia installations and versions. In short, if you are using Linux, install juliaup with:

curl -fsSL https://install.julialang.org | sh

Then close the terminal and start it again. The juliaup and julia executables will be available in your path. By default, juliaup installs the latest stable version of Julia, which at the time of writing was 1.8.5. We want to work with the upcoming 1.9 series, so we start by installing it:

juliaup add 1.9

which, as of today, installs Julia 1.9.0-rc1. Then, let's make it the default Julia:

juliaup default 1.9

Revise and startup.jl

Revise.jl is a fundamental tool for Julia development and use. It tracks changes to files and thus makes it very effective to develop new functions, tune plots, etc., by editing a script alongside an active Julia session. Thus, add Revise to your main environment:

julia> ] add Revise

(remembering that the ] will take you to the package manager prompt: (@v1.9) pkg>).

Next, let us guarantee that Revise is always loaded on startup. Open (or create) the file

~/.julia/config/startup.jl

and add to it the line

using Revise

which will make Revise load on every Julia startup.
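A common variant of this startup.jl, along the lines of what the Revise documentation suggests, guards the import so that a missing or broken Revise installation (for example, right after switching to a new Julia version with juliaup) produces a warning instead of an error at every startup. A minimal sketch:

```julia
# ~/.julia/config/startup.jl
# Load Revise on startup, but only warn (instead of failing) if it is not
# installed in the current Julia version's main environment.
try
    using Revise
catch e
    @warn "Error initializing Revise" exception=(e, catch_backtrace())
end
```

If you later run `] add Revise` in the new version's main environment, the warning disappears on the next start.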

Why Revise

With Revise loaded, it is possible to edit and develop scripts by simply modifying the script and re-running functions in an open Julia session. For example, consider a script that
generates some data and then plots it:

using Plots
function my_data(n)
    data = randn(n)
    return data
end
function my_plot(data)
    plt = plot(data; label="My data", linewidth=1)
    return plt
end

If we save the script in a file named myplot.jl and, within Julia, includet it (note the t, for “track”):

julia> includet("./myplot.jl")

julia> data = my_data(1000);

julia> my_plot(data)

we generate the plot. Then, without leaving the Julia REPL, we can change any property of the data or the plot in the script, save the file, and re-run the changed functions, which will automatically reflect the updates to the file.

The video below illustrates such a feature, by changing the line width of the plot, and executing again the my_plot function.

!!! note
    The video illustrates the use of Revise from within VSCode, which is also a recommended tool for an effective workflow, but it is not required here and will not be discussed in this text. In any case, if you are using it, install the Julia extension.

!!! note
    The example above illustrates some advantages of splitting Julia code into functions. With that layout, the function my_plot can be repeatedly executed
    at the REPL, tracking the changes made on the file. The same could be done with the data-generation function, for example if the data has to be reloaded
    from different files. Note, additionally, that it is good to structure Julia code in functions for performance reasons (functions get compiled
    to efficient native code), although in this example this is essentially irrelevant.
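To make the performance remark concrete, here is a small sketch (not part of the original workflow; the function name mysum is hypothetical) contrasting a loop over untyped global variables with the same loop inside a function. Timings depend on the machine, so none are shown:

```julia
# Hypothetical data, standing in for the output of my_data.
data = randn(10^6)

# Loop at global scope: `s` is an untyped global, so every update of it
# is dynamically dispatched and boxed instead of specialized.
s = 0.0
for x in data
    global s += x
end

# The same loop inside a function is compiled to native code specialized
# for the argument type (here Vector{Float64}).
function mysum(v)
    acc = zero(eltype(v))
    for x in v
        acc += x
    end
    return acc
end

mysum(data)          # first call includes compilation
@time mysum(data)    # later calls run the already compiled method
```

Wrapping the global loop in `@time begin ... end` typically reports on the order of one allocation per element, while the `mysum` call allocates essentially nothing.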

Environments

Julia 1.9 makes it particularly appealing to use environments for specific tasks, because the compiled code of the libraries gets stored in an environment-specific manner, making the load times of the libraries shorter than in previous Julia versions. Besides, the use of environments allows one to obtain completely reproducible setups. Let us take the previous script,
but now also load another large package, DataFrames, and use it to store the sample data we are creating:

using Plots
using DataFrames
function my_data(n)
    data = DataFrame(:x => randn(n))
    return data
end
function my_plot(data)
    plt = plot(data.x; label="My data", linewidth=1)
    return plt
end

Creating and installing packages

Since Plots and DataFrames are relatively heavy packages, they take a while to install and compile. We will do that within a new environment. First, create a directory that will contain the environment files. We choose to save the environments within a ~/.JuliaEnvironments directory, but that is completely optional, since environments are stored in regular directories:

mkdir ~/.JuliaEnvironments 
mkdir ~/.JuliaEnvironments/mydataplots

mydataplots is the directory where the environment files (Project.toml and Manifest.toml) will be automatically created.

Then, start Julia and

julia> ] # go to pkg prompt

(@v1.9) pkg> activate ~/.JuliaEnvironments/mydataplots
  Activating new project at `~/.JuliaEnvironments/mydataplots`

(mydataplots) pkg>

and note that the pkg> prompt reflects that the mydataplots environment is activated. We now add the necessary packages, which can take some minutes, depending on the internet connection and speed of the computer:

(mydataplots) pkg> add Plots, DataFrames
   Resolving package versions...
   ...

After the installation has finished, let us simulate the first use of the packages, which may trigger additional compilation. Press backspace to go back to the Julia prompt, and do:

julia> using Plots, DataFrames
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
[ Info: Precompiling DataFrames [a93c6f00-e57d-5684-b7b6-d8193f3e46c0]

which may also take some time (it is quite possible that the packages are not precompiled again on this first using, but sometimes they are, because
of dependency version updates).

That’s all for the installation part.
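The same setup can also be scripted with the Pkg API instead of the pkg> prompt. The sketch below uses a temporary directory as a stand-in for ~/.JuliaEnvironments/mydataplots, so it can be tried without touching your files; the Pkg.add and Pkg.precompile lines are commented out because they download and compile the (large) packages:

```julia
import Pkg

# Stand-in for ~/.JuliaEnvironments/mydataplots: a throwaway directory,
# so this sketch can be run without touching your real environments.
envdir = joinpath(mktempdir(), "mydataplots")
mkpath(envdir)                         # like `mkdir` in the shell

Pkg.activate(envdir)                   # like `pkg> activate <envdir>`

# Commented out because they download and compile large packages:
# Pkg.add(["Plots", "DataFrames"])     # like `pkg> add Plots, DataFrames`
# Pkg.precompile()                     # compile ahead of the first `using`

Pkg.status()                           # like `pkg> st`; empty until packages are added
```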

Using the environment

You can now quit Julia. Let us move to the directory of the working script:

cd ~/Documents/mytestscript

Here we have the myscript.jl file containing the code shown above, using Plots and DataFrames as example packages.

Now start Julia, and activate the mydataplots environment, with:

julia> ] # go to pkg prompt

(@v1.9) pkg> activate /home/user/.JuliaEnvironments/mydataplots/
  Activating project at `~/.JuliaEnvironments/mydataplots`

(mydataplots) pkg>

Press backspace to go back to the Julia prompt, and include the script (here with includet, assuming that Revise is loaded by default):

julia> includet("./myscript.jl")

This should now take only a couple of seconds, and the functions should be responsive:

julia> @time data = my_data(1000)
  0.002078 seconds (33 allocations: 18.031 KiB, 94.87% compilation time)
1000×1 DataFrame
  Row │ x          
      │ Float64    
──────┼────────────
    1 │ -2.41804
    2 │ -0.51387
    3 │  0.953752
    4 │  0.738998
    5 │  0.973528
  ⋮   │     ⋮
  997 │  0.707327
  998 │  0.200788
  999 │ -0.84872
 1000 │ -1.49911
   991 rows omitted

and

julia> @time my_plot(data)
  0.635311 seconds (2.83 M allocations: 172.703 MiB, 9.92% gc time, 99.49% compilation time: 72% of which was recompilation)

Thus, the script can be run completely in a few seconds, avoiding the long compilation delays of the packages involved, which were common in previous versions of Julia.

Automatic activation

Now, let us automate the activation of the environment, by adding to the top of the script the following first line:

import Pkg; Pkg.activate("/home/user/.JuliaEnvironments/mydataplots") # added line
using Plots
... # script continues

Now, when the script is included, it will automatically activate that environment and use the packages installed in it. It is even possible to execute the script directly from the command line with acceptable performance. Here, the script (shown in full below) also calls the functions and saves the plot to a file:

user@m3g:~/Documents/mytestscript% time julia myscript.jl 
  Activating project at `~/.JuliaEnvironments/mydataplots`

real	0m5,172s
...

The complete script is now:

import Pkg; Pkg.activate("/home/user/.JuliaEnvironments/mydataplots")
using Plots
using DataFrames
function my_data(n)
    data = DataFrame(:x => randn(n))
    return data
end
function my_plot(data)
    plt = plot(data.x; label="My data", linewidth=1)
    return plt
end
data = my_data(1000)
plt = my_plot(data)
savefig(plt,"plot.png")

Of course, you can use the same environment for all scripts that require the
same set of packages, with the same benefits.
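The activation line above hard-codes /home/user. A slightly more portable sketch, assuming the environment lives in ~/.JuliaEnvironments/mydataplots as in this text, builds the path from homedir(). (The same environment can also be selected from the shell without editing the script, with julia --project=$HOME/.JuliaEnvironments/mydataplots myscript.jl.)

```julia
import Pkg

# Build the environment path from the home directory instead of
# hard-coding "/home/user"; the directory names follow the text above.
const ENVDIR = joinpath(homedir(), ".JuliaEnvironments", "mydataplots")
Pkg.activate(ENVDIR)

# `using Plots`, `using DataFrames`, and the rest of the script follow here.
```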

!!! note
    You may still see some recompilation of the packages from time to time,
    in particular if new packages are added to the same environment. However,
    once the packages of the environment are stable, precompilation should
    only occur when using the same environment with a different version of Julia.


Nice workflow! Sharing my personal usage: I tend to use “shared environments” (very lightly documented for now) instead of the paths you propose. For instance:

Pkg> activate @mynewsharedenv

As with “regular” (i.e. local) environments, this command will create a new environment if it does not exist, which will be located in ~/.julia/environments.

And to activate it, from anywhere, you just run the same command.

I hope this helps, as I found it simpler to do (especially for newcomers). In particular, you don’t need to remember the path, nor to type it.
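From Julia code (for example at the top of a script), the same shared environment can be activated with the shared keyword of Pkg.activate; mynewsharedenv is just the example name from this post:

```julia
import Pkg

# Equivalent of `pkg> activate @mynewsharedenv`: the name is looked up in
# the "environments" folder of the depots and created in the first one
# (e.g. ~/.julia/environments/mynewsharedenv) if it does not exist yet.
Pkg.activate("mynewsharedenv"; shared=true)

println(Base.active_project())
```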

EDIT: clarity


Yes, that’s true. I use them as well. The reason I don’t suggest them initially is that the environment files are less explicitly stored, and I find it useful to have them in a separate place (from the Julia installation) to copy/share among machines.


Wow, I did not know about that.
I have been using --project=~/path/to/where/I/store/all/my/envs/myproject. Just using --project=@myproject will be much easier.

I do occasionally have scripts inside these, though.
But having scripts inside ~/.julia/environments/myproject seems like it ought to be fine, too.


I didn’t know about the @ shorthand, but I have been using pkg> activate --shared mynewsharedenv for quite some time. It makes keeping per-task environments much simpler and avoids long update times for environments with too many unnecessary packages. It is especially useful in, say, VSCode, where you are constantly working from different directories.


Sorry, I’m having a hard time understanding the precise motivations for using environments. Does using environments speed up loading time (I thought this was automatic in Julia 1.9 and that you have to use --pkgimages=no to disable it), or is it just to make sure your results are reproducible? Normally I don’t care THAT much about loading time, but loading time does become important with web services.

EDIT: I am using “shared environment” wrong. I was thinking of environment stacking.

That, and shared environments can have incompatible dependencies loaded at the same time, so it’s probably for the best that they are not promoted front and center.


It is true that 1.9 will try to compile and cache generated code much more than before, but if you add a new package to any environment (a specific one or the main one), that can trigger not only updates, but cause invalidations, that will trigger recompilation of perhaps a lot of packages in that environment.

Thus, if you have different types of tasks involving different sets of packages, it is better to isolate each in a specific environment, to which packages are not added frequently for unrelated reasons. With that, you will benefit the most from the cached code available for that environment and the specific packages and versions it uses.

With web services, on the other hand, I would recommend not only using environments, but also not re-launching Julia repeatedly. For instance, using DaemonMode.jl (dmolina/DaemonMode.jl on GitHub), a client-daemon workflow to run scripts faster in Julia.
(I’m not sure how it interacts with environments, though.)


For web services I use Docker and build the Docker image with a PackageCompiler.jl system image. It definitely solves the startup problem, but it is a bit complicated and it takes a long time to build the image. Maybe 1.9+ will make it so I don’t have to do this so much.


The two biggest benefits:

  1. Avoid version conflicts. With environments focused on a task, they’re mostly a thing of the past, except when I’m explicitly updating packages’ version bounds.
  2. I often develop multiple branches/PRs in parallel. I have six instances of one package installed, each checked out to a different branch. This lets me work on multiple branches at the same time, avoid recompiling when switching between them, and also lets me compare between them (one of them is always the master/main branch).

I would add an incredibly useful feature that improves the workflow compared to Python. In the package REPL: activate --temp

It creates an environment in a temporary directory that manages isolated dependencies like any other environment and gets deleted when the session exits (the package files themselves still go to the usual depot).

For sketching ideas or running quick tests on a package in development, it is a key differentiator.
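The same throwaway environment can be created from code with Pkg.activate(; temp=true). A minimal sketch; Example is the small demo package from the General registry and is only mentioned in a comment, so nothing is downloaded:

```julia
import Pkg

# Equivalent of `pkg> activate --temp`: a new, empty environment in a
# temporary directory that is removed when the Julia process exits.
Pkg.activate(; temp=true)
println(Base.active_project())

# Packages added now (e.g. Pkg.add("Example")) go into this throwaway
# environment only; your other environments are left untouched.
```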


Indeed! Useful also if you’re using things like Literate.jl, when you want to execute the generated notebook and check everything works fine.


Hmm, I didn’t know about that. I’ve checked the content of the Project.toml and Manifest.toml files and they look regular. Same when checking with ]st.

Is this a behavior by design / what causes this? I’d be glad to know more.

I always suspected that @Elrod was not a regular human: he has 6 cores!


> EDIT: I am using “shared environment” wrong. I was thinking of environment stacking.

See Shared environments

Basically, when you use the package manager to add/remove packages, it resolves with respect to the current environment, with no regards to the shared environments that might be visible. This is really for the best, otherwise things would become complicated.

But, as a result, when you do using SomePackageFromSharedEnv (e.g. Plots) that depends on SomePackageInCurrentEnvAndSharedEnv (e.g. a common package like MacroTools), it will load Plots’ version from the shared env, but MacroTools’ version from the current env. There’s no guarantee that those two versions are compatible.

Long story short, shared envs are completely safe for dependency-less packages (eg. BenchmarkTools), but I wouldn’t want to rely on them for more complex packages, because it’s asking for trouble. Use dev environments for that.

Come to think of it, it wouldn’t cost the package manager much to at least warn when incompatible package versions are present in shared environments, and it’d avoid the problem.


But is that specific to shared environments? Wouldn’t it be the same if I do:

pkg> activate "/path/to/env1"

julia> using PkgInEnv1

pkg> activate "/path/to/env2"

julia> using PkgInEnv2

?

In both cases, the point of using environments is not to mix them: load one environment and work in it, shared or not.

Sigh, when I google for shared environments, the top links are our two Discourse posts. I wish there were better docs for that. In any case, I’m referring to this behaviour:

(@v1.8) pkg> st
Status `~/.julia/environments/v1.8/Project.toml`
⌃ [6e4b80f9] BenchmarkTools v1.3.1
⌃ [187b0558] ConstructionBase v1.4.1
⌃ [a93c6f00] DataFrames v1.4.1
  [ae2dfa86] QuickTypes v1.8.0 `~/.julia/dev/QuickTypes`
⌃ [295af30f] Revise v3.4.0
  [1986cc42] Unitful v1.12.4
Info Packages marked with ⌃ have new versions available and may be upgradable.

julia> 
% julia --project=.   # brand new environment
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.5 (2023-01-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(cedric) pkg> st
Status `~/Project.toml` (empty project)

julia> using BenchmarkTools  # why does this work?

julia> 

Although my Manifest is empty, it still loaded BenchmarkTools from the shared env. This behaviour is convenient, but can be dangerous if one is not conscious of the implications.

> But is that all specific of shared environments?

I don’t know; it feels like the terminology around these things might not be fully established. I thought that the main point of shared envs was to hold dev tools and such, so that they are available in all environments. And for that, it’s probably reasonably safe.


That applies to the “main” shared environment only. You cannot load packages from other shared environments without explicitly activating them first, and in that sense they behave the same as any other environment, as far as I understand.

That is environment stacking, the docs are here:

https://docs.julialang.org/en/v1/manual/code-loading/#Environment-stacks
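A small sketch for inspecting the stack in the current session. It uses LOAD_PATH, Base.load_path() and Base.identify_package; the last part, which drops the default environment from the stack, is one possible way to opt out of stacking for a single session, not something recommended in this thread:

```julia
# By default LOAD_PATH is ["@", "@v#.#", "@stdlib"]: the active project,
# the default (main) environment such as ~/.julia/environments/v1.9,
# and the standard library.
@show LOAD_PATH

# Base.load_path() expands those entries into concrete paths. Packages in
# any of them can be loaded with `using`, which is why BenchmarkTools was
# found above even though the active project was empty.
foreach(println, Base.load_path())

# Which environment in the stack provides a given package name
# (Pkg is a standard library, so it resolves even in an empty project).
@show Base.identify_package("Pkg")

# Opt out of stacking for this session only: keep just the active
# project and the standard library, dropping the "@v#.#" environment.
empty!(LOAD_PATH)
append!(LOAD_PATH, ["@", "@stdlib"])
@show LOAD_PATH
```

The change to LOAD_PATH lasts only for the running session; packages already loaded (for example Revise from startup.jl) stay loaded.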
