I’ve been talking to a few people and since this topic comes up regularly, I thought I’d write up my workflow for writing “scripts”. I don’t have a standard blog or place to put this, so I thought I’d put it here and tag it as guide. If this is misplaced or inappropriate, please tell me so that I can move it to some other, more permanent place.
What you’ll need
- Text editor of your choice
- Revise.jl
- That’s it!
Background
Scripting is usually done in a haphazard manner, just throwing things in a file of code in the approximate order they’re supposed to happen, pulling things in left and right until the thing does what you want it to do. This works well in interpreted languages like Bash, Zsh, Fish, Python, Perl, Ruby… but not in julia, as a lot of people notice. The reasons given are usually either “it’s hard to develop this code!” or “I don’t want to restart julia all the time, it takes too long”.
Well, about 90% of the julia code I write is in the form of these one-off scripts in a kind of “throwaway form”, as I call it. This workflow pretty much directly follows from the performance tips, so here’s how I do it and stay sane.
Optional Stuff
- Create a directory for your script to live in
This is useful because it makes sure that you can have multiple scripts next to each other without having to throw every dependency into your Main environment. It’s not necessary to do this though and you can certainly do this workflow without this step.
If you like to keep your main julia environment clean, it’s a good idea to also ]activate the directory your script lives in. If you do that, you probably want to modify your code loading a bit and use Pkg:
using Pkg
Pkg.activate(@__DIR__) # @__DIR__ is the directory containing this script
# you've created this script interactively, so all these dependencies should already be installed in the environment of this script
using PackageA
using PackageB
# ... rest of the code
The Workflow
- Put everything into functions and don’t use global variables.
If you put your code into functions, you’ll immediately remedy one of the biggest problems of scripting: the need to reload your data. Putting the code that loads your data into a separate function lets you execute just that function in the REPL and load your data once, allowing you to pass it to your “workhorse” functions later on without the need to reload it.
- Use a main() function as the main entry point into your code.
This function should be relatively small. It just calls the data loading function and passes the data directly into your working code:
using Whatever
using Package
using I
using Need

function data_loader(file="default_file.csv")
    # load & preprocess your data here
end

function workhorse_function(data)
    # where the main data munging/process happens
end

function main()
    workhorse_function(data_loader())
end

!isinteractive() && main() # this line is where the magic happens
The above is a little template/pattern I use all the time. I can include it in a regular session, I can includet it in a Revise session during development and I can also just call julia script.jl to run it directly. The magic is in the last line: !isinteractive() && main() only runs main() if julia isn’t started interactively. Interactive sessions are Revise.jl, the REPL, Pluto.jl - those kinds of interactive workflows.
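One caveat with this guard: isinteractive() describes the whole session, so main() also runs if the file is include-d from some other script that was itself started non-interactively. If that matters, a stricter check compares PROGRAM_FILE with @__FILE__, an idiom described in the FAQ of the Julia manual. The sketch below is an alternative to the template’s last line, not a replacement I’m claiming is better; is_entry_script is a made-up helper name.

```julia
# Alternative entry-point guard (an assumption, not part of the template above).
# PROGRAM_FILE is the script path given to `julia` on the command line (empty in
# a plain REPL session); @__FILE__ is the path of the file containing this line.

function main()
    println("running main()")
    return :done
end

# Hypothetical helper: true only if `file` is the script julia was started with.
is_entry_script(file) = !isempty(PROGRAM_FILE) && abspath(PROGRAM_FILE) == abspath(file)

# Runs main() for `julia script.jl`, but not when this file is include-d
# from the REPL or from another script.
is_entry_script(@__FILE__) && main()
```

Note the difference in behaviour: with this guard, julia -i script.jl still runs main() before dropping into the REPL, whereas !isinteractive() && main() does not.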
Of course, you can use many more worker and intermediate functions that are called in workhorse_function.
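To make that concrete, here is a small filled-in version of the template with two intermediate helpers. It is only a sketch: the "name,value" file format and the names parse_line and filter_large are invented for illustration, and it uses nothing outside Base so it runs without extra packages.

```julia
# Hypothetical, dependency-free instance of the template. The helper names and
# the "name,value" line format are made up for illustration.

# Intermediate helper: turn one "name,value" line into a name => value pair.
function parse_line(line)
    name, value = split(line, ',')
    return strip(name) => parse(Float64, strip(value))
end

# Data loading lives in its own function so it can be run once in the REPL.
function data_loader(file="default_file.csv")
    return [parse_line(line) for line in eachline(file) if !isempty(strip(line))]
end

# Intermediate helper: keep only the entries whose value exceeds the threshold.
filter_large(data, threshold) = filter(p -> last(p) > threshold, data)

# Workhorse: combines the helpers; this is the function you iterate on.
function workhorse_function(data; threshold=1.0)
    large = filter_large(data, threshold)
    return isempty(large) ? 0.0 : sum(last, large)
end

function main()
    println(workhorse_function(data_loader()))
end

# The guard line from the template (!isinteractive() && main()) would go here.
```

During development you would call data_loader once, then tweak filter_large or workhorse_function and call workhorse_function(data) again without touching the file on disk.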
- Launch julia and either include or includet this file.
If I includet this file in a Revise session, I can load my data in the first step, process it once, change workhorse_function interactively and just run the workhorse function again (provided I didn’t modify the incoming data):
julia> data = data_loader("non_default_file.csv");
julia> workhorse_function(data) # some output, maybe
# change workhorse_function
julia> workhorse_function(data) # call it again, we're still good!
If the data loading step is cheap or I don’t have any data to load from disk, I usually just call main() directly, change the code, and call it again and again, until I’m satisfied with the script.
The big win here is that things will only ever be loaded once. Compilation of dependencies included via using happens once. During development, you only load the data once (for iterating on the workhorse code - you may have to load it more than once if you have to munge your data first and need to check the function again and again). By reducing the number of things you do more than once, scripting in julia becomes really nice.
I hope this can be useful to some people to improve their workflow when working with “one-off” scripts. We all know they’re not really one-off, are they?