I’ve been talking to a few people and since this topic comes up regularly, I thought I’d write up my workflow for writing “scripts”. I don’t have a blog or another place to put this, so I thought I’d put it here and tag it as guide. If this is misplaced or inappropriate, please tell me so that I can move it to some other, more permanent place.
What you’ll need
- Text editor of your choice
- That’s it!
Scripting is usually done in a haphazard manner, just throwing things in a file of code in the approximate order they’re supposed to happen, pulling things in left and right until the thing does what you want it to do. This works well in interpreted languages like Bash, Zsh, Fish, Python, Perl, Ruby… but not in julia, as a lot of people notice. The reasons given are usually either “it’s hard to develop this code!” or “I don’t want to restart julia all the time, it takes too long”.
Well, about 90% of the julia code I write is in the form of these one-off scripts in a kind of “throwaway form”, as I call it. This workflow pretty much directly follows from the Performance Tips in the julia manual (put your code in functions, avoid global variables), so here’s how I do it and stay sane.
- Create a directory for your script to live in
This is useful because it lets you keep multiple scripts next to each other without having to throw every dependency into your default (global) environment. It’s not strictly necessary, though, and you can certainly follow this workflow without this step.
If you like to keep your default julia environment clean, it’s a good idea to also `]activate` the directory your script lives in. If you do that, you probably want to modify your code loading a bit and use:
```julia
using Pkg
Pkg.activate(dirname(@__FILE__))

# you've created this script interactively, so all these dependencies
# should already be installed in the environment of this script
using PackageA
using PackageB

# ... rest of the code
```
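The one-time setup behind that snippet can be done from julia itself as well. Here is a minimal sketch, assuming a temporary directory as a stand-in for your script's folder; `script_dir` and `PackageA` are hypothetical names, and the `Pkg.add` line is left commented out because it needs a real package and network access.

```julia
using Pkg

# Stand-in for the directory your script lives in (hypothetical; inside a real
# script file you would use dirname(@__FILE__), or the shorthand @__DIR__).
script_dir = mktempdir()

# Same effect as typing `]activate <script_dir>` in the REPL.
Pkg.activate(script_dir)

# One-time, interactive setup (PackageA is a placeholder name):
# Pkg.add("PackageA")   # would record PackageA in script_dir/Project.toml

# Code loading and Pkg operations now use this project.
println(Base.active_project())
```

Once the packages are recorded in the script's own `Project.toml`, the `Pkg.activate(dirname(@__FILE__))` line at the top of the script picks them up on every run.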
- Put everything into functions and don’t use global variables.
If you put your code into functions, you’ll immediately remedy one of the biggest problems of scripting: the need to reload your data. Putting the code that loads your data into a separate function lets you execute just that function in the REPL and load your data once, allowing you to pass it to your “workhorse” functions later on without the need to reload it.
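To make the “load once, reuse many times” idea concrete, here is a small sketch. `load_data` and `summarize` are hypothetical names, `collect(1.0:n)` stands in for reading a file from disk, and the `LOAD_COUNT` counter exists only to show how often loading actually runs.

```julia
# Instrumentation only: counts how often the (pretend) expensive load runs.
const LOAD_COUNT = Ref(0)

# Hypothetical loader; collect(1.0:n) stands in for reading a file from disk.
function load_data(n = 1_000)
    LOAD_COUNT[] += 1
    return collect(1.0:n)
end

# Hypothetical workhorse: everything it needs arrives as an argument, no globals.
summarize(data) = (n = length(data), total = sum(data))

data = load_data()      # pay the loading cost once...
summarize(data)         # ...then rerun the workhorse as often as you like
summarize(data)
println(LOAD_COUNT[])   # prints 1
```

Because `summarize` only depends on its argument, you can call it (or redefine and call it) as often as you want without touching the loader again.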
- Use a `main()` function as the main entry point into your code.
This function should be relatively small, just call the data loading function and pass the data directly into your working code:
```julia
using Whatever
using Package
using I
using Need

function data_loader(file="default_file.csv")
    # load & preprocess your data here
end

function workhorse_function(data)
    # where the main data munging/processing happens
end

function main()
    workhorse_function(data_loader())
end

!isinteractive() && main() # this line is where the magic happens
```
The above is a little template/pattern I use all the time. I can `include` it in a regular session, I can `includet` it in a Revise session during development, and I can also just call `julia script.jl` to run it directly. The magic is in the last line: `!isinteractive() && main()` only runs `main()` if julia isn’t started interactively. Interactive sessions are the REPL (with or without Revise.jl), Pluto.jl and those kinds of interactive workflows.
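The trick relies on `&&` short-circuiting: the right-hand side is only evaluated when the left-hand side is `true`. A small sketch, where `MAIN_RAN` is a hypothetical flag added purely so you can observe whether `main()` was called:

```julia
# Hypothetical flag, only here so we can observe whether main() ran.
const MAIN_RAN = Ref(false)

function main()
    MAIN_RAN[] = true
    println("main() ran")
end

# `a && b` evaluates `b` only when `a` is true, so this line is shorthand for
#     if !isinteractive()
#         main()
#     end
!isinteractive() && main()

println("interactive session: ", isinteractive(), ", main() ran: ", MAIN_RAN[])
```

Run as `julia file.jl`, this prints `main() ran`; `include`d from the REPL, `main()` is skipped and you call it (or its pieces) yourself.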
Of course, you can use many more worker and intermediate functions that are called from `main()`. If I `includet` this file in a Revise session, I can load my data in the first step, process it once, change `workhorse_function` interactively and just run the workhorse function again (provided I didn’t modify the incoming data):
```julia
julia> data = data_loader("non_default_file.csv");

julia> workhorse_function(data)
# some output, maybe

# change workhorse_function

julia> workhorse_function(data) # call it again, we're still good!
```
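The same loop works without Revise if you redefine the method by hand, which is roughly what Revise does for you when you edit a file loaded with `includet`. A sketch with hypothetical stand-in definitions (the small vector pretends to be the result of an expensive load):

```julia
# Hypothetical stand-ins for the template's functions.
data_loader(file = "non_default_file.csv") = [3, 1, 4, 1, 5]  # pretend this is slow
workhorse_function(data) = sum(data)

data = data_loader()                    # load once
first_result = workhorse_function(data)

# "Change workhorse_function": redefining the method replaces the old one,
# while `data` stays in memory untouched.
workhorse_function(data) = maximum(data)
second_result = workhorse_function(data)

println((first_result, second_result))  # prints (14, 5)
```

Only the method changed between the two calls; the loader never ran a second time.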
If the data loading step is cheap or I don’t have any data to load from disk, I usually just call `main()` directly, change the code, and call it again and again until I’m satisfied with the script.
The big win here is that things only ever get loaded once. Dependencies included via `using` are loaded and compiled once per session instead of on every run. During development, you also load your data only once while iterating on the workhorse code (you may have to load it more than once if you’re still working on the munging step inside the loader and need to check that function again and again). By reducing the number of things you do more than once, scripting in julia becomes really nice.
I hope this can be useful to some people to improve their workflow when working with “one-off” scripts. We all know they’re never really one-off, right?