Scripting like a Julian

I’ve been talking to a few people and since this topic comes up regularly, I thought I’d write up my workflow for writing “scripts”. I don’t have a standard blog or place to put this, so I thought I’d put it here and tag it as a guide. If this is misplaced or inappropriate, please tell me so that I can move it to some other, more permanent place.

What you’ll need

  • Text editor of your choice
  • Revise.jl
  • That’s it!

Background

Scripting is usually done in a haphazard manner, just throwing things in a file of code in the approximate order they’re supposed to happen, pulling things in left and right until the thing does what you want it to do. This works well in interpreted languages like Bash, Zsh, Fish, Python, Perl, Ruby… but not in julia, as a lot of people notice. The reasons given are usually either “it’s hard to develop this code!” or “I don’t want to restart julia all the time, it takes too long”.

Well, about 90% of the julia code I write is in the form of these one-off scripts in a kind of “throwaway form”, as I call it. This workflow pretty much directly follows from the performance tips, so here’s how I do it and stay sane.

Optional Stuff

  1. Create a directory for your script to live in

This is useful because it makes sure that you can have multiple scripts next to each other without having to throw every dependency into your Main environment. It’s not necessary to do this though and you can certainly do this workflow without this step.

If you like to keep your main julia environment clean, it’s a good idea to also ]activate the directory your script lives in. If you do that, you probably want to modify your code loading a bit and use Pkg:

using Pkg

Pkg.activate(@__DIR__) # same directory as dirname(@__FILE__)

# you've created this script interactively, so all these dependencies should already be installed in the environment of this script
using PackageA
using PackageB

# ... rest of the code

The Workflow

  1. Put everything into functions and don’t use global variables.

If you put your code into functions, you’ll immediately remedy one of the biggest problems of scripting: the need to reload your data. Putting the code that loads your data into a separate function lets you execute just that function in the REPL and load your data once, allowing you to pass it to your “workhorse” functions later on without the need to reload it.

  2. Use a main() function as the main entry point into your code.

This function should be relatively small, just call the data loading function and pass the data directly into your working code:

using Whatever
using Package
using I
using Need

function data_loader(file="default_file.csv")
   # load & preprocess your data here
end

function workhorse_function(data)
   # where the main data munging/processing happens
end

function main()
   workhorse_function(data_loader())
end

!isinteractive() && main() # this line is where the magic happens

The above is a little template/pattern I use all the time. I can include it in a regular session, I can includet it in a Revise session during development, and I can also just call julia script.jl to run it directly. The magic is in the last line: !isinteractive() && main() only runs main() if julia isn’t started interactively. Interactive sessions are Revise.jl, the REPL, Pluto.jl - those kinds of interactive workflows.

Of course, you can use many more worker and intermediate functions that are called in workhorse_function.
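
As a rough illustration of that, here is a minimal sketch of a workhorse built from smaller steps. The helper names (drop_nonfinite, summarize), the one-number-per-line file format and the default file name are all made up for this example and use only Base; the entry-point line from the template is left out so the block can be pasted into a session as-is.

```julia
# Hypothetical helpers; none of these names come from the template above.

# Load one number per line from a file name or any IO object, skipping blank lines.
function data_loader(src="default_file.txt")
    return [parse(Float64, strip(line)) for line in eachline(src) if !isempty(strip(line))]
end

# Intermediate step: drop values that are not finite (NaN, Inf).
drop_nonfinite(xs) = filter(isfinite, xs)

# Intermediate step: reduce the cleaned values to a small summary.
summarize(xs) = (n = length(xs), total = sum(xs), mean = sum(xs) / length(xs))

# The workhorse only wires the intermediate steps together.
workhorse_function(data) = summarize(drop_nonfinite(data))
```

Because data_loader accepts any IO object here, something like data_loader(IOBuffer("1.0\n2.0\n")) is enough to try the steps in the REPL without touching the disk.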

  3. Launch julia and either include or includet this file.

If I includet this file in a Revise session, I can load my data in the first step, process it once, change workhorse_function interactively and just run the workhorse function again (provided I didn’t modify the incoming data):

julia> data = data_loader("non_default_file.csv");

julia> workhorse_function(data) # some output, maybe

# change workhorse_function

julia> workhorse_function(data) # call it again, we're still good!

If the data loading step is cheap or I don’t have any data to load from disk, I usually just call main() directly, change it, and call it again and again, until I’m satisfied with the script.

The big win here is that things will only ever be loaded once. Compilation of dependencies included via using happens once. During development, you only load the data once (for iterating on the workhorse code - you may have to load it more than once if you have to munge your data first and need to check the function again and again). By reducing the number of things you do more than once, scripting in julia becomes really nice.


I hope this can be useful to some people to improve their workflow when working with “one-off” scripts :slight_smile: We all know they’re not really one-off, are they? :wink:


Oh yeah, I forgot to mention this:

Transitioning a “script” to a module is super easy: simply remove the line mentioning isinteractive() and wrap everything in a module A ... end. If you haven’t done so yet, activate the environment and add all the dependencies, then remove the hard dependency on Pkg, and you’re done.
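
A minimal sketch of what the result might look like, assuming a made-up module name (MyScript) and placeholder function bodies that stand in for real loading and processing code:

```julia
# Hypothetical module name; the function bodies are placeholders.
module MyScript

export main

# load & preprocess the data (placeholder: returns a fixed vector)
data_loader(file="default_file.csv") = [1, 2, 3]

# main processing step (placeholder: sums the data)
workhorse_function(data) = sum(data)

# entry point, unchanged from the script version
main() = workhorse_function(data_loader())

end # module

# No isinteractive() check anymore: callers invoke the entry point explicitly.
MyScript.main()
```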

Developing structs is a little more involved - I usually use a NamedTuple until I’m happy with the fields I need and only then create a proper struct for everything. This makes it much easier to develop structs and code at the same time.
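
A small sketch of that progression, with made-up field names (name, value) and a hypothetical Sample type. The idea is that code which only uses field access keeps working when the NamedTuple is swapped for a struct later; the NamedTuple stage is convenient because changing a struct's fields in a running session has traditionally required a restart.

```julia
# Hypothetical record with made-up fields, used only for illustration.

# During development: a NamedTuple can gain or lose fields freely.
make_sample(name, value) = (name = name, value = value)

# Code written against field access works for both representations.
describe(s) = string(s.name, " => ", s.value)

# Once the fields have settled: a proper struct with the same field names.
struct Sample
    name::String
    value::Float64
end

describe(make_sample("a", 1.5))  # NamedTuple version
describe(Sample("a", 1.5))       # struct version, same accessor code
```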

!isinteractive()

is the new

__name__ == "__main__"

I wouldn’t say so :thinking: __main__ is always the top namespace in Python, even if you start “worker processes” (since __name__ only changes if the file is imported into another file). If you spawned worker processes from an interactive session and provided them with the same script, isinteractive() would be false, while keeping the same namespaces.

But yes, as far as the purpose goes, it achieves the same goal :slight_smile:

The manual recommends checking if abspath(PROGRAM_FILE) == @__FILE__.
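
For reference, a minimal sketch of that check; main here is just a placeholder entry point, not code from the post above:

```julia
# Placeholder entry point for illustration.
main() = println("running as a script")

# Compare the file given to `julia` on the command line with the current file;
# the branch is only taken when this file itself is the program being run.
if abspath(PROGRAM_FILE) == @__FILE__
    main()
end
```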


Yep, that’s a similar thing to do. Including a file with that instead of isinteractive has the consequence of immediately running whatever function you want to run on entry, which is something I don’t want. include in the workflow above is really most often used during the development of the script, so running it right away is not desired.
