I am going to be creating my first project in Julia by writing and debugging in interactive sessions, but once I have it working the final code will run non-interactively on an HPC cluster.
I have learned the basics of Julia’s scope behaviors, but I’m sure I will make some mistakes in this area at first.
I know that some scope behaviors change when code is run interactively vs. non-interactively, and I’m concerned about how this might impact interactive testing/debugging of code that’s meant to eventually be non-interactive.
Is there an option or directive I can add in my code to say either “use non-interactive scope behaviors all the time” or “warn me if I do something that will behave differently in a non-interactive run”?
(And is “assignment to something with the same name as a global variable from within soft scope without specifying global/local” the only thing I need to watch out for, or are there also other behaviors that will change between interactive and non-interactive?)
Running code through include_string (on a string), eval (on an expression), or include (on a file) always applies the non-interactive rules to that code; the three are related and all go through Core.eval. That’s what I do to get around the interactive rules.
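
To make that workaround concrete, here is a minimal sketch. The Sandbox module and the snippet are made up for this example, and num_elig is just borrowed from the question. It pushes a REPL-style snippet through include_string so that it is evaluated with file rules:

```julia
# Hypothetical scratch module so the demo does not touch globals in Main.
sandbox = Module(:Sandbox)

# A snippet as you might paste it at the REPL. The assignment inside the loop
# names an existing global without saying `global` or `local`.
snippet = """
num_elig = 0
for i in 1:3
    num_elig = i
end
num_elig
"""

# include_string evaluates the snippet with non-interactive (file) rules:
# Julia prints an ambiguity warning and treats the loop's num_elig as a new
# local, so the global keeps its original value.
file_result = include_string(sandbox, snippet)
println("global num_elig under file rules: ", file_result)
```

Typed directly at the REPL (Julia 1.5 or later), the same lines would instead update the global inside the loop, so seeing the warning here is a decent hint that the code behaves differently in the two modes.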
And yes, that’s what you need to watch out for when manually typing or pasting code into the REPL or IJulia (for notebooks), though I already find it plenty annoying to remember where soft scopes exist. The Julia 1.5 Highlights post explains why soft scopes treat preexisting global variables differently in interactive and non-interactive code.
In short: v1 started defaulting to new locals because that is safer in a language where a global scope can get huge and be scattered across distant files. The v0 soft scope, however, was convenient for pasting local-scope code into the REPL and inspecting its variables as persistent globals, a (suboptimal) form of interactive debugging. Many new users came from Python, whose rules superficially resemble the v0 ones because many of its structures, like for loops, don’t introduce a new scope (which has its own gotcha; Python compensates by idiomatically making each file its own module). So the IJulia developers reintroduced v0 scoping for Jupyter users, and the Julia developers then brought it back to the REPL for consistency.
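
One way to stay out of the ambiguous case altogether is to annotate such assignments with global or local, which reads the same way in both modes. A small sketch (the ExplicitScope module, the snippet, and the scratch variable are invented for illustration), again run with file rules via include_string:

```julia
# Hypothetical scratch module for the demo.
sandbox = Module(:ExplicitScope)

snippet = """
num_elig = 0
for i in 1:3
    global num_elig += i      # explicitly the global: updated in every mode
    local scratch = 2i        # explicitly local: never becomes a global
end
(num_elig, @isdefined(scratch))
"""

# With explicit annotations there is nothing ambiguous, so no warning is
# printed and the result does not depend on how the code is run.
explicit_result = include_string(sandbox, snippet)
println(explicit_result)
```

The cost is some extra typing in top-level loops; code inside functions does not need any of this.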
For interactive work, running notebook-style is fine, but if you’re going to scale to HPC, you’re not going to want to take the performance hit of having a bunch of non-constant globals flying around.
I highly recommend you start out early putting things into functions, where the scoping behavior is consistent. But even if you don’t want to do that, you can test the final product by wrapping the whole thing in
function main()
    # all your interactive stuff
end
main()
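
As a small illustration of why the wrapper helps (a sketch only; the loop body is a placeholder and num_elig is borrowed from later in the thread), an accumulating loop inside a function works on the function's own local with no annotation at all:

```julia
function main()
    num_elig = 0            # local to main, not a global
    for i in 1:3
        num_elig += i       # updates main's local; no `global` needed
    end
    return num_elig
end

total = main()
println(total)
```

Inside a function the loop sees the enclosing local, and that is the same whether main() is called from the REPL or from a script.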
90% of the code will be in functions as soon as it’s written, but there will be some code to control the process (read input data, call functions) and that’s the part I’m planning to run interactively at first.
I’m running interactively because each step (read several gigabytes of input data, first function doing its thing and getting the data ready for later steps, etc.) is going to take several minutes, and I don’t want to have to repeat earlier steps while testing later steps. Running interactively, I can make an extra “clean” copy of the data at the end of step 3, and then if something goes wrong with the function for step 4 I can revert to the clean version of the data and test that function again without having to repeat steps 1-3.
There should never be code that tries to change the binding of a name that’s pointing at one of the large global objects; those should always be mutated in place. But there will be a few small global variables that go with the big ones and have names like num_elig, and I could imagine somehow shooting myself in the foot with unexpected scope behaviors with those, especially since the parts of my main code that control when/how functions are called will use loops.
(Although now that I think of it, if I manually run a certain line instead of running the loop that it’s in, that will also affect scope behavior, since loops have scope in Julia… I’ve brought a lot of habits with me from R that will probably need to be changed/replaced now! I think “learn about debugger options in Julia” just moved to a much higher position on my todo list.)
When I’m done with the interactive testing, the main control code (read input data, call functions) can also be moved into a function so there won’t be global variables anymore.
Two questions:
Are there ways to improve the way I’m planning to work on this that would be able to achieve the goal of “testing the function that does step 4 without having to re-run steps 1-3 each time” without risking odd behavior related to global variables? (Maybe something involving modules, but I haven’t figured out what/how yet.)
Are global variables the only thing for which interactive and non-interactive behavior differs? No other quirks that I haven’t learned about yet, especially anything related to nested local scopes?
And thanks for the explanation of how that came to be!
Are there any other quirks I should be aware of (scope-related or otherwise) that might cause code to behave differently in interactive vs. non-interactive runs?
Keep your steps in functions, interactively pass the persistent data in and out of them. The non-interactive code would just be another function that calls them in order.
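
Roughly, that could look like the sketch below. All function names and the toy logic are hypothetical stand-ins for the real steps; only the shape matters: steps as functions, a clean copy kept at the REPL, and a plain function for the batch run.

```julia
# Hypothetical stand-ins for the real steps; names and logic are illustrative.
load_data(n) = collect(1.0:n)          # "step 1": pretend to read input

function prepare!(data)                # "steps 2-3": mutate in place
    data .*= 2
    return data
end

function count_eligible!(data)         # "step 4": the function under test
    data .+= 1
    return count(>(5), data)
end

# Non-interactive entry point: just calls the steps in order.
function run_pipeline(n)
    data = load_data(n)
    prepare!(data)
    num_elig = count_eligible!(data)
    return data, num_elig
end

# Interactive session: run steps 1-3 once and keep a clean copy.
data = load_data(5)
prepare!(data)
clean = copy(data)

num_elig = count_eligible!(data)        # try step 4
copyto!(data, clean)                    # revert in place, without rebinding `data`
num_elig_again = count_eligible!(data)  # retry step 4; steps 1-3 are not repeated
```

Reverting with copyto! mutates the existing array instead of rebinding the name, which matches the "always mutate the big objects in place" plan; the only top-level assignments left are plain ones outside any loop, where the two modes agree.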
You’ll probably want a few persistent global variables to keep data alive in the REPL, but not all of your interactive variables have to be kept alive. When possible, run your code in a let block where temporary local variables go away at the end. Bear in mind the let block has a hard scope so there’s no interactive-noninteractive discrepancy.
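
A minimal sketch of that pattern (big_data, threshold, and tmp are made-up names for illustration): the persistent object stays global, the temporaries live only inside the let block, and the block's value is what gets kept.

```julia
big_data = collect(1:10)       # stand-in for a large persistent object

# `threshold` and `tmp` exist only inside the let block.
num_elig = let threshold = 5
    tmp = filter(>(threshold), big_data)   # temporary working copy
    length(tmp)                            # value of the whole block
end

println(num_elig)
println(@isdefined(tmp))       # tmp did not leak out of the block
```

Because the result comes back as the value of the block, nothing inside it has to assign to a global, so there is no soft-scope question to get wrong.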
Another quirk is that in REPL mode InteractiveUtils is loaded automatically. This is a standard library offering reflection utilities and functions to inspect lowered code. If you simply run the script, you will have to add using InteractiveUtils yourself.
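
For example (a small sketch; add_one is a made-up function to inspect), a script that uses the reflection macros needs the explicit import at the top:

```julia
using InteractiveUtils   # loaded automatically in the REPL, not in scripts

add_one(x) = x + 1       # hypothetical function to inspect

# Both macros come from InteractiveUtils.
method = @which add_one(1)           # which method would be called
lowered = @code_lowered add_one(1)   # lowered form of that method
println(method)
```

Adding the using line is harmless in an interactive session, so it can simply stay in the file for both kinds of run.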
Unfortunately my “small” test datasets are 10-20 GB, and if I ever have to troubleshoot it on real data that will be 40-80 GB.
With data that size, I’m pretty motivated to keep everything in RAM and avoid disk I/O (unless I want an excuse to go take a lunch break). Fortunately the compute nodes I run interactive sessions on have around 250 GB RAM so I can do that.
Nice!! num_elig does need to change across steps, but some of the others don’t, so that will be useful!
The loops will be in functions for the final code - but they’re the part I’ll want to run interactively at first, so I can manually do certain steps multiple times to test functions.
Several of the suggested solutions here could work for what I need!
For anyone who might find their way to this thread later because they have a similar question, I’ll add one more option that I found: Debugger.jl. The code you want to run interactively can stay in a function, and you use Debugger.jl to step into the function so you don’t have to temporarily make anything global to interactively test the code. There’s also a Julia debugger for VSCode if you’re running Julia on your local machine (but the VSCode debugger probably isn’t an option if you run Julia on a compute node and connect with ssh).
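
A rough sketch of what that looks like, assuming Debugger.jl is installed (count_eligible and its arguments are made up for illustration; the debugger lines are left as comments because they need an interactive session):

```julia
# Hypothetical step under test; everything stays local to the function.
function count_eligible(data, threshold)
    num_elig = 0
    for x in data
        if x > threshold
            num_elig += 1
        end
    end
    return num_elig
end

# In an interactive session with Debugger.jl installed, one would run:
#   using Debugger
#   @enter count_eligible([1, 7, 9], 5)
# and then step through the function from the debugger prompt.

result = count_eligible([1, 7, 9], 5)   # ordinary call, for comparison
println(result)
```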