I am looking for some help understanding what is probably a familiar problem to those who run Julia regularly on an HPC with Slurm. I’ll explain the issue and then share what I’ve found. Hopefully someone can help me interpret this and understand the issue.
I have a long running script instrumented with
@infos so I can see information like a summary of the run echoed at the beginning and then regular progress updates (say once every 1000 iterations). I’d like these output to a file during the run.
My slurm script contains the following directives
#SBATCH --job-name=my_run #SBATCH --output=my_run.o-%j #SBATCH --error=my_run.e-%j
I understand that the
@info statements are getting output to
stderr and indeed when the run finally finishes I see the output in the
my_run.e- file. But this doesn’t solve my problem as I’d like to see it during the run to check on how it is going.
From what I have read here and here it seems this is an issue with Slurm buffering. The buffer doesn’t get filled and so it waits till the end of the run to flush it, something like that (please correct me if I am wrong).
My first attempt to fix this issue followed the approach described here. I added the following function to my package and then call it before starting the main function.
logger = TeeLogger( MinLevelLogger(FileLogger(logfile), Logging.Debug), MinLevelLogger(ConsoleLogger(stderr, Logging.BelowMinLevel), Logging.Info) ) global_logger(logger)
I understand that
FileLogger is supposed to flush every time a message is written. That doesn’t appear to be the case though. The log file gets created but I never see anything show up until the end of the run. Also, it seems
ConsoleLogger also does not automatically flush with each message (I didn’t see any kwargs described that might enable it to, did I miss one?).
I also tried what was described here and submitted my run as follows
stdbuf -i0 -o0 -e0 sbatch myrun.slurm
But that didn’t make a difference either.
Finally, what did work was after every
@info I added a
flush(stderr) and now finally it writes to the
my_run.e- file in real time. I’m a bit disappointed though as adding that after every single
@info is cumbersome. Is that really the best way to do this? Or is there a better/smarter way to make it so every single
@info automatically is flushed?
Another alternative I could define my own function I guess that does the following and then replace all the
function print_info(msg) @info "My run info is %s" msg flush(stderr) end