Workers do not show any IO when using a cluster

See the following code that works as expected.

julia> addprocs(4);

julia> pmap(x -> println(x), 1:8)
      From worker 3:    4
      From worker 5:    1
      From worker 5:    5
      From worker 3:    6
      From worker 5:    7
      From worker 3:    8
      From worker 4:    3
      From worker 2:    2

So on the main process, we see the “from worker” statements being printed. Now consider the following code using Slurm and ClusterManagers to connect to a multinode cluster.

julia> addprocs(SlurmManager(512), N=16, topology=:master_worker, exeflags="--project=.")
julia> pmap(x -> println(x), 1:10)

Nothing gets printed on the main process. Is this a bug in how ClusterManagers sets up the IO for the worker processes, or is it a bug on Julia’s side?

How do we debug this?

Note that in both cases pmap is working properly, i.e. it returns the output array (which in this case contains only nothing values).

julia> pmap(x -> println(x), 1:10)
10-element Array{Nothing,1}:

but this is not what I am talking about… I am talking about possible println statements in the worker processes.

ClusterManagers redirects the output streams from the workers to individual files. Check the directory where you submitted your job from, you should have a bunch of job*.out files. These contain the output from the workers.

I think those *.out files are mostly for printing the host/port information for the workers. From what I can tell, it doesn’t redirect standard output.

julia> pmap(x -> println(x), 1:500)
500-element Array{Nothing,1}:

here is the output from the .out file

[affans@hpc covid19abm]$ cat job0010.out
julia_worker:9798#[ip address retracted]

The host is the first line that is written out. The output from the workers is buffered and written out at a later point, possibly when the julia session exits. You should definitely have the outputs once the job is over, such has been my experience. I suppose it might be possible to force the output to appear earlier by explicitly flushing the buffer.
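As a minimal sketch of flushing explicitly from inside the worker task itself: the helper name println_flush is made up for illustration, and the local addprocs(2) call only stands in for the SlurmManager call so that the snippet runs anywhere. I have not checked how much this costs when printing very often.

```julia
using Distributed

addprocs(2)  # assumption: local workers, standing in for the SlurmManager call

# Hypothetical helper: print a value, then flush the worker's stdout right away
# so the line does not sit in the buffer until the session exits.
@everywhere function println_flush(x)
    println(x)
    flush(stdout)
    return nothing
end

results = pmap(println_flush, 1:8)
```

With the Slurm setup, the lines should then show up in the job*.out files while the session is still running rather than only at exit.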

Perfect. I killed my julia session and the buffer was finally flushed and written to the files.

However, is there a way to redirect stdio to the REPL instead of the files? Like in the original example where it prints out “from worker?”

Alternatively, is there a way to flush the buffer without killing the session?

Third question, more technical. There has to be some sort of maximum buffer size before it’s flushed and written on disk right?

You can flush the buffer using flush(stdout). An example is:

julia> wait(@spawnat 2 println(myid()))
Future(2, 1, 8, nothing)

# output is not written out at this point
shell> cat job-1808884-0000.out

julia> wait(@spawnat 2 flush(stdout))
Future(2, 1, 10, nothing)

# output has been written out
shell> cat job-1808884-0000.out
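
To do the same for every worker in one go, something along these lines might work. This is only a sketch: flush_workers is a hypothetical helper name, not part of Distributed or ClusterManagers, and addprocs(2) stands in for the Slurm call.

```julia
using Distributed

addprocs(2)  # assumption: local workers, standing in for the SlurmManager call

# Hypothetical helper: ask each worker to flush its own stdout and
# wait until every one of them has done so.
function flush_workers()
    for w in workers()
        remotecall_wait(() -> flush(stdout), w)
    end
    return nothing
end

pmap(x -> println(x), 1:8)
flush_workers()  # push any buffered worker output out now
```

Calling flush_workers() after a pmap avoids having to kill the session just to see the output.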

Great, thanks, this solves my problem. I think ClusterManagers should support a keyword argument that controls where the IO is printed. It’s extremely handy for debugging purposes to have println statements in the worker code.
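
A possible workaround for getting debug output into the REPL, independent of the cluster manager, is to have the workers ask process 1 to do the printing. This is a sketch under assumptions: master_println is a hypothetical helper, the "From worker" prefix is assembled by hand, and addprocs(2) stands in for the Slurm call. Every call is a round trip to the master, so it is probably only suitable for occasional debug messages.

```julia
using Distributed

addprocs(2)  # assumption: local workers, standing in for the SlurmManager call

@everywhere using Distributed

# Hypothetical helper: run println on process 1 (the master) so the text
# appears in the REPL instead of in the worker's own stdout.
@everywhere function master_println(args...)
    remotecall_wait(println, 1, "From worker $(myid()): ", args...)
    return nothing
end

results = pmap(x -> master_println(x), 1:4)
```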