Parallel program with external software works from REPL but not from command line

PlasmaJoe · July 8, 2020, 10:13pm

I’m trying to set up Julia to interact with an external program. I need to run the external program many times, and I have access to a cluster. Since the external program only uses one core, I think it makes sense to distribute the work by running separate instances on several cores at once.

I’m using “@spawnat” to call a Julia function that launches the external program. The main program starts the workers:

### mre.jl main program
using Distributed
using HDF5

NP = 3  # desired total number of Julia processes

if nprocs() < NP
    addprocs(NP-nprocs())
end

@everywhere include("/Users/me/demo.jl")
Threads.@threads for i = 1:100
    f = @spawnat :any demo(i)
end

The secondary function launches the external program (bolsigminus) and saves the data:

function demo(x)
    input = "/Users/me/new_input_deck.dat"
    run(`./bolsigminus $input`)
    h5write("/Users/me/demo"*string(x)*".h5", "Data/xs", x)
end

When I run the main program from the REPL it works perfectly. When I run the main program from the command line using julia -p 3 mre.jl, the external program launches on each worker but stops and exits partway through the first execution without any error messages. I want it to work from the command line because I’m going to have to submit these jobs to the queue on the cluster.

I’m a newbie so I apologize if the answer is obvious, but I’m baffled. Any suggestions are greatly appreciated!

ppalmes · July 8, 2020, 10:25pm

No need to spawn, because anything inside the threaded loop already runs in a separate thread.

ppalmes · July 8, 2020, 10:26pm

You may be thinking of @distributed instead of @threads. In any case, there is no need to use spawn.
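
If “@spawnat” is kept, each call returns a Future immediately, so the caller has to wait on those Futures before the script ends. A minimal sketch, assuming two local workers and a hypothetical stand-in function work (not the original demo):

```julia
using Distributed

# Assumption: two local workers stand in for the cluster cores.
nworkers() < 2 && addprocs(2)

# Hypothetical stand-in for demo(i); it only returns a value.
@everywhere work(i) = i^2

# Each @spawnat returns a Future right away; nothing has been waited on yet.
futures = [@spawnat :any work(i) for i in 1:4]

# fetch blocks until the corresponding remote call has finished.
results = fetch.(futures)
```

Collecting the Futures and fetching them keeps the main process alive until all remote calls are done.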

PlasmaJoe · July 8, 2020, 11:54pm

Thank you for the coaching! I’ve never worked with any real distributed code before and I’m still at the start of the learning curve. I tried changing the main call to

@distributed for i = 1:NP-1
    demo(i)
end

It still works perfectly from the REPL and fails from the command line.

ppalmes · July 9, 2020, 5:02am

Try running it serially to see whether it works. Maybe the problem is not in the parallelism but in the code inside the loop. Try an ordinary for loop first.

PlasmaJoe · July 9, 2020, 5:06pm

Using a normal loop (just removing “@distributed” from in front of the for loop) works fine in both the REPL and the command line.

ppalmes · July 9, 2020, 5:46pm

Try:
@everywhere function demo(x)
…
end

Also:
@everywhere using HDF5

Instead of including the demo function, place it in the same file as your mre.jl.
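
What “@everywhere” does can be checked directly: it evaluates the expression on every process, so each worker can then call the function. A minimal sketch, assuming two local workers and a hypothetical function scaled_mean; the Statistics standard library stands in for HDF5 so the snippet has no extra dependencies:

```julia
using Distributed

# Assumption: two local workers stand in for the cluster cores.
nworkers() < 2 && addprocs(2)

# Load the package on every process, not just the master.
# Statistics is a stand-in for HDF5 here.
@everywhere using Statistics

# Define the (hypothetical) function on every process.
@everywhere function scaled_mean(xs, k)
    return k * mean(xs)
end

# Call it once on each worker to confirm the definition is available there.
results = [remotecall_fetch(scaled_mean, p, [1.0, 2.0, 3.0], 2) for p in workers()]
```

If the function or the package were missing on a worker, the remotecall_fetch for that worker would raise an error instead of returning a value.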

PlasmaJoe · July 10, 2020, 5:13pm

Adding “@sync” before “@distributed” resolved the issue. I’m happy that it is running but still at a loss to understand why “@sync” was needed from the command line and not from the REPL.
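
The likely explanation is documented behavior of “@distributed”: without a reducer it runs asynchronously, returning a Task immediately instead of waiting for the workers. In the REPL the session stays open, so the workers finish in the background. A script reaches its last line and exits, and the workers are shut down with it. “@sync” makes the main process wait. A minimal sketch, assuming two local workers and a hypothetical slow_task that sleeps in place of the external program:

```julia
using Distributed

# Assumption: two local workers stand in for the cluster cores.
nworkers() < 2 && addprocs(2)

# Hypothetical stand-in for demo(i): sleeping mimics the external program.
@everywhere function slow_task(i)
    sleep(0.5)
    return i
end

# Without @sync, the loop is only scheduled; a Task comes back immediately.
t = @distributed for i in 1:4
    slow_task(i)
end
returned_early = !istaskdone(t)   # records whether work was still pending
wait(t)                           # a script must wait, or it exits too soon

# With @sync, the statement itself blocks until every iteration has run.
elapsed = @elapsed @sync @distributed for i in 1:4
    slow_task(i)
end
```

Applied to mre.jl, this is the “@sync” prefix on the “@distributed” loop that resolved the issue.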
