Parallel program with external software works from REPL but not from command line

I’m trying to set up Julia to interact with an external program. I need to run the external program many times, and since it only uses one core and I have access to a cluster, I think it makes sense to distribute the work by running separate instances on several cores at once.

I’m using `@spawnat` to call a Julia function that launches the external program. The main program starts the workers:

```julia
### mre.jl main program
using Distributed
using HDF5

NP = 3  # desired total number of processes (master + workers)

if nprocs() < NP
    addprocs(NP - nprocs())
end

@everywhere include("/Users/me/demo.jl")
Threads.@threads for i = 1:100
    f = @spawnat :any demo(i)
end
```
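For reference, a minimal sketch of what `@spawnat` returns: it hands back a `Future` immediately, and the calling process only waits for the remote work when something calls `fetch` or `wait` on that `Future`. In the loop above, each `f` is discarded without being fetched. The function `slow_square` and the two local workers are hypothetical stand-ins for `demo` and the cluster setup.

```julia
using Distributed
addprocs(2)  # assumption: two local worker processes for illustration

# Hypothetical stand-in for demo(): sleeps, then returns a value.
@everywhere slow_square(x) = (sleep(0.5); x^2)

# @spawnat returns a Future right away; the remote call keeps running.
futures = [(@spawnat :any slow_square(i)) for i in 1:4]

# fetch blocks until each remote call has finished and returns its value.
results = fetch.(futures)
println(results)
```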

The helper function in demo.jl launches the external program (bolsigminus) and saves the data:

```julia
function demo(x)
    input = "/Users/me/new_input_deck.dat"
    run(`./bolsigminus $input`)                      # blocks until bolsigminus exits
    h5write("/Users/me/demo$(x).h5", "Data/xs", x)   # one output file per call
end
```
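A related detail when debugging a silent exit: `run` blocks until the command finishes and throws a `ProcessFailedException` if the command exits with a non-zero status. On a worker, though, such an exception typically only surfaces when something fetches or waits on the remote result. This sketch uses the POSIX `false` command as a hypothetical stand-in for a failing `bolsigminus` (assumes Linux or macOS).

```julia
# Hypothetical stand-in for a failing external program: the POSIX `false`
# command always exits with status 1 (assumes Linux or macOS).
cmd = `false`

# run blocks until the command exits and throws on a non-zero exit status.
caught = try
    run(cmd)
    nothing
catch err
    err
end
println(typeof(caught))

# ignorestatus(cmd) suppresses the exception so the exit code can be inspected.
p = run(ignorestatus(cmd))
println("exit code: ", p.exitcode)
```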

When I run the main program from the REPL, it works perfectly. When I run it from the command line with `julia -p 3 mre.jl`, the external program launches on each worker but stops partway through the first execution and exits without any error message. I need it to work from the command line because I’ll have to submit these runs to the queue on the cluster.

I’m a newbie, so I apologize if the answer is obvious, but I’m baffled. Any suggestions are greatly appreciated!

No need to spawn, because anything inside a `@threads` loop already runs on a separate thread.
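A minimal sketch of what `Threads.@threads` does: it splits the loop iterations across the threads of a single Julia process. The thread count is fixed at startup (for example with `julia -t 4` or the `JULIA_NUM_THREADS` environment variable), whereas `julia -p 3` adds worker processes, not threads, so under that command the loop may run on just one thread.

```julia
# Record which thread ran each iteration of a @threads loop.
ids = zeros(Int, 100)
Threads.@threads for i in 1:100
    ids[i] = Threads.threadid()  # each iteration writes only its own slot
end
println("threads available: ", Threads.nthreads())
println("distinct threads used: ", length(unique(ids)))
```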

You may be thinking of `@distributed` rather than `@threads`. In any case, there is no need to use spawn.
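For comparison, a minimal sketch of `@distributed` with a reducer: the iterations are split across worker processes (not threads), and because a reducer such as `(+)` is given, the call waits for every worker and returns the combined value. The two local workers are an assumption for illustration.

```julia
using Distributed
addprocs(2)  # assumption: two local worker processes

# Each worker sums the squares of its share of 1:100; (+) combines the
# partial sums, so this statement blocks until all workers are done.
total = @distributed (+) for i in 1:100
    i^2
end
println(total)
```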

Thank you for the coaching! I’ve never worked with any real distributed code before and I’m still at the start of the learning curve. I tried changing the main call to

```julia
@distributed for i = 1:NP-1
    demo(i)
end
```

It still works perfectly from the REPL and fails from the command line.
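A minimal sketch of what `@distributed` without a reducer does: it schedules the iterations on the workers and returns a `Task` right away, without waiting for them to finish; calling `wait` on that `Task` blocks until they are done. The function `slow_task` and the two local workers are hypothetical stand-ins for `demo` and the real setup.

```julia
using Distributed
addprocs(2)  # assumption: two local worker processes

# Hypothetical stand-in for demo(): takes a while, then returns.
@everywhere slow_task(i) = (sleep(1); i)

# Without a reducer, @distributed schedules the loop and returns a Task
# immediately; no iteration has finished at this point.
h = @distributed for i in 1:4
    slow_task(i)
end
done_at_return = istaskdone(h)
println("finished when the macro returned? ", done_at_return)

# Waiting on the returned Task blocks until every iteration has completed.
wait(h)
println("finished after wait? ", istaskdone(h))
```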

Try running it serially to see if it works. Maybe the problem is not the parallelism but the code inside the loop. Try an ordinary `for` loop first.

Using a normal loop (just removing `@distributed` from in front of the `for` loop) works fine in both the REPL and the command line.

Try

```julia
@everywhere function demo(x)
    # ... same body as before ...
end
```

Also

```julia
@everywhere using HDF5
```
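A minimal sketch of what `@everywhere` does: it evaluates the same expression on the master and on every worker, so each process loads its own copy of a package and defines its own copy of a function. Here `Printf` stands in for `HDF5`, `label` is a hypothetical function, and `remotecall_fetch` runs it on one specific worker.

```julia
using Distributed
addprocs(2)  # assumption: two local worker processes

# Load the package on every process (stand-in for `@everywhere using HDF5`).
@everywhere using Printf

# Define the function on every process so workers can call it.
@everywhere function label(x)
    return @sprintf("worker %d got %d", myid(), x)
end

# Run the function on a specific worker and fetch the result.
w = first(workers())
msg = remotecall_fetch(label, w, 7)
println(msg)
```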

Instead of including the demo function from a separate file, place it in the same file as your mre.jl.

Adding `@sync` before `@distributed` resolved the issue. I’m happy that it is running, but I’m still at a loss to understand why `@sync` was needed from the command line and not from the REPL.
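A likely explanation, sketched under assumptions: `@sync` makes the master wait for every iteration on every worker before moving on. Without it, a script run with `julia -p 3 mre.jl` can reach its last line while the workers are still busy, and when the master process exits it shuts its workers down, which would cut `bolsigminus` off partway through. The REPL keeps the master alive, so the background work gets to finish. Below, `fake_demo` is a hypothetical stand-in for `demo`, and `pmap` is shown as one blocking alternative that also collects return values.

```julia
using Distributed
addprocs(2)  # assumption: two local worker processes

# Hypothetical stand-in for demo(): in mre.jl this would be the function
# that calls run(`./bolsigminus $input`) and h5write(...).
@everywhere function fake_demo(x)
    sleep(0.5)
    return x
end

# @sync blocks until every iteration on every worker has finished, so the
# script cannot reach its last line (and exit) while work is still running.
t0 = time()
@sync @distributed for i in 1:4
    fake_demo(i)
end
elapsed = time() - t0
println("waited ", round(elapsed; digits = 2), " s for the loop")

# pmap also blocks, and additionally collects the return values in order.
results = pmap(fake_demo, 1:4)
println(results)
```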