How to launch several run commands in parallel

Hello,

My question is probably easy to solve, but I got confused by the various packages and proposals (@sync, @async, Distributed, etc.). In the simplest case there is a Julia script. This script has a loop that runs a bash script, which in turn runs a compiled program several times in a serial loop.

for param in [a1, a2, a3, ..., a100]
    basedir = Base.Filesystem.pwd()
    Base.Filesystem.cd(param)    # param names a directory
    run(`bash loop.sh`)          # this launches several programs serially
    Base.Filesystem.cd(basedir)
end

I would like to launch these bash scripts in parallel in such a way that I am always using k cores, but no more. So it would initially launch k processes, and whenever a core becomes free it would run another bash loop.sh, until the for loop over param is completed.

Thank you very much

Maybe you can use a Semaphore of size equal to the number of cores, and then call

sem = Base.Semaphore(8)  # if you have 8 cores
@sync for param in params  # the list of parameters from the original post
    @async begin
        Base.acquire(sem)   # wait for one of the 8 slots
        try
            run(`bash loop.sh`)
        finally
            Base.release(sem)  # free the slot even if the command fails
        end
    end
end

You’d have to be careful with the Base.Filesystem.cd(param) stuff though; I guess that’s global state?
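One way around the global state is to set the working directory on the command itself rather than cd-ing the whole Julia process; a minimal sketch, assuming each param is a directory name:

run(Cmd(`bash loop.sh`; dir = param))  # per-command working directory, no cd needed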


If you are on Linux you could also run GNU Parallel from Julia, which does exactly that: it takes a text file with commands (I use a single command per line; I don’t know if you can use it in other ways) and runs the first k commands, then as soon as one ends it continues with the remaining commands…

For example in my model I have a “runscenarios.sh” file that contains:

#!/bin/bash

./run_single_scenario.sh 'scenarioName1'
./run_single_scenario.sh 'scenarioName2'
./run_single_scenario.sh 'inputFile1' 'scenarioName3'
./run_single_scenario.sh 'inputFile2' 'scenarioName4'
...

and then I run it with:

parallel --jobs <n of jobs> -a runscenarios.sh
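If you want to launch that from within the Julia script itself, run works here too; a minimal sketch, assuming GNU Parallel is installed and runscenarios.sh is in the current directory (the job count 8 is just an example):

run(`parallel --jobs 8 -a runscenarios.sh`)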


If you want to limit the number of processes run concurrently to k, you can simply use k tasks.

function parallel_run(commands; ntasks = Sys.CPU_THREADS)
    # Feeder: an unbuffered channel that yields the commands one by one.
    request = Channel{Cmd}() do request
        for cmd in commands
            put!(request, cmd)
        end
    end
    # `ntasks` worker tasks pull from the channel, so at most that many
    # external processes run at any time.
    @sync for _ in 1:ntasks
        @async try
            foreach(run, request)
        finally
            close(request)  # shutdown on error
        end
    end
end

For more techniques like this, see: Concurrency patterns for controlled parallelisms

Also, as baggepinnen mentioned, don’t use cd since it mutates global state. You can use setenv(cmd; dir = ...) to set the directory for each command:

parallel_run([
    `pwd`,
    setenv(`pwd`; dir = ".."),
    setenv(`pwd`; dir = "/tmp"),
])
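To connect this back to the original loop, a minimal sketch (assuming each param in the list names a directory that contains loop.sh):

params = [a1, a2, a3]  # hypothetical: the full list of directories from the original post
parallel_run([setenv(`bash loop.sh`; dir = param) for param in params])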

Thank you @tkf @sylvaticus @baggepinnen

All of these comments are solutions! I was not aware of the problem with the global state; this explains why my initial attempt was not working. For now, @sylvaticus’s solution should work directly for my setup. And as soon as I translate some stuff to Julia, the other solutions are going to be great too. Thank you also for the link.
