Running the same function with multiple inputs using pmap

I have a function that takes about 10 minutes to run. Most of that time is spent in two external commands, which I previously ran from a bash script. I need to run this function on several different inputs and would like to run them in parallel. Using map gives the expected results, but when I use pmap, it seems to try to parallelize the function itself, which I don’t want.

Further details:

The function definition looks like this:

@everywhere function getresult(x)  
    prefix = "si_$(x)"
    
    # string to control pw
    pw_in = """
        prefix='$prefix'
        """
    # string to control ph
    ph_in = """
          prefix='$prefix'
        """

    # run functions
    pw=open(`pw.x`,"r+")    # create pw process
    print(pw,pw_in)         # send it the control info
    close(pw.in)
    scf_out=read(pw,String) # waits until pw is done -- 3 minutes
    f_scf=open("./tmp/scfout_$prefix","w")
    write(f_scf,scf_out)    # write pw.x results to file for later
    close(f_scf)

    ph=open(`ph.x`,"r+")   # create ph process
    print(ph,ph_in)        # send it the control info
        # this process reads the results from pw based on prefix
    close(ph.in)
    phGout=read(ph,String) # waits until ph is done -- 7 minutes
    f_ph=open("./tmp/phGout_$prefix","w")
    write(f_ph,phGout)     # write ph.x results to file for later
    close(f_ph)

    return(prefix)
end

The results are all written to files labeled by the prefix, which I build from the input variable. The process ph.x depends on the results that pw.x wrote, and knows where to look for them from its input string.

I try to run this function on multiple inputs using:

result = pmap(x->getresult(x), [1,2,3,4])

Julia itself reports no error, but the external processes fail because ph.x tries to run before pw.x has finished.

Some things I’ve tried:

  • Replacing pmap with map gives the expected result.
  • Running the function with different inputs in separate Julia processes at the same time gives the expected results; the external processes do not appear to interfere with each other.

Could you all help me figure out how to fix this? I am unfamiliar with parallel processing lingo which is making it hard to understand the docs. It feels like julia is trying to parallelize more than possible.

pmap will not parallelize the code in the function. It only distributes the inputs across the worker processes; on each worker, the function runs sequentially, as written.
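
As a rough illustration, here is a minimal sketch. demo_task is a hypothetical stand-in for getresult, and the two local workers are an arbitrary choice. Each call runs its two steps in order on whichever worker receives it, and pmap returns the results in input order:

```julia
using Distributed

# Start two local workers if none exist yet (the count is arbitrary).
nprocs() == 1 && addprocs(2)
@everywhere using Distributed   # make myid() available inside the worker function

@everywhere function demo_task(x)
    # Hypothetical stand-in for getresult: two steps, one after the other.
    first_step = "pw_$(x)"
    sleep(0.1)                  # pretend the first external command is running
    second_step = "ph_after_$(first_step)"
    return (input = x, worker = myid(), result = second_step)
end

# Each input is handed to one worker; the body of demo_task is not split up.
results = pmap(demo_task, [1, 2, 3, 4])
for r in results
    println(r)
end
```

The worker field shows which process handled each input; the assignment can differ from run to run.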

Are all of the scfout_ output files created, and if so, are their contents correct? What if you create two separate functions for the pw and ph sections and call pmap on each of them with the same input vector?
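
A sketch of what that two-stage split might look like. run_pw and run_ph are hypothetical placeholders, not the real pw.x and ph.x calls. Because pmap only returns once every call has finished, the second stage cannot start until the whole first stage is done:

```julia
using Distributed

# Start two local workers if none exist yet (the count is arbitrary).
nprocs() == 1 && addprocs(2)

@everywhere begin
    # Hypothetical stand-ins for the pw and ph halves of getresult.
    run_pw(x) = "pw_done_$(x)"
    run_ph(x) = "ph_done_$(x)"
end

inputs = [1, 2, 3, 4]
pw_results = pmap(run_pw, inputs)   # returns only after every pw call has finished
ph_results = pmap(run_ph, inputs)   # so this stage starts after all pw work is complete
println(pw_results)
println(ph_results)
```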

Thank you for the suggestions. I tried writing two separate functions for each of pw and ph, but that did not help.

I did fix my problem, however, by writing the control string to a file and having the process read from that rather than passing the string directly. I don’t know why that fixed it, though, or why I never saw problems using plain map.
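
For anyone finding this later, one way such a file-based version could look is sketched below. This is an assumption about the shape of the workaround, not the exact code used: run_with_input_file is a hypothetical helper, cat stands in for pw.x on a Unix-like system, and the file is fed to the command by redirecting stdin through pipeline.

```julia
# Hypothetical helper: write `control` to a file, then run `cmd` with its
# stdin redirected from that file and return everything it prints.
function run_with_input_file(cmd::Cmd, control::AbstractString, path::AbstractString)
    write(path, control)                            # save the control string to disk
    return read(pipeline(cmd, stdin=path), String)  # blocks until the command exits
end

dir = mktempdir()
# `cat` is only a stand-in for the real executable; it echoes its input back.
out = run_with_input_file(`cat`, "prefix='si_1'\n", joinpath(dir, "pw_in_si_1"))
print(out)
```

Since read on a command returns only after the command has exited, a second command started on the next line begins after the first one has finished.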