@everywhere works, sort of

I am working on setting up a parallel code, and I am putting in one piece at a time to make sure I actually understand how to do this properly. I put the following piece of code into Julia on the front end of the HPC, and I also submitted it as a job to the cluster:

using Distributed

@everywhere begin
    using SharedArrays
    using DynamicPolynomials

    EVindices_int = [[0,0,1,1] [0,1,1,2]];
    EVindices = [[0.,0.,1.,1.] [0.,1.,1.,2.]];

    zmod  = sum(.!(EVindices_int[:,1]     .!= 0));  # Number of zero modes
    dmod  = sum(   EVindices_int[:,1]     .!= 0 );  # Number of double-eigenvalue modes
    dmod1 = sum(.!(EVindices_int[:,1].-1  .!= 0));  # Single wavenumber double-modes
    dmod2 = sum(.!(EVindices_int[:,1].-2  .!= 0));  # Double wavenumber double-modes
    dmod3 = sum(.!(EVindices_int[:,1].-3  .!= 0));  # Triple wavenumber double-modes
    m     = zmod + 2*(dmod1+dmod2+dmod3);           # Size of (truncated) system

    NROWS = zmod + dmod1 + dmod2 + dmod3;
    @polyvar(a[1:m]);
    @polyvar(q);
end

In the terminal, I run export OMP_NUM_THREADS=4 and then start Julia. When I include this code in the front-end REPL, it runs without issue.

Now, if I submit the job with the following bash script

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=00:10:00

module load StdEnv/2020
module load matlab/2021a.5
module load julia/1.9.1
export JULIA_NUM_THREADS=16
export OMP_NUM_THREADS=16
srun hostname -s > hostfile
julia --machine-file ./hostfile ./test_parallel.jl

to the machine, it tells me that @polyvar is not defined. To me, this means @everywhere is not working properly. This is a slightly different problem from many of the existing questions on the forums, where the error is usually a package-inclusion problem (see, e.g., here).

To check my hypothesis, I reran the code with only the package inclusions wrapped in the @everywhere begin ... end block, and prefixed every other command with its own @everywhere. I have additional code after this that is also wrapped in @everywhere, and it all seemed to run fine until it reached the end and seg-faulted (a separate issue I am trying to sort out). Technically, I don’t need to run this piece of the code on all processors; it is just easier than learning how to localize and broadcast it, since it is not horribly slow and is not going to be the main source of memory use. However, the results will need to be accessible by every processor for the tasks done later in a distributed for loop. Moreover, I will need these packages loaded on every processor for the math done in the distributed loop, and I am concerned that won’t happen, and that the error will then not be that the package is not loaded but rather that commands don’t exist. This feels like a bug, and I was wondering if anyone else has seen it or knows whether it is related to the package errors reported on profusely on this forum. Thank you.
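For reference, the kind of pattern I am aiming for is roughly the following minimal sketch (the variable names here are just illustrative, not my actual code):

```julia
using Distributed
addprocs(4)                       # spawn 4 local worker processes

@everywhere begin
    N = 3                         # defined on the master and every worker
end

# Each worker can see N because @everywhere evaluated the block there;
# the (+) reducer combines the per-iteration results on the master.
total = @distributed (+) for i in 1:10
    i * N
end
# total == 3 * sum(1:10) == 165
```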

It could be that the macro is the problem. IIRC you need to import/use modules before their macros work. Try two separate @everywhere blocks:

@everywhere begin
    using SharedArrays
    using DynamicPolynomials
end 

@everywhere begin
    # rest of code
end

To clarify: Julia parses and then evaluates top-level expressions one by one, and a begin block is a single expression. Putting the macro’s import and the macro call in the same block means the whole block is macro-expanded as one unit, before any of it (including the using) has been evaluated, so @polyvar is not yet defined when the parser tries to expand it.
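You can see the same gotcha in a single process with only the standard library (Printf standing in for DynamicPolynomials here; this assumes a fresh session where Printf has not already been loaded):

```julia
# Build a begin block that imports Printf and immediately calls its
# macro. Evaluating it macro-expands the whole block first, so @printf
# is looked up before `using Printf` has run, and expansion fails.
ex = Meta.parse("""
begin
    using Printf
    @printf("%d\\n", 1)
end
""")

failed = try
    Core.eval(Main, ex)
    false
catch
    true                      # UndefVarError: `@printf` not defined
end
println(failed)               # true in a fresh session

# As two separate top-level expressions, the macro resolves fine:
using Printf
@printf("%d\n", 1)            # prints 1
```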
