Configuring MPI.jl to run on XC-40 / KNL with craype and cray-mpich

I am attempting to configure the MPI.jl package to run in Julia 1.0.3 on theta, a Cray XC-40/KNL. Cross-compilation has not been an issue, but interfacing with the Cray PE and Cray MPICH is currently stumping me. I have been able to install Julia 1.0.3 and run non-MPI codes both interactively and non-interactively, and I have also been able to add BinDeps. The trouble starts when I try:

pkg> add MPI

I have attempted to set up the environment according to the instructions at:

https://github.com/JuliaParallel/MPI.jl#overriding-the-auto-detected-mpi-version

and invoked the REPL with

CC=$(which gcc) CXX=$(which g++) FC=$(which gfortran) julia

as described there. I ran the craype wrappers with the -craype-verbose flag on an MPI ‘hello world’ in order to see how gcc and gfortran are invoked and how MPI is linked within that environment. I used this information to extract and provide values for JULIA_MPI_C_LIBRARIES, JULIA_MPI_C_INCLUDE_PATH, JULIA_MPI_Fortran_INCLUDE_PATH, and JULIA_MPI_Fortran_LIBRARIES, as described in the documentation at:

https://github.com/JuliaParallel/MPI.jl#overriding-the-auto-detected-mpi-version

The output from these was:

cc -shared -craype-verbose mpi_hello.c 

gcc -march=knl -shared -fPIC -D__CRAYXC -D__CRAY_MIC_KNL -D__CRAYXT_COMPUTE_LINUX_TARGET -D__TARGET_LINUX__ mpi_hello.c -I/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/include -I/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include -I/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/include -I/opt/cray/alps/6.6.1-6.0.6.1_4.1__ga6396bb.ari/include -I/opt/cray/xpmem/2.2.14-6.0.6.0_10.1__g34333c9.ari/include -I/opt/cray/gni-headers/5.0.12-6.0.6.0_3.26__g527b6e1.ari/include -I/opt/cray/pe/pmi/5.0.14/include -I/opt/cray/ugni/6.0.14-6.0.6.0_18.12__g777707d.ari/include -I/opt/cray/udreg/2.3.2-6.0.6.0_15.18__g5196236.ari/include -I/opt/cray/wlm_detect/1.3.2-6.0.6.0_3.8__g388ccd5.ari/include -I/opt/cray/krca/2.2.4-6.0.6.0_8.14__g8505b97.ari/include -I/opt/cray-hss-devel/8.0.0/include -L/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/lib -L/opt/cray/dmapp/default/lib64 -L/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib -L/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/lib64 -L/lib64 -lrca -lz -Wl,--as-needed,-lsci_gnu_71_mpi,--no-as-needed -Wl,--as-needed,-lsci_gnu_71,--no-as-needed -Wl,--as-needed,-lmpich_gnu_71,--no-as-needed -Wl,--as-needed,-lgfortran,-lquadmath,--no-as-needed -Wl,--as-needed,-lmvec,--no-as-needed -Wl,--as-needed,-lm,--no-as-needed -Wl,--as-needed,-lpthread,--no-as-needed

And:

ftn -shared -craype-verbose mpi_hello.f90 

gfortran -march=knl -shared -fPIC -D__CRAYXC -D__CRAY_MIC_KNL -D__CRAYXT_COMPUTE_LINUX_TARGET -D__TARGET_LINUX__ mpi_hello.f90 -I/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/include -I/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include -I/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/include -I/opt/cray/alps/6.6.1-6.0.6.1_4.1__ga6396bb.ari/include -I/opt/cray/xpmem/2.2.14-6.0.6.0_10.1__g34333c9.ari/include -I/opt/cray/gni-headers/5.0.12-6.0.6.0_3.26__g527b6e1.ari/include -I/opt/cray/pe/pmi/5.0.14/include -I/opt/cray/ugni/6.0.14-6.0.6.0_18.12__g777707d.ari/include -I/opt/cray/udreg/2.3.2-6.0.6.0_15.18__g5196236.ari/include -I/opt/cray/wlm_detect/1.3.2-6.0.6.0_3.8__g388ccd5.ari/include -I/opt/cray/krca/2.2.4-6.0.6.0_8.14__g8505b97.ari/include -I/opt/cray-hss-devel/8.0.0/include -L/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/lib -L/opt/cray/dmapp/default/lib64 -L/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib -L/opt/cray/dmapp/default/lib64 -L/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib -L/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/lib64 -L/lib64 -lrca -lz -Wl,--as-needed,-lsci_gnu_71_mpi,--no-as-needed -Wl,--as-needed,-lsci_gnu_71,--no-as-needed -Wl,--as-needed,-lmpich_gnu_71,--no-as-needed -Wl,--as-needed,-lmpichf90_gnu_71,--no-as-needed -Wl,--as-needed,-lgfortran,-lquadmath,--no-as-needed -Wl,--as-needed,-lpthread,--no-as-needed

From these, I extracted and provided values to be picked up by CMake:

export JULIA_MPI_C_LIBRARIES="-I/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include/ -L/opt/cray/pe/lib64 -lmpich_gnu_71"

export JULIA_MPI_C_INCLUDE_PATH="-I/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/include -I/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include -I/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/include -I/opt/cray/alps/6.6.1-6.0.6.1_4.1__ga6396bb.ari/include -I/opt/cray/xpmem/2.2.14-6.0.6.0_10.1__g34333c9.ari/include -I/opt/cray/gni-headers/5.0.12-6.0.6.0_3.26__g527b6e1.ari/include -I/opt/cray/pe/pmi/5.0.14/include -I/opt/cray/ugni/6.0.14-6.0.6.0_18.12__g777707d.ari/include -I/opt/cray/udreg/2.3.2-6.0.6.0_15.18__g5196236.ari/include -I/opt/cray/wlm_detect/1.3.2-6.0.6.0_3.8__g388ccd5.ari/include -I/opt/cray/krca/2.2.4-6.0.6.0_8.14__g8505b97.ari/include -I/opt/cray-hss-devel/8.0.0/include"

export JULIA_MPI_Fortran_INCLUDE_PATH="-I/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/include -I/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include -I/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/include -I/opt/cray/alps/6.6.1-6.0.6.1_4.1__ga6396bb.ari/include -I/opt/cray/xpmem/2.2.14-6.0.6.0_10.1__g34333c9.ari/include -I/opt/cray/gni-headers/5.0.12-6.0.6.0_3.26__g527b6e1.ari/include -I/opt/cray/pe/pmi/5.0.14/include -I/opt/cray/ugni/6.0.14-6.0.6.0_18.12__g777707d.ari/include -I/opt/cray/udreg/2.3.2-6.0.6.0_15.18__g5196236.ari/include -I/opt/cray/wlm_detect/1.3.2-6.0.6.0_3.8__g388ccd5.ari/include -I/opt/cray/krca/2.2.4-6.0.6.0_8.14__g8505b97.ari/include -I/opt/cray-hss-devel/8.0.0/include"

export JULIA_MPI_Fortran_LIBRARIES="-L/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/lib -L/opt/cray/dmapp/default/lib64 -L/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib -L/opt/cray/dmapp/default/lib64 -L/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib -L/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/lib64 -L/lib64 -lrca -lz -Wl,--as-needed,-lsci_gnu_71_mpi,--no-as-needed -Wl,--as-needed,-lsci_gnu_71,--no-as-needed -Wl,--as-needed,-lmpich_gnu_71,--no-as-needed -Wl,--as-needed,-lmpichf90_gnu_71,--no-as-needed -Wl,--as-needed,-lgfortran,-lquadmath,--no-as-needed -Wl,--as-needed,-lpthread,--no-as-needed"
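For completeness, the whole sequence I am running is roughly the following (a sketch, with the long export values abbreviated to the ones spelled out above):

export JULIA_MPI_C_LIBRARIES="..."            # full value as above
export JULIA_MPI_C_INCLUDE_PATH="..."         # full value as above
export JULIA_MPI_Fortran_INCLUDE_PATH="..."   # full value as above
export JULIA_MPI_Fortran_LIBRARIES="..."      # full value as above
CC=$(which gcc) CXX=$(which g++) FC=$(which gfortran) julia
# then in the REPL: pkg> add MPI   (or pkg> build MPI, to re-run deps/build.jl for an already-added package)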

When attempting to pkg> add MPI, the build.log file in my depot reflects the following:

-- The Fortran compiler identification is GNU 7.3.0

-- The C compiler identification is GNU 7.3.0

-- Check for working Fortran compiler: /opt/gcc/7.3.0/bin/gfortran

-- Check for working Fortran compiler: /opt/gcc/7.3.0/bin/gfortran -- works

-- Detecting Fortran compiler ABI info

-- Detecting Fortran compiler ABI info - done

-- Checking whether /opt/gcc/7.3.0/bin/gfortran supports Fortran 90

-- Checking whether /opt/gcc/7.3.0/bin/gfortran supports Fortran 90 -- yes

-- Check for working C compiler: /opt/gcc/7.3.0/bin/gcc

-- Check for working C compiler: /opt/gcc/7.3.0/bin/gcc -- works

-- Detecting C compiler ABI info

-- Detecting C compiler ABI info - done

-- Detecting C compile features

-- Detecting C compile features - done

-- Found Git: /usr/bin/git (found version "2.12.3") 

-- Found MPI_C: -I/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include/ -L/opt/cray/pe/lib64 -lmpich_gnu_71  

-- Found MPI_Fortran: -L/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/lib -L/opt/cray/dmapp/default/lib64 -L/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib -L/opt/cray/dmapp/default/lib64 -L/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib -L/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/lib64 -L/lib64 -lrca -lz -Wl,--as-needed,-lsci_gnu_71_mpi,--no-as-needed -Wl,--as-needed,-lsci_gnu_71,--no-as-needed -Wl,--as-needed,-lmpich_gnu_71,--no-as-needed -Wl,--as-needed,-lmpichf90_gnu_71,--no-as-needed -Wl,--as-needed,-lgfortran,-lquadmath,--no-as-needed -Wl,--as-needed,-lpthread,--no-as-needed

-- Detecting Fortran/C Interface

-- Detecting Fortran/C Interface - Found GLOBAL and MODULE mangling

-- Looking for MPI_Comm_c2f

-- Looking for MPI_Comm_c2f - not found

-- Configuring done

-- Generating done

-- Build files have been written to: /gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/build

Scanning dependencies of target gen_constants

Scanning dependencies of target gen_constants

[ 11%] Building Fortran object CMakeFiles/gen_constants.dir/gen_constants.f90.o

f951: Warning: Nonexistent include directory '/gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/-I/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/include -I/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include -I/opt/cray/rca/2.2.18-6.0.6.0_19.14__g2aa4f39.ari/include -I/opt/cray/alps/6.6.1-6.0.6.1_4.1__ga6396bb.ari/include -I/opt/cray/xpmem/2.2.14-6.0.6.0_10.1__g34333c9.ari/include -I/opt/cray/gni-headers/5.0.12-6.0.6.0_3.26__g527b6e1.ari/include -I/opt/cray/pe/pmi/5.0.14/include -I/opt/cray/ugni/6.0.14-6.0.6.0_18.12__g777707d.ari/include -I/opt/cray/udreg/2.3.2-6.0.6.0_15.18__g5196236.ari/include -I/opt/cray/wlm_detect/1.3.2-6.0.6.0_3.8__g388ccd5.ari/include -I/opt/cray/krca/2.2.4-6.0.6.0_8.14__g8505b97.ari/include -I/opt/cray-hss-devel/8.0.0/include' [-Wmissing-include-dirs]

/gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/gen_constants.f90:3: Error: Can't open included file 'mpif.h'

CMakeFiles/gen_constants.dir/build.make:62: recipe for target 'CMakeFiles/gen_constants.dir/gen_constants.f90.o' failed

make[2]: *** [CMakeFiles/gen_constants.dir/gen_constants.f90.o] Error 1

CMakeFiles/Makefile2:241: recipe for target 'CMakeFiles/gen_constants.dir/all' failed

make[1]: *** [CMakeFiles/gen_constants.dir/all] Error 2

Makefile:149: recipe for target 'all' failed

make: *** [all] Error 2

[ Info: Attempting to create directory /gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/build

[ Info: Changing directory to /gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/build

ERROR: LoadError: failed process: Process(`make`, ProcessExited(2)) [2]

Stacktrace:

 [1] error(::String, ::Base.Process, ::String, ::Int64, ::String) at ./error.jl:42

 [2] pipeline_error at ./process.jl:705 [inlined]

 [3] #run#503(::Bool, ::Function, ::Cmd) at ./process.jl:663

 [4] run(::Cmd) at ./process.jl:661

 [5] run(::BinDeps.SynchronousStepCollection) at /home/willmore/.julia/packages/BinDeps/ZEval/src/BinDeps.jl:521 (repeats 3 times)

 [6] satisfy!(::BinDeps.LibraryDependency, ::Array{DataType,1}) at /home/willmore/.julia/packages/BinDeps/ZEval/src/dependencies.jl:944

 [7] satisfy!(::BinDeps.LibraryDependency) at /home/willmore/.julia/packages/BinDeps/ZEval/src/dependencies.jl:922

 [8] top-level scope at /home/willmore/.julia/packages/BinDeps/ZEval/src/dependencies.jl:977

 [9] include at ./boot.jl:317 [inlined]

 [10] include_relative(::Module, ::String) at ./loading.jl:1044

 [11] include(::Module, ::String) at ./sysimg.jl:29

 [12] include(::String) at ./client.jl:392

 [13] top-level scope at none:0

in expression starting at /home/willmore/.julia/packages/MPI/U5ujD/deps/build.jl:54

What is troubling here in particular is the line:
f951: Warning: Nonexistent include directory '/gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/-I/opt/cray/pe/libsci/18.07.1/GNU/7.1/x86_64/include -I/opt/cray/pe/mpt/7.7.3/g

This seems to indicate that the entire -I include string is being appended verbatim to the end of the deps directory, which is inconsistent with how I expected this information to be handled, at least as described in the MPI.jl directions.
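My working guess (and it is only a guess) is that CMake treats variables like MPI_Fortran_INCLUDE_PATH as semicolon-separated lists of directories and library files rather than as strings of compiler flags, so the whole "-I ..." string ends up being interpreted as one directory name relative to deps. If that is right, the overrides would presumably need to look more like the following untested sketch (the library file names are my inference from the -L/-l flags above and may not match what is actually on disk):

# directories only, no -I flags, semicolon-separated if more than one
export JULIA_MPI_C_INCLUDE_PATH="/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include"
export JULIA_MPI_Fortran_INCLUDE_PATH="/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include"
# full paths to libraries, semicolon-separated, instead of -L/-l flags
export JULIA_MPI_C_LIBRARIES="/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib/libmpich_gnu_71.so"
export JULIA_MPI_Fortran_LIBRARIES="/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib/libmpich_gnu_71.so;/opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/lib/libmpichf90_gnu_71.so"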

So my questions are these:

  • Should I have a different understanding of how this information is to be interpreted?
  • Has something changed in MPI.jl since this was documented?
  • Is CMake misinterpreting the configuration information?
  • Is there something else fundamentally missing?

We’re very interested in making MPI.jl available to run on theta, but are currently stuck at this point. I know this post is verbose. Thank you for reading.

This is a wild guess, but you could try adding a space at the beginning of that export string:

export JULIA_MPI_Fortran_INCLUDE_PATH=" -I/opt/cray/pe/libsci/...

I noticed that a space seems to be missing in the concatenated path shown in the error message.

Hi favba–

Thanks for looking at it. There is no space before the ‘-I’ in the published MPI.jl instructions. When I do omit that space, the compiler still cannot find mpi.h:

/gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/gen_functions.c:4:10: fatal error: mpi.h: No such file or directory
 #include "mpi.h"
          ^~~~~~~

even though the correct mpi.h is actually available in the second -I location, i.e. /opt/cray/pe/mpt/7.7.3/gni/mpich-gnu/7.1/include

So perhaps to refine my question a bit: How do I correctly pass this include location?

@frankwillmore Please keep us updated on progress. I bet Julia goes like gangbusters on Knights Landing.

Looking at this page, it looks like the correct compilers to use on Cray are cc for C, ftn for Fortran, and CC for C++. Can you confirm that these exist on your system? If so, the first thing I would try is

CC=`which cc` FC=`which ftn` CXX=`which CC` julia

and then try to build MPI.jl (without setting any of the JULIA_MPI_* variables).

Something weird is happening with the JULIA_MPI_* variables, but the most direct path to getting MPI.jl built may be to avoid needing them.
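Concretely, something along these lines — just a sketch, and the module names are a guess that may not match what theta uses:

module load PrgEnv-gnu cmake     # assumption: whatever modules provide the GNU wrappers and CMake
CC=`which cc` FC=`which ftn` CXX=`which CC` julia -e 'using Pkg; Pkg.build("MPI")'

and then look at the resulting build.log under ~/.julia/packages/MPI/.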

Out of curiosity, what version of CMake are you using?

Hi John-

Appreciate your optimism. I will post updates.


Hi Jared–

CMake version is 3.5.2. Cray wraps the compilers as cc, CC, and ftn; the actual command and arguments they execute can be seen by specifying the -craype-verbose option.

Running with only the command-line compiler specifications and no JULIA_MPI_* variables set, CMake fails to find MPI_C.

build.log:

-- The Fortran compiler identification is GNU 7.3.0
-- The C compiler identification is GNU 7.3.0
-- Check for working Fortran compiler: /opt/gcc/7.3.0/bin/gfortran
-- Check for working Fortran compiler: /opt/gcc/7.3.0/bin/gfortran  -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /opt/gcc/7.3.0/bin/gfortran supports Fortran 90
-- Checking whether /opt/gcc/7.3.0/bin/gfortran supports Fortran 90 -- yes
-- Check for working C compiler: /opt/gcc/7.3.0/bin/gcc
-- Check for working C compiler: /opt/gcc/7.3.0/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Found Git: /usr/bin/git (found version "2.12.3") 
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find MPI_C (missing: MPI_C_LIBRARIES MPI_C_INCLUDE_PATH)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindMPI.cmake:614 (find_package_handle_standard_args)
  CMakeLists.txt:5 (find_package)


-- Configuring incomplete, errors occurred!
See also "/gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/build/CMakeFiles/CMakeOutput.log".
[ Info: Attempting to create directory /gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/build
[ Info: Changing directory to /gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/build
ERROR: LoadError: failed process: Process(`cmake -DCMAKE_INSTALL_PREFIX=/gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/src -DCMAKE_LIB_INSTALL_PREFIX=/gpfs/mira-home/willmore/.julia/packages/MPI/U5ujD/deps/usr/lib ..`, ProcessExited(1)) [1]
Stacktrace:
 [1] error(::String, ::Base.Process, ::String, ::Int64, ::String) at ./error.jl:42
 [2] pipeline_error at ./process.jl:705 [inlined]
 [3] #run#503(::Bool, ::Function, ::Cmd) at ./process.jl:663
 [4] run(::Cmd) at ./process.jl:661
 [5] run(::BinDeps.SynchronousStepCollection) at /home/willmore/.julia/packages/BinDeps/ZEval/src/BinDeps.jl:521 (repeats 3 times)
 [6] satisfy!(::BinDeps.LibraryDependency, ::Array{DataType,1}) at /home/willmore/.julia/packages/BinDeps/ZEval/src/dependencies.jl:944
 [7] satisfy!(::BinDeps.LibraryDependency) at /home/willmore/.julia/packages/BinDeps/ZEval/src/dependencies.jl:922
 [8] top-level scope at /home/willmore/.julia/packages/BinDeps/ZEval/src/dependencies.jl:977
 [9] include at ./boot.jl:317 [inlined]
 [10] include_relative(::Module, ::String) at ./loading.jl:1044
 [11] include(::Module, ::String) at ./sysimg.jl:29
 [12] include(::String) at ./client.jl:392
 [13] top-level scope at none:0
in expression starting at /home/willmore/.julia/packages/MPI/U5ujD/deps/build.jl:54

Interesting, thanks for posting the output. I think the next step is to try a version of CMake 3.10 or newer. There is some discussion on this Deal.II thread about how FindMPI changed in CMake 3.10 to use the compiler wrappers as the first source for finding MPI implementations.

The backstory is that MPI.jl adds the MPI_* variables to CMake on the command line by adding -D switches, so if the paths are getting set incorrectly the problem is likely internal to CMake. The code that does this is here if you are curious.
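If memory serves (I have not re-read deps/build.jl recently, so treat this as a sketch), the generated configure step ends up looking roughly like the command below, which you could also re-run by hand inside deps/build to see exactly what FindMPI does with those values:

cmake -DMPI_C_INCLUDE_PATH="$JULIA_MPI_C_INCLUDE_PATH" \
      -DMPI_C_LIBRARIES="$JULIA_MPI_C_LIBRARIES" \
      -DMPI_Fortran_INCLUDE_PATH="$JULIA_MPI_Fortran_INCLUDE_PATH" \
      -DMPI_Fortran_LIBRARIES="$JULIA_MPI_Fortran_LIBRARIES" \
      -DCMAKE_INSTALL_PREFIX=... -DCMAKE_LIB_INSTALL_PREFIX=... ..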

Thanks for the tip on the change in CMake. Indeed, I tried with CMake 3.11.4 and it is still not finding MPI_C or MPI_Fortran, but it now reports a different set of missing variables than last time. These variables also do not seem to match those in build.jl:

-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS) 
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_LIB_NAMES MPI_Fortran_F77_HEADER_DIR MPI_Fortran_MODULE_DIR MPI_Fortran_WORKS) 
CMake Error at /lus/theta-fs0/software/buildtools/cmake/3.11.4/share/cmake-3.11/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find MPI (missing: MPI_C_FOUND MPI_Fortran_FOUND)
Call Stack (most recent call first):
  /lus/theta-fs0/software/buildtools/cmake/3.11.4/share/cmake-3.11/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /lus/theta-fs0/software/buildtools/cmake/3.11.4/share/cmake-3.11/Modules/FindMPI.cmake:1663 (find_package_handle_standard_args)
  CMakeLists.txt:5 (find_package)

I’m also trying to get some clarity from Cray about what values I should be feeding these variables, once I figure out which variables I should use.

So please, keep it coming. This is helping me find direction as I dig deeper.
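One thing I plan to try next, to take MPI.jl out of the loop entirely, is a bare-bones FindMPI test along the following lines (an untested sketch; MPI_C_COMPILER and MPI_Fortran_COMPILER are CMake's documented hints for pointing FindMPI at wrapper compilers):

mkdir -p ~/findmpi-test && cd ~/findmpi-test
cat > CMakeLists.txt <<'EOF'
cmake_minimum_required(VERSION 3.10)
project(findmpi_test C Fortran)
find_package(MPI REQUIRED)
message(STATUS "MPI_C_LIBRARIES = ${MPI_C_LIBRARIES}")
message(STATUS "MPI_Fortran_LIBRARIES = ${MPI_Fortran_LIBRARIES}")
EOF
mkdir -p build && cd build
cmake -DMPI_C_COMPILER=$(which cc) -DMPI_Fortran_COMPILER=$(which ftn) ..

If that configures cleanly, the values it reports should tell me what build.jl ultimately needs to see.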

Just to update this. We have 22/26 tests passing.

Briefly, the 4 remaining MPI.jl tests with errors:

  1. test_cman_julia.jl - Fixed: it was not being run with aprun, due to issues with the if
    statement that sets up the ‘mgr’ (the aprun launcher).

  2. test_cman_tcp.jl - Socket complaints, but output from the different nodes was interleaved
    across stdout and stderr.

    • I suggested rerunning with the -T option on aprun (sync stdout and stderr); see the sketch after this list.
    • But sockets may be a general concern.
  3. test_spawn.jl - The MPI_Comm_spawn tests worked, except for one that has an extra
    parameter. MPI_Comm_spawn_multiple has ‘count’ as its first parameter.

    • This looks like a bug in the MPI.jl test.
  4. test_test.jl - fails in MPI_Finalize.

    • There is a bug in finalize that we are looking at, and it may be related.
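For the interleaved-output rerun mentioned in item 2, what I have in mind is roughly the following (a sketch; the rank counts are placeholders, and -T is the aprun option for synchronizing stdout and stderr noted above):

aprun -T -n 4 -N 4 julia test_cman_tcp.jl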

Can you please open these as issues/pull requests on MPI.jl?