I’m attempting to use Intel MKL across multiple remote workers. I am using KissCluster.jl to set up a lightweight cluster on AWS, and I add the workers from a machinefile via
addprocs(machines, enable_threaded_blas=true, topology=:master_worker)
Each remote worker uses the same type of image, and I can ssh into each one individually and run linear algebra without a problem (in other words, the MKL installation is fine). However, when I try to run pmap for a function over the workers, I get the following error:
From worker 19: /home/ubuntu/julia/usr/bin/julia: symbol lookup error: /opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64/libmkl_intel_thread.so: undefined symbol: omp_get_num_procs
This happens for all the workers. I found this issue (https://github.com/JuliaLang/julia/issues/27940), which describes a similar problem, but they were able to resolve it.
I think the issue is that my ~/.profile is not set correctly. Basically, I run source /opt/intel/bin/compilervars.sh intel64 on each instance as it spins up, but I’m worried that this doesn’t do the right thing when I use pmap. What should my ~/.profile look like? I tried setting LD_LIBRARY_PATH manually on each instance by looking at its value after source /opt/intel/bin/compilervars.sh intel64 is called, resulting in:
PATH="$HOME/bin:$HOME/.local/bin:$PATH"
LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2018.3.222/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2018.3.222/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2018.3.222/linux/tbb/lib/intel64_lin/gcc4.7:/opt/intel/compilers_and_libraries_2018.3.222/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64_lin
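One check that might narrow this down is to see which of those directories actually contain the OpenMP runtime on each instance. The sketch below is only illustrative: the helper name check_lib_path is made up, and it assumes the missing symbol omp_get_num_procs is supposed to come from Intel's OpenMP runtime, libiomp5.so. It walks a colon-separated path list and reports, for each entry, whether the library is there.

```shell
#!/bin/sh
# Hypothetical helper: report, for every entry of a colon-separated path
# list, whether it contains the given library file.
# Runs in a subshell (parenthesised body) so the IFS change does not leak.
check_lib_path() (
    lib="$1"
    path_list="$2"
    IFS=':'
    for dir in $path_list; do
        [ -n "$dir" ] || continue          # skip empty entries such as "a::b"
        if [ -e "$dir/$lib" ]; then
            echo "found   $dir"
        elif [ -d "$dir" ]; then
            echo "absent  $dir"            # directory exists, library is not in it
        else
            echo "nodir   $dir"            # directory itself does not exist
        fi
    done
)

# Assumption: libiomp5.so is the library that should provide omp_get_num_procs.
check_lib_path "libiomp5.so" "${LD_LIBRARY_PATH:-}"
```

Running it once in an interactive login and once as a command passed to ssh (which is presumably closer to how the workers are launched) would show whether the two environments differ.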
What else should I try?