I’m attempting to use Intel MKL across multiple remote workers. I am using KissCluster.jl to set up a lightweight cluster on AWS and a machinefile to configure the workers via
addprocs(machines, enable_threaded_blas=true, topology=:master_worker)
Each remote worker uses the same type of image, and I can ssh into each one individually and run linear algebra without a problem (in other words, the MKL installation is fine). However, when I try to run pmap for a function over the workers, I get the following error:
From worker 19: /home/ubuntu/julia/usr/bin/julia: symbol lookup error: /opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64/libmkl_intel_thread.so: undefined symbol: omp_get_num_procs
This happens for all the workers. I found this issue (https://github.com/JuliaLang/julia/issues/27940), which describes a similar problem, but there they were able to resolve it.
I think the issue is that my ~/.profile is not set correctly. Basically, I run source /opt/intel/bin/compilervars.sh intel64 on each instance as it spins up, but I’m worried that this doesn’t do the right thing when I use pmap. What should my ~/.profile look like? I tried setting the
LD_LIBRARY_PATH manually in each instance by looking at what it is set to after source /opt/intel/bin/compilervars.sh intel64 is called, resulting in:
What else should I try?
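One check I could run on each instance is sketched below. It assumes, as far as I understand, that omp_get_num_procs is provided by the Intel OpenMP runtime library libiomp5.so, so that library's directory would need to be on the LD_LIBRARY_PATH the workers actually see. The helper name find_lib_dir is my own and is not part of MKL or Julia; it only walks a colon-separated path list and reports the first directory containing a given file.

```shell
#!/bin/sh
# find_lib_dir NAME PATHLIST   (hypothetical helper, not an MKL or Julia tool)
# Print the first directory in the colon-separated PATHLIST that contains
# a file called NAME. Return 1 if no directory contains it.
find_lib_dir() {
    name=$1
    old_ifs=$IFS
    IFS=:                       # split PATHLIST on colons
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example use (assumes libiomp5.so is the library that is missing):
#   find_lib_dir libiomp5.so "${LD_LIBRARY_PATH:-}" \
#       || echo "libiomp5.so is not on LD_LIBRARY_PATH"
```

Running this through a non-interactive ssh command rather than an interactive login may give a different answer, since a non-interactive shell does not necessarily read ~/.profile. That would at least show whether the environment from compilervars.sh reaches the worker processes.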