Intel MKL with Distributed fails

I’m attempting to use Intel MKL over multiple remote workers. I am using KissCluster.jl to set up a lightweight cluster on AWS and a machinefile to configure via

addprocs(machines, enable_threaded_blas=true, topology=:master_worker)

Each remote worker uses the same type of image, and I can ssh into each one individually and run linear algebra without a problem (in other words, the MKL installation is fine). However, when I try to run pmap for a function over the workers, I get the following error:

From worker 19:	/home/ubuntu/julia/usr/bin/julia: symbol lookup error: /opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64/ undefined symbol: omp_get_num_procs

This happens for all the workers. I found an issue describing a similar problem, but they were able to resolve it.

I think that the issue is that my ~/.profile is not set correctly. Basically, I run source /opt/intel/bin/ intel64 on each instance as it spins up, but I’m worried that this doesn’t do the right thing when I use pmap. What should my ~/.profile look like? I tried setting the LD_LIBRARY_PATH manually in each instance by looking at LD_LIBRARY_PATH after source /opt/intel/bin/ intel64 is called, resulting in:


What else should I try?


Some more color:

I have tried to add the following to ~/.profile:

# bash
source /opt/intel/bin/ intel64

That works when I ssh into the instances, but when I add those instances via addprocs, I get a "source not found" error, which suggests that the line in ~/.profile is not being run by bash (?), so MKL does not load.
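One possible explanation (an assumption on my part, not something I confirmed on these instances) is that the file is being read by a plain POSIX sh such as dash, where `source` is not a builtin. The portable spelling is `.`. Here is a minimal sketch that uses a throwaway file in place of the Intel script; `DEMO_MKL_ROOT` and the temp file are hypothetical names for illustration only:

```shell
#!/bin/sh
# Stand-in for the real environment script (hypothetical file and variable).
envfile="$(mktemp)"
echo 'DEMO_MKL_ROOT=/tmp/demo-mkl; export DEMO_MKL_ROOT' > "$envfile"

# `.` is the POSIX way to read a file into the current shell.
# `source` is a bash synonym that a plain sh (e.g. dash) may not provide.
. "$envfile"

echo "DEMO_MKL_ROOT=$DEMO_MKL_ROOT"
rm -f "$envfile"
```

If the same file has to work under both sh and bash, `.` is the safer choice.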

I also attempted to add the source command as the second line of my launch script for the instances (the script from cat in KissCluster), but then the nodes don’t connect to the cluster properly.

This turned out to be pretty simple. I simply had to add

source /opt/intel/bin/ intel64

to the top of my .bashrc file. Works like a charm now :slight_smile:
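My guess as to why the position matters: as far as I can tell, the stock Ubuntu .bashrc returns early for non-interactive shells, so anything below that guard is never reached by the shells that launch the workers. For anyone who wants something a bit more defensive, here is a sketch of a guarded version that skips the script when it is missing. `load_env_script` and the `DEMO_ARCH` variable are hypothetical names, and the temp file stands in for the Intel script:

```shell
# Hypothetical helper for the top of ~/.bashrc: read an environment script
# only if it is readable, so a missing file does not break shell startup.
load_env_script() {
    script="$1"
    shift
    if [ -r "$script" ]; then
        # Remaining arguments (e.g. intel64) are passed through to the script.
        . "$script" "$@"
    else
        echo "warning: $script not found, skipping" >&2
        return 1
    fi
}

# Demo with a throwaway script standing in for the Intel one.
demo="$(mktemp)"
echo 'export DEMO_ARCH="$1"' > "$demo"
load_env_script "$demo" intel64
echo "DEMO_ARCH=$DEMO_ARCH"
rm -f "$demo"
```

In a real .bashrc the demo lines would be replaced by a single call to the helper with the actual script path and the intel64 argument.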