Intel MKL with Distributed fails

platawiec · May 9, 2019, 11:41pm

I’m attempting to use Intel MKL on multiple remote workers. I am using KissCluster.jl to set up a lightweight cluster on AWS, and a machinefile to add the workers via

addprocs(machines, enable_threaded_blas=true, topology=:master_worker)

Each remote worker uses the same type of image, and I can ssh into each one individually and run linear algebra without a problem (in other words, the MKL installation is fine). However, when I try to run pmap for a function over the workers, I get the following error:

From worker 19:	/home/ubuntu/julia/usr/bin/julia: symbol lookup error: /opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64/libmkl_intel_thread.so: undefined symbol: omp_get_num_procs

I get this for all the workers. I found this issue (https://github.com/JuliaLang/julia/issues/27940), which describes a similar problem that was resolved there.
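For context, and as an assumption rather than something stated in this thread: as far as I know, omp_get_num_procs is provided by Intel's OpenMP runtime, libiomp5.so, which libmkl_intel_thread.so expects to find already loaded. A quick check is whether any directory on LD_LIBRARY_PATH holds that file in the environment a worker actually gets. The helper below is a minimal bash sketch; find_on_ld_path is a hypothetical name, and it only looks at LD_LIBRARY_PATH, not at the loader's cache or default directories.

```bash
#!/usr/bin/env bash
# Sketch: report which LD_LIBRARY_PATH directory (if any) contains a given library file.
# Only LD_LIBRARY_PATH is searched; ld.so.cache and default dirs are ignored.
find_on_ld_path() {
  local name="$1" dir
  local -a dirs
  IFS=':' read -ra dirs <<< "${LD_LIBRARY_PATH:-}"
  for dir in "${dirs[@]}"; do
    # Skip empty entries; print the first directory that has the file.
    if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Example: look for the Intel OpenMP runtime (assumed to provide omp_get_num_procs).
find_on_ld_path libiomp5.so || echo "libiomp5.so not found on LD_LIBRARY_PATH" >&2
```

To see the value a worker's shell gets, the check has to go through a non-interactive ssh command like the one addprocs uses, for example ssh worker-host 'echo "$LD_LIBRARY_PATH"' (worker-host is a placeholder), rather than an interactive login.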

I think the issue is that my ~/.profile is not set up correctly. I run source /opt/intel/bin/compilervars.sh intel64 on each instance as it spins up, but I’m worried that this doesn’t take effect for the worker processes that pmap uses. What should my ~/.profile look like? I also tried setting LD_LIBRARY_PATH manually on each instance, copying the value it has after source /opt/intel/bin/compilervars.sh intel64 is run, which gave this ~/.profile:

PATH="$HOME/bin:$HOME/.local/bin:$PATH"
LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2018.3.222/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2018.3.222/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2018.3.222/linux/tbb/lib/intel64_lin/gcc4.7:/opt/intel/compilers_and_libraries_2018.3.222/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64_lin
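Two details of the file above may be worth checking. compiler/lib/intel64_lin appears twice, which is harmless but noisy, and LD_LIBRARY_PATH is assigned without export; a POSIX shell passes a variable on to child processes such as julia only if it is exported (PATH normally already is, LD_LIBRARY_PATH often is not). Below is a minimal bash sketch of a deduplicated, exported version, assuming ~/.profile is read by bash; dedupe_path and MKL_ROOT are hypothetical names, and only the two directories assumed to matter for this error (the Intel OpenMP runtime and MKL itself) are listed. It still only helps if the shell that starts the worker reads this file.

```bash
#!/usr/bin/env bash
# Sketch: a deduplicated, exported LD_LIBRARY_PATH for ~/.profile.

# Print a colon-separated list with repeated and empty entries removed,
# keeping the first occurrence of each entry in its original order.
# (The dynamic loader treats an empty entry as the current directory.)
dedupe_path() {
  local out="" entry
  local -a parts
  IFS=':' read -ra parts <<< "$1"
  for entry in "${parts[@]}"; do
    [ -z "$entry" ] && continue
    case ":$out:" in
      *":$entry:"*) ;;                  # already present: skip
      *) out="${out:+$out:}$entry" ;;   # first occurrence: append
    esac
  done
  printf '%s\n' "$out"
}

# Install prefix copied from the error message in this thread (an assumption for other setups).
MKL_ROOT=/opt/intel/compilers_and_libraries_2018.3.222/linux

# Intel OpenMP runtime dir and MKL lib dir first, then whatever was already set.
LD_LIBRARY_PATH="$(dedupe_path "$MKL_ROOT/compiler/lib/intel64_lin:$MKL_ROOT/mkl/lib/intel64_lin:${LD_LIBRARY_PATH:-}")"
export LD_LIBRARY_PATH
```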

What else should I try?

platawiec · May 10, 2019, 4:42pm

Some more color:

I have tried to add the following to ~/.profile:

# bash
source /opt/intel/bin/compilervars.sh intel64

That works when I ssh into the instances, but when I add those instances via addprocs I get a "source not found" error. That suggests the line in ~/.profile is not being run by bash (?), and MKL does not load.
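One likely explanation, as far as I understand it: addprocs starts each worker by having ssh run a command, and the shell that runs it is neither interactive nor a login shell, so bash does not read ~/.profile for it at all. Separately, source is a bash builtin; a plain POSIX sh such as dash (Ubuntu's /bin/sh) only provides `.`, so a "source not found" message would fit the file being read by sh rather than bash. The bash sketch below demonstrates the first point with a throwaway HOME; MKL_MARK is a hypothetical marker variable.

```bash
#!/usr/bin/env bash
# Sketch: a non-interactive, non-login bash (like the one running a command
# passed to ssh) skips ~/.profile, while a login shell reads it.
# A throwaway HOME keeps the real dotfiles untouched.

demo_home="$(mktemp -d)"
printf 'export MKL_MARK=set\n' > "$demo_home/.profile"

# Unset BASH_ENV (non-interactive bash would source it) and SSH_CLIENT/SSH2_CLIENT
# (some bash builds use them to decide to read ~/.bashrc under sshd).
run_bash() {
  env -u BASH_ENV -u SSH_CLIENT -u SSH2_CLIENT HOME="$demo_home" \
    bash "$@" 'echo "${MKL_MARK:-unset}"' </dev/null | tail -n 1
}

non_login="$(run_bash -c)"   # roughly what a command run over ssh gets
login="$(run_bash -lc)"      # roughly what an interactive ssh login gets

echo "non-login shell: $non_login"
echo "login shell:     $login"
rm -rf "$demo_home"
```

On a typical Linux system this should report unset for the non-login shell and set for the login shell, which is consistent with the line in ~/.profile working over an interactive ssh session but not for workers started by addprocs.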

I also tried adding the source command as the second line of the script from cat cloud_init_node_myc.sh in KissCluster, which I use as the launch script for the instances as described at https://github.com/pszufe/KissCluster, but then the nodes don’t connect to the cluster properly.
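My understanding is that sourcing compilervars.sh inside the launch script can only change that script's own environment; the ssh sessions that addprocs opens later start fresh shells that do not inherit it. A possible alternative, sketched below in bash, is for the launch script to write the source line into the worker user's ~/.bashrc exactly once. prepend_once is a hypothetical helper, the /home/ubuntu path is inferred from the /home/ubuntu/julia path in the error message, and I have not checked how this interacts with KissCluster's own node script.

```bash
#!/usr/bin/env bash
# Sketch: insert a line at the top of a file unless the file already contains it.
# Intended for a launch script that edits the worker user's ~/.bashrc.

prepend_once() {
  local file="$1" line="$2" tmp
  touch "$file"
  # Whole-line, fixed-string match: do nothing if the line is already present.
  if grep -qxF -- "$line" "$file"; then
    return 0
  fi
  tmp="$(mktemp)"
  { printf '%s\n' "$line"; cat "$file"; } > "$tmp"
  cat "$tmp" > "$file"   # rewrite in place, keeping an existing file's owner and mode
  rm -f "$tmp"
}

# Hypothetical usage in the launch script (user name and path are assumptions):
# prepend_once /home/ubuntu/.bashrc 'source /opt/intel/bin/compilervars.sh intel64'
```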

platawiec · May 12, 2019, 7:29pm

This turned out to be pretty simple. I just had to add

source /opt/intel/bin/compilervars.sh intel64

to the top of my .bashrc file. Works like a charm now.
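Why the top of the file, as far as I know: on Ubuntu, bash reads ~/.bashrc even for the non-interactive command that ssh runs on behalf of addprocs, but the stock ~/.bashrc returns early for non-interactive shells, so anything placed below that check never runs for the workers. A slightly more defensive form of the same top-of-file line is sketched below in bash; load_intel_env and INTEL_COMPILERVARS are hypothetical names, and the default path is the one used in this thread.

```bash
# Top of ~/.bashrc: must come before any "return if not interactive" check.
load_intel_env() {
  # Hypothetical override variable; defaults to the path used in this thread.
  local vars="${INTEL_COMPILERVARS:-/opt/intel/bin/compilervars.sh}"
  # Source the script only if it exists and is readable.
  if [ -r "$vars" ]; then
    source "$vars" intel64
  fi
}
load_intel_env
```

Skipping the source when the script is missing keeps ~/.bashrc usable on machines without the Intel tools installed.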

