Cannot install PyCall and PyPlot on HPC cluster

Hi All,

I am using a high-performance computing cluster to solve my optimization problem. After I get the results, I need to plot them as bar plots, line plots, and stacked graphs. This works on my local machine, but on the cluster I get an error at the `using` stage.

When I have the following

using JuMP, PyPlot, DataFrames, CSV, Gurobi, Missings, PyCall, Statistics, TimerOutputs

as the first line, I get the error below:

So, it tells me that I do not have the required installation. When I have the following as first lines,

using Pkg
Pkg.add("PyCall")
using PyCall
Pkg.add("PyPlot")
using PyPlot

using JuMP, DataFrames, CSV, Gurobi, Missings, Statistics, TimerOutputs

it gives the following error:

So, there is a problem in Line 6, which is Pkg.add("PyCall").

Can you please help me with this?

Kind regards,
Fikri Kucuksayacigil

It could be that the cluster worker node doesn’t have direct internet access? You may need to download and configure the packages on the login node beforehand.

This could also have been an error on the package server side: I just saw a similar error in one of my CI jobs.

Thank you for your message. However, I do not think the problem is the internet connection, because I was able to run the same code on the cluster before, without PyCall and PyPlot. Recently I realized that I need figures, which is why I added PyCall and PyPlot. That is when this error started.

What do you suggest? Should I try to install a simple package to see whether the problem is the internet connection?

What @simonbyrne is saying is that the compute nodes on the cluster may not have internet access.
Try logging into the login node and running Julia in the REPL then adding the packages you need.
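The login-node suggestion could look roughly like the sketch below, run from a Julia REPL on the login node. The helper names `missing_packages` and `ensure_installed` are made up for illustration, and the sketch assumes the login and compute nodes share the same home directory (and therefore the same Julia package depot), which is common but not guaranteed.

```julia
using Pkg

# Hypothetical helper: return the names that cannot be resolved in the
# active environment, i.e. packages that still need to be installed.
missing_packages(names) = [n for n in names if Base.find_package(n) === nothing]

# Hypothetical helper: install whatever is missing. Pkg.add needs internet
# access, so this is meant to be called on the login node, not in a job.
function ensure_installed(names)
    todo = missing_packages(names)
    if isempty(todo)
        println("Nothing to install.")
    else
        Pkg.add(todo)
        Pkg.precompile()   # precompile now so the compute job does not have to
    end
    return todo
end
```

Calling `ensure_installed(["PyCall", "PyPlot"])` once on the login node should then let the job script get by with plain `using` statements and no `Pkg.add` calls.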

There was also an issue with the package server today, so that could have been the issue. Try it again?

I think you have installed PyCall correctly. It is only that it is trying to call the cluster’s default Python installation, and that one doesn’t have matplotlib installed.

The easiest would be to read the third option given in your first error message (the paragraph starting with Another alternative…). For this you will probably need internet access, which, as Simon says, may not be available on the compute nodes.

So follow those instructions on the login node to get your environment correctly configured.

I used to run into this error all the time on my cluster. So what I did was

ENV["PYTHON"] = (path of Python executable for Julia's miniconda)
pkg> build PyCall

This would force the python that I used to be the one from Julia.
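A slightly more defensive version of the two lines above, as a sketch only: `use_python` is a hypothetical helper, not part of PyCall. It relies on PyCall reading `ENV["PYTHON"]` when it is built, and on an empty string telling PyCall to install and use its own private Miniconda.

```julia
using Pkg

# Hypothetical helper: point PyCall at a given Python executable and rebuild.
# Pass "" to let PyCall set up its own private Miniconda instead.
function use_python(python_path::AbstractString)
    if !isempty(python_path) && !isfile(python_path)
        error("No Python executable found at $python_path")
    end
    ENV["PYTHON"] = python_path
    Pkg.build("PyCall")   # the interpreter choice is recorded at build time
    println("Restart Julia for the change to take effect.")
end
```

The path check is there so that a typo fails immediately instead of producing a confusing build error.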

Sorry for this dumb question, but how do I start a Julia session in the cluster terminal? When I type julia, it says command not found. When I type python, it starts a Python session.

You should probably have a conversation with your HPC admin. You appear to be trying to use the system Python, which is probably not the latest Python.

On some HPCs you need to do something like module load python julia in order to load python and julia into your shell environment.

Since you can start Python, what happens when you type “import matplotlib” there?

It says No module named matplotlib

I am currently trying to start a Julia session in the terminal so that I can run Pkg.add("PyCall") and Pkg.add("PyPlot") there. I am trying to make Julia callable from the terminal. It seems that I have to add an environment variable pointing to julia/bin, but I have not yet found where Julia was installed (what is the path?).

This means that your Python environment does not have matplotlib installed. This is a prerequisite.
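Once a Julia session is available, the same check can be made from the Julia side, which also shows which interpreter PyCall was actually built against. This is a minimal sketch assuming PyCall is already installed and built; `has_pymodule` is a hypothetical helper name.

```julia
using PyCall

# Which interpreter was PyCall built against?
println("Python executable: ", PyCall.python)
println("Python version:    ", PyCall.pyversion)
println("Private Conda:     ", PyCall.conda)

# Hypothetical helper: true if the named Python module can be imported
# by the interpreter PyCall is using, false if the import fails.
function has_pymodule(name::AbstractString)
    try
        pyimport(name)
        return true
    catch
        return false
    end
end

println("matplotlib available: ", has_pymodule("matplotlib"))
```

If the executable printed here is the system Python and matplotlib is reported as unavailable, that matches the failure seen with `using PyPlot`.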

This is specific to your HPC. You need to talk to whoever installed Julia on your HPC system, unless you installed Julia yourself.

Can you tell us which HPC you are using? If it is public, I might be able to help you find public information about it.

It seems that I have made a little progress.

First of all, my terminal windows (on my local machine and on the cluster) were not able to start Julia. I figured out how to start Julia from the terminal on Mac and Linux, and I can now use Julia in the terminal on the cluster.

When I type Pkg.status(), I see

So, it seems that packages are already installed. When I type using PyCall, there is no error. When I type using PyPlot, it gives the following error:

I will follow the suggestions given in this error.

By the way @mkitti, I am using University of California San Diego’s HPC for my work. I am pretty new to cluster computing. So, it takes some time for me to figure out things.

I think I did it. I just typed

ENV["PYTHON"]=""
Pkg.build("PyCall")

and then restarted Julia, as described in the error message. I am now able to do using PyPlot without any error message.
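To confirm the fix end to end, a small script that saves a figure to disk can be run on the cluster. This is only a sketch: it assumes the compute nodes have no display, so it requests the non-interactive Agg backend through the `MPLBACKEND` environment variable before loading PyPlot, and the output file name is made up.

```julia
# Ask matplotlib for a non-interactive backend; must be set before `using PyPlot`.
ENV["MPLBACKEND"] = "Agg"

using PyPlot

x = range(0, stop=2pi, length=100)
plot(x, sin.(x), label="sin(x)")
legend()

# Hypothetical output file name; written to the current working directory.
savefig("pyplot_check.png")
println("Wrote ", abspath("pyplot_check.png"))
```

If the PNG file appears, both PyCall and matplotlib are working in the job environment.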

Essentially what this did is create a distinct Python install using miniconda for Julia. Before it was trying to use the operating system’s Python install. This works, but I am not sure if it is the best use of HPC resources.

What I suspect is that there is probably an existing Python “module” somewhere on the system for you, so you don’t have to keep a full copy in your home directory. On the login node, if you type “module avail”, do you see a list of software that includes Python?

I have not found any UCSD specific instructions. Most HPCs I have used work like this:
https://www.glue.umd.edu/hpcc/help/software/python.html

Yes, I see python/2.7.15(default) when I type module avail.

Are you saying that it would be enough to write module load julia python in the job script (.sh file) to get everything working correctly? Or would I still have to repeat what I did above (open Julia in the terminal and run ENV["PYTHON"]="" and Pkg.build("PyCall"))?

Python 2.7.15? Surely there is a more recent version available.