Inspecting and controlling the scikit-learn version in ScikitLearn.jl

How do I determine which version of the Python library scikit-learn my ScikitLearn.jl installation is using? Is there any way to control which version is used? If not, how is the installed version determined? Does it depend on an existing Python installation on my machine, and so forth?

I’m using ScikitLearn in another package (MLJModels), and the tests on Travis (which pass) appear to build scikit-learn-0.21.3. But in my local testing I get failures which appear to be due to my using an earlier scikit-learn, and I am struggling to figure out how to update. Just doing build ScikitLearn does not appear to make a difference. I should add that the versions of ScikitLearn.jl and ScikitLearnBase.jl are the same on Travis and locally: 0.5.1 and 0.5.2 respectively.

So it seems that ScikitLearn.jl will try to install the Python library via Anaconda/Conda.jl:

https://github.com/cstjean/ScikitLearn.jl/blob/78ff29e453afcf2578074a5f215d14575ae18a94/src/Skcore.jl#L119

The installation via Conda will occur only if sklearn cannot already be imported, i.e. there is no existing sklearn library visible to the Python that PyCall uses.

https://github.com/JuliaPy/PyCall.jl/blob/6dcf5c2e1ac1399f0ad4f7d4c7cdadfd21faa524/src/PyCall.jl#L693

So if you have manually installed sklearn at any point on your machine, it is likely that the old version will be loaded. One way to work around this is to export PYTHON="" before starting Julia (or set ENV["PYTHON"]="" in Julia) and then run Pkg.build("PyCall"). This should disregard any existing Python installation and force the use of a Julia-specific Conda environment. There should be output when you load ScikitLearn.jl if the Conda install is triggered.
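A minimal sketch of that workaround from inside Julia, assuming PyCall is already installed in the active environment. Julia needs to be restarted after the build so that the new configuration is picked up.

```julia
using Pkg

# An empty PYTHON setting asks PyCall's build script to use the private
# Conda.jl Python rather than an interpreter found elsewhere on the system.
ENV["PYTHON"] = ""

# Rebuild PyCall so that the choice is recorded; restart Julia afterwards.
Pkg.build("PyCall")
```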

I can’t see any way of specifying a particular version of the sklearn library. It looks like you will get whatever is the latest in conda.
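Outside of ScikitLearn.jl itself, one thing that might be worth trying is installing a specific version into the Conda.jl root environment before loading ScikitLearn.jl, since the automatic install is skipped when sklearn already imports. This is only a sketch: it assumes PyCall is using the Conda.jl Python, and that the "name=version" spec is handed through to conda install unchanged. The version number is simply the one reported in the Travis build above.

```julia
using Conda

# Assumption: the "name=version" string is passed through to `conda install`.
# 0.21.3 is the version seen in the Travis build mentioned in the question.
Conda.add("scikit-learn=0.21.3")

# List the packages in the root environment to confirm what was installed.
Conda.list()
```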

Thanks so much for that.

I’ve tried the workaround, but it does not “seem” to work in Julia 1.1.1 or 1.2.0. I say “seem” because I don’t know how to directly determine which Python environment is actually being used; I just keep getting a failure in my local MLJModels test.
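For what it's worth, one way to check this directly is to ask PyCall which Python it was built against and then import sklearn through it. The sketch below assumes a reasonably recent PyCall (dot syntax for Python attributes); python_module_info is a made-up helper name for illustration, not something PyCall or ScikitLearn.jl provides.

```julia
using PyCall

# Globals in the PyCall module describing the Python it was built against.
println("python program: ", PyCall.pyprogramname)
println("python version: ", PyCall.pyversion)
println("libpython:      ", PyCall.libpython)

# Hypothetical helper (not part of PyCall or ScikitLearn.jl): import a Python
# module through PyCall and report its version and the file it was loaded from.
function python_module_info(name::AbstractString)
    mod = pyimport(name)
    return (version = string(mod.__version__), file = string(mod.__file__))
end
```

Calling python_module_info("sklearn") should then show which scikit-learn is actually being picked up, and the file path indicates whether it lives under ~/.julia/conda or in some other Python installation.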

So I set ENV["PYTHON"] = "", build PyCall, and import ScikitLearn, but no new precompilation happens and there is no sign that a Conda install has been triggered (as there is on Travis).

Here’s the log from the PyCall build:

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

┌ Info: Using the Python distribution in the Conda package by default.
└ To use a different Python version, set ENV["PYTHON"]="pythoncommand" and re-run Pkg.build("PyCall").
[ Info: Running `conda install -y numpy` in root environment
[ Info: PyCall is using /Users/anthony/.julia/conda/3/bin/python (Python 3.7.1) at /Users/anthony/.julia/conda/3/bin/python, libpython = /Users/anthony/.julia/conda/3/lib/libpython3.7m.dylib
[ Info: /Users/anthony/.julia/packages/PyCall/ttONZ/deps/deps.jl has not changed
[ Info: /Users/anthony/.julia/prefs/PyCall has not changed

So it seems that the Conda Python under .julia is being used after all. So why doesn’t using ScikitLearn.jl add the latest scikit-learn to its root environment? (Or at least it doesn’t seem to.)

Hi,
I don’t know if this issue is still a problem for you.
I solved it simply by replacing pyimport_conda with pyimport.

It works because I had previously configured PyCall to point to my usual Python installation, where the scikit-learn package is up to date. How to do this is explained here.
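To illustrate the difference between the two calls, here is a rough sketch. load_python_module is a hypothetical wrapper written for this post, and the conda package name "scikit-learn" in the usage line is my assumption rather than something copied from the ScikitLearn.jl source.

```julia
using PyCall

# Hypothetical helper: load a Python module either strictly from the Python
# that PyCall is configured to use, or with a Conda install as a fallback.
function load_python_module(modname, condapkg; allow_conda_install::Bool = false)
    if allow_conda_install
        # Tries a plain import first; installs `condapkg` only if that fails
        # and PyCall is using the Conda.jl Python.
        return pyimport_conda(modname, condapkg)
    else
        # Never installs anything; throws if the module cannot be imported.
        return pyimport(modname)
    end
end
```

With this, load_python_module("sklearn", "scikit-learn") behaves like the plain pyimport described above: it uses whatever scikit-learn the configured Python already has and never touches Conda.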
