Julia-Python interface fails in Julia library likely due to wrong SSL credentials

Hi all,

I am running into a complicated bug involving SSL certificates in a Julia package that interfaces with Python through PyCall and/or PythonCall (we see problems with both, so the source of the error is probably the same). In the package Sleipnir.jl we have a Python setup in which some dependencies are handled with conda and others are installed afterwards with pip (unfortunately, we need these extra pip installs). The whole library is tested in continuous integration with the complete setup, so the workflow gives an idea of how the different installs are handled. Currently we test on both macOS and Ubuntu. Here is the CI workflow (the specific details of the Julia library do not matter; just know that some preprocessing is done in Python and some analysis in Julia):

name: Run Tests
on:
   pull_request:
    branches:
      - main
   push:
    branches: []
    tags: '*'
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }}
jobs:
  test:
    name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }}
    runs-on: ${{ matrix.os }}
    defaults:
       run:
         shell: bash -el {0}
    strategy:
      fail-fast: false
      matrix:
        version:
          - '1'
        python:
          - 3.12
        os:
          - ubuntu-latest 
          - macos-latest
        arch:
          - x64
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python }}  
      - name: Create environment with micromamba
        uses: mamba-org/setup-micromamba@v1
        with: 
          micromamba-version: '2.0.2-2'
          environment-file: ./environment.yml
          environment-name: oggm_env                # it is recommendable to add both name and yml file. 
          init-shell: bash
          cache-environment: false
          cache-downloads: false
      - name: Update certifi
        run: | 
            pip install --upgrade certifi
        shell: bash -el {0}
      - name: Set ENV Variables for PyCall.jl
        run: | 
          echo "PYTHON=/home/runner/micromamba/envs/oggm_env/bin/python" >> "$GITHUB_ENV"
        shell: bash -el {0}
      - uses: julia-actions/setup-julia@v1
        with:
          version: ${{ matrix.version }}
          arch: ${{ matrix.arch }}
      - name: Check Julia SSL certifications
        run: |
          julia -e 'using NetworkOptions; println(NetworkOptions.bundled_ca_roots()); println(NetworkOptions.ca_roots_path()); println(NetworkOptions.ssh_key_path()); println(NetworkOptions.ssh_key_name()); println(NetworkOptions.ssh_pub_key_path())'
          echo "SSH_PATH=$(julia -e 'using NetworkOptions; println(NetworkOptions.bundled_ca_roots())')" >> "$GITHUB_ENV"
        shell: bash -el {0}
      - name: Install dependencies on Ubuntu
        if: matrix.os == 'ubuntu-latest'
        run: |
          sudo apt-get update
          sudo apt-get install -y libxml2 libxml2-dev libspatialite7 libspatialite-dev
          dpkg -L libxml2
          dpkg -L libspatialite7
          echo "LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH" >> "$GITHUB_ENV"
          # export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
      - name: Install dependencies on macOS
        if: matrix.os == 'macos-latest'
        run: |
          brew install libxml2 libspatialite
          echo "========= Checking on Installs =========="
          brew info libxml2
          echo "========= Checking on Installs =========="
          brew info libspatialite
          echo "PKG_CONFIG_PATH=/opt/homebrew/opt/libxml2/lib/pkgconfig" >> "$GITHUB_ENV"
      - name: Check that new paths had been exported
        if: matrix.os == 'macos-latest'
        run: |
          echo $PYTHON
          echo $PKG_CONFIG_PATH
      - uses: julia-actions/cache@v1
        with:
          cache-registries: "true"
          cache-compiled: "true"
      - name: Build Julia packages in Ubuntu
        uses: julia-actions/julia-buildpkg@v1
        if: matrix.os == 'ubuntu-latest'
        env:
          PYTHON : /home/runner/micromamba/envs/oggm_env/bin/python
          # The SSL certificate path can be read from the output of the step "Check Julia SSL certifications"
          SSL_CERT_FILE: /etc/ssl/certs/ca-certificates.crt
      - name: Build Julia packages in MacOS
        uses: julia-actions/julia-buildpkg@v1
        if: matrix.os == 'macos-latest'
        env:
          PYTHON : /home/runner/micromamba/envs/oggm_env/bin/python
      - name: Run tests in Ubuntu
        uses: julia-actions/julia-runtest@v1
        if: matrix.os == 'ubuntu-latest'
        env:
          PYTHON : /home/runner/micromamba/envs/oggm_env/bin/python
      - name: Run tests in MacOS
        uses: julia-actions/julia-runtest@v1
        if: matrix.os == 'macos-latest'
        env:
          PYTHON : /home/runner/micromamba/envs/oggm_env/bin/python
      - uses: julia-actions/julia-processcoverage@v1
      - uses: codecov/codecov-action@v3
        with:
          token: ${{secrets.CODECOV_TOKEN}}
          files: lcov.info

This workflow used to work until a few months ago, when something started failing even though most of the Python and Julia library versions are pinned. I suspect the problem comes from how SSL credentials are handled by Julia and Python, but I cannot find the solution. I have been playing with the paths in the different installations and trying different ways of importing the Python libraries, but nothing seems to work in CI. What I find weird is that I can make this work on my local machine on macOS (not sure about Ubuntu), so I guess part of the problem comes from something that CI is doing.

In the past we experienced problems with SSL credential management, which is why the workflow includes some SSL handling. However, this does not seem to be working now. This issue is also a continuation of the related post that @JordiBolibar wrote, an issue that we still haven't fully solved.

Any idea of where the problem can be coming from and how to fix it? Here is the CI workflow, the Project.toml, the environment.yml (for PyCall), and how the setup of the Python libraries is carried out inside Sleipnir.jl.

I am happy to provide more details! To be honest, the bug is difficult even to report, since I don't understand exactly what is failing, but I hope someone in the community is more familiar with these topics and can give us a hand!!!

What makes you think it's SSL? If you could narrow down the behavior you're seeing (stacktrace, maybe?), it would be helpful. I looked at the last failed CI run on your repo, and I don't see anything about SSL, just failed precompilation.

Nevermind, found it!

Looking at that log, I see the complaint about Julia not being able to load SSL, but it doesn't actually lead to the failure; it is just a warning during precompilation. The error comes from the missing Python import.

Addressing the SSL warning, did you already try this?

Thank you @mrufsvold for the suggestion! It definitely gave me a lead, but I am still not quite there...

I still believe there is a problem with the Python paths. Here is another CI run that also fails during testing because it cannot find the Python libraries, even though I have added the environment variable inside the CI:

  - name: Build Julia packages in MacOS
    uses: julia-actions/julia-buildpkg@v1
    if: matrix.os == 'macos-latest'
    env:
      PYTHON : /home/runner/micromamba/envs/oggm_env/bin/python
      JULIA_SSL_CA_ROOTS_PATH: /Users/runner/hostedtoolcache/julia/1.11.1/x64/share/julia/cert.pem

Looking at the log, you can see at the top that the following environment variables are being used:

I don't understand why pythonLocation and Python_ROOT_DIR are not configured to be the same as the PYTHON variable... even though the latest version of setup-python had fixed this.

Maybe the error is coming from here? I am still a bit lost, but hopefully getting closer to a solution!

Trying to make sure I am tracking the logic here:

You set up an external Python environment. And you expect that it should be manipulating a virtual environment at the path you pass to the PYTHON environment var. Then, PyCall.jl should pick up that path and reference the version of Python with all your dependencies installed. But it seems that whatever Python is getting run inside Julia doesn't have the dependencies installed.

I don't know the internals of the python and mamba setup actions you're using, but my first thought would be to use mamba/conda to report the path it is manipulating for the virtual environment and set that as the PYTHON var instead of hard coding it.
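
A minimal sketch of that idea in Python, assuming the script is run by the interpreter of the activated micromamba environment: the interpreter reports its own location through `sys.executable`, and that value is appended to the file named by `GITHUB_ENV`. The helper name `export_python_path` is hypothetical and not part of any of the actions used in the workflow.

```python
import os
import sys


def export_python_path(env_file=None, name="PYTHON"):
    """Append NAME=<path of the running interpreter> to a GITHUB_ENV-style file.

    Hypothetical helper: returns the line that was (or would have been) written.
    If no file is given and GITHUB_ENV is unset, nothing is written.
    """
    env_file = env_file or os.environ.get("GITHUB_ENV")
    line = f"{name}={sys.executable}\n"
    if env_file:
        with open(env_file, "a", encoding="utf-8") as fh:
            fh.write(line)
    return line
```

Calling `export_python_path()` from a step that uses the `bash -el {0}` login shell would then record whichever interpreter the environment actually provides on that runner, instead of a path typed by hand.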

Another thought is that you could push a new path to sys.path in Python to get it to find your dependencies.
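
For the `sys.path` option, a small sketch on the Python side could look like the following. The helper name `prepend_to_sys_path` is hypothetical, and the directory passed to it would be the `site-packages` folder of the environment that holds the dependencies, which is an assumption about the setup rather than something confirmed in the thread.

```python
import sys


def prepend_to_sys_path(directory):
    """Put `directory` at the front of sys.path unless it is already listed."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return sys.path
```

This only helps when the interpreter itself starts correctly; it does not fix a mismatch between the interpreter and its standard library.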

Yes, the setup you are describing is correct. The error seems to indicate that Julia is not able to find the Python version, even when this is specified in the PYTHON environment variable and further exported in CI as an environment variable:

echo "PYTHON=/home/runner/micromamba/envs/oggm_env/bin/python" >> "$GITHUB_ENV"

If I get it right, your suggestion is what we are currently doing. The Python environment is exported in the previous line I showed, which coincides with the one obtained after installing with micromamba.

I still don't understand the role of the other environment variable, pythonLocation, which does not seem to coincide with the PYTHON variable.

What's happening on this line?

This exports the environment variable PYTHON so it can be accessed by all the other steps in the workflow. It is equivalent to export PYTHON=<env>, but the reason for this syntax is that in CI each step runs in a new shell. This is the workaround I found to make the variable accessible in all the other steps (although you can also see I am being redundant and further specifying the value of PYTHON inside the env: option of the other actions). The value /home/runner/micromamba/envs/oggm_env/bin/python is the location of the Python environment (it corresponds to the output of which python in CI).

As I pointed out on Slack, you currently refer to two separate environments.

  1. /Users/runner/micromamba/envs/oggm_env
  2. /home/runner/micromamba/envs/oggm_env

This confusion results in you installing Python packages in one path and then trying to use the other path from Julia.
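
A quick way to see which of the two locations exists on a given runner is sketched below. The two candidate roots are the ones listed above; the helper name `existing_interpreters` and the `bin/python` layout are assumptions.

```python
from pathlib import Path


def existing_interpreters(env_roots):
    """Map each candidate environment root to whether <root>/bin/python exists."""
    return {str(root): (Path(root) / "bin" / "python").exists() for root in env_roots}


candidates = [
    "/Users/runner/micromamba/envs/oggm_env",  # macOS runner path from the thread
    "/home/runner/micromamba/envs/oggm_env",   # Ubuntu runner path from the thread
]
for root, found in existing_interpreters(candidates).items():
    print(f"{root}: {'found' if found else 'missing'}")
```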

Thank you @mkitti for pointing that out! I probably introduced that typo as I was trying to debug this...

I made the change to correct for the right Python path but the problem persists.

On macOS, when not specifying SSL paths I get this error. It asks me to specify the variable JULIA_SSL_CA_ROOTS_PATH. When I set this variable to

JULIA_SSL_CA_ROOTS_PATH: /Users/runner/hostedtoolcache/julia/1.11.1/x64/share/julia/cert.pem

the error is now

Fatal Python error: Failed to import encodings module
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

If I am right, I am getting these paths from the output of

julia -e 'using NetworkOptions; println(NetworkOptions.bundled_ca_roots()); println(NetworkOptions.ca_roots_path()); println(NetworkOptions.ssh_key_path()); println(NetworkOptions.ssh_key_name()); println(NetworkOptions.ssh_pub_key_path())'

as seen here.

Now, on Ubuntu the error is different: Julia is not able to find the Python packages, even though the PYTHON variable is properly set up. (Notice that here I am using Julia 1.9, since 1.11 seems not to be compiling for other reasons.)
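
Regarding the macOS `encodings` failure quoted above: that message generally means the interpreter could not locate its own standard library, which can happen when `PYTHONHOME` or `PYTHONPATH` point at a different installation. A hedged diagnostic sketch, meant to be run directly with the environment's Python (outside Julia) to record what a healthy startup looks like; the helper name `interpreter_report` is hypothetical.

```python
import encodings
import os
import sys


def interpreter_report():
    """Collect the values that decide where Python looks for its standard library."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "encodings_file": encodings.__file__,
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

If `prefix` and the `encodings` location belong to a different installation than `executable`, the mixed-installation explanation becomes more likely.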

Focusing on the macOS error, the problem is that you have too many Pythons involved. You have a Python 3.9 install from the setup-python action and then you install another Python 3.12 via micromamba.

My recommendation is to remove the setup-python action and focus on the micromamba install of Python.

This may also fix the Ubuntu error. You may be loading a mix of shared libraries when trying to load the two Pythons.
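
To check how many Pythons a CI step can actually see, a sketch like the following lists every `python`/`python3` executable in `PATH` order. The helper name `pythons_on_path` is hypothetical; it only inspects `PATH` and says nothing about which interpreter PyCall was built against.

```python
import os
from pathlib import Path


def pythons_on_path(path_value=None, names=("python", "python3")):
    """List (path, resolved path) for python executables, in PATH order."""
    path_value = os.environ.get("PATH", "") if path_value is None else path_value
    found = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for name in names:
            candidate = Path(directory) / name
            # Keep only regular files (or symlinks to them) that are executable.
            if candidate.is_file() and os.access(candidate, os.X_OK):
                found.append((str(candidate), str(candidate.resolve())))
    return found


for shown, real in pythons_on_path():
    print(f"{shown} -> {real}")
```

More than one distinct resolved path in the output would be consistent with the two-installation diagnosis above.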

There are lots of threads in this conversation, so I'm sorry to pull focus from the python installation issue, but I just saw this comment about Python+Julia+OpenSSL that might be of interest.

You may have better luck using PythonCall.jl and CondaPkg.jl, since they fix the OpenSSL binary version on the Julia end. Also try loading an HTTP package in Python before you load one in Julia.
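
To see which OpenSSL the Python side ends up with, the standard `ssl` module exposes the version of the library loaded by the interpreter. A minimal sketch (the helper name `python_openssl` is hypothetical):

```python
import ssl


def python_openssl():
    """Return the OpenSSL version string and numeric tuple reported by Python's ssl module."""
    return ssl.OPENSSL_VERSION, ssl.OPENSSL_VERSION_INFO


version_text, version_info = python_openssl()
print(version_text)
print(version_info[:3])
```

Running this once in a plain Python process and once from within the Julia session would show whether the load order changes the library that gets picked up.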

Thank you @mkitti for the follow up.

I am not sure this is the source of the error, but thank you for pointing it out. I just removed the setup-python@v5 action and let micromamba do the Python setup, and the same problem persists. Here is the CI error log.

@asinghvi17 since you are also the author of the comment that @mrufsvold shared, does this make you think that the problem may disappear when using PythonCall? We were working in a different branch using PythonCall and that also didn't work, so I assumed there was some other underlying problem. Do you know when the problem with the OpenSSL binary was solved? Was this recent? Thank you!

@asinghvi17 I tried this same workflow with PythonCall and I get the same result. Even worse, it seems that the problem is not solvable on my end, since it requires PythonCall to start using more recent versions of OpenSSL. I filed an issue in PythonCall explaining this.

I really don't see a solution for this problem right now. If you have any suggestions, they will be more than welcome :slight_smile: Thanks!

You can now set JULIA_CONDAPKG_OPENSSL_VERSION=ignore if you don't want CondaPkg to insert the upper bound on OpenSSL. This will allow you to use openssl=3.3 in your Conda environment. However if you get any issues from the incompatible versions, it's on you!

See the README for more info.

Thank you @cjdoris for the response and update in the CondaPkg.jl version.

I still cannot make this work, and I am not sure whether this is because of incompatible versions or something different. I did include

preference add CondaPkg openssl_version=ignore

and updated CondaPkg to version 0.2.24. When I import my library, I get an import error because of a mismatch between Referenced from and Expected in:


The first error in the stacktrace points to ~/.julia/dev/Sleipnir/.CondaPkg/env/lib/python3.12/ssl.py, which suggests the error may be coming from there, especially given the message Symbol not found: _X509_STORE_get1_objects. Does this sound like the unsolvable error you mentioned, or is there a solution for this?
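
As far as I know, `X509_STORE_get1_objects` is a relatively recent OpenSSL function (my understanding is that it arrived with the 3.3 series, which matches the openssl=3.3 mentioned earlier), so a missing symbol would be consistent with an older libssl/libcrypto already being loaded in the process. Treat the 3.3.0 threshold in the sketch below as an assumption. The helper names and the two sample version strings are illustrative only; since `import ssl` fails in this situation, the version string would have to come from elsewhere, for example the output of `openssl version`.

```python
import re


def parse_openssl_version(text):
    """Extract (major, minor, patch) from a string such as 'OpenSSL 3.0.13 30 Jan 2024'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def at_least(version_text, minimum=(3, 3, 0)):
    """True when the parsed version is >= `minimum` (3.3.0 is an assumed threshold)."""
    return parse_openssl_version(version_text) >= minimum


# Sample inputs for illustration, not values taken from the failing run.
print(at_least("OpenSSL 3.0.13 30 Jan 2024"))
print(at_least("OpenSSL 3.3.1 4 Jun 2024"))
```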

Btw, this is the same error I see when I instead reference to the conda environment I have installed separately with conda:

backend: "Current"
env: "/Users/sapienza/.conda/envs/oggm_env"
openssl_version: "ignore"

Thank you so much!

Maybe an important observation here is that I am trying both with and without manually loading the libxml and spatialite libraries, which is where the problem seems to be originating. This is done directly in the config.jl of the library:

export rioxarray, netCDF, cfg, utils, workflow, tasks, global_tasks, graphics, bedtopo, millan22, MBsandbox, salem, pd, xr

using Libdl: dlopen

function __init__()

    # Create structural folders if needed
    OGGM_path = joinpath(homedir(), "Python/OGGM_data")
    if !isdir(OGGM_path)
        mkpath(OGGM_path)
    end

    # Avoid issue with dylib files
    load_libxml()
    load_spatialite()
    
    # Load Python packages
    # Only load Python packages if not previously loaded by Sleipnir
    #println("Initializing Python libraries...")
    isassigned(rioxarray) ? nothing : rioxarray[] = pyimport("rioxarray")
    isassigned(netCDF4) ? nothing : netCDF4[] = pyimport("netCDF4")
    isassigned(cfg) ? nothing : cfg[] = pyimport("oggm.cfg")
    isassigned(utils) ? nothing : utils[] = pyimport("oggm.utils")
    isassigned(workflow) ? nothing : workflow[] = pyimport("oggm.workflow")
    isassigned(tasks) ? nothing : tasks[] = pyimport("oggm.tasks")
    isassigned(global_tasks) ? nothing : global_tasks[] = pyimport("oggm.global_tasks")
    isassigned(graphics) ? nothing : graphics[] = pyimport("oggm.graphics")
    isassigned(bedtopo) ? nothing : bedtopo[] = pyimport("oggm.shop.bedtopo")
    isassigned(millan22) ? nothing : millan22[] = pyimport("oggm.shop.millan22")
    isassigned(MBsandbox) ? nothing : MBsandbox[] = pyimport("MBsandbox.mbmod_daily_oneflowline")
    isassigned(salem) ? nothing : salem[] = pyimport("salem")
    isassigned(pd) ? nothing : pd[] = pyimport("pandas")
    isassigned(xr) ? nothing : xr[] = pyimport("xarray")
end

function load_libxml()
    # lib_dir = joinpath(root_dir, ".CondaPkg/env/lib")
    # lib_dir = "/Users/sapienza/.julia/artifacts/3fe8e47e7750d32cfb194a7927fc1d886d1fdfaa/lib"
    lib_dir = "/Users/sapienza/.conda/envs/oggm_env/lib/"

    # Find all libxml files in the directory
    if Sys.isapple()
        lib_files = filter(f -> startswith(f, "libxml") && (endswith(f, ".dylib") || contains(f, ".dylib.")), readdir(lib_dir))
    elseif Sys.islinux()
        lib_files = filter(f -> startswith(f, "libxml") && (endswith(f, ".so") || contains(f, ".so.")), readdir(lib_dir))
    else
        error("Unsupported operating system")
    end

    if isempty(lib_files)
        println("No libxml files found in $lib_dir")
        return
    end
    
    for lib_file in lib_files
        lib_path = joinpath(lib_dir, lib_file)
        try
            dlopen(lib_path)
            println("Opened $lib_path")
        catch e
            println("Failed to load $lib_path: $e")
        end
    end
end

function load_spatialite()
    # lib_dir = joinpath(root_dir, ".CondaPkg/env/lib")
    # lib_dir = "/Users/sapienza/.julia/artifacts/3fe8e47e7750d32cfb194a7927fc1d886d1fdfaa/lib"
    lib_dir = "/Users/sapienza/.conda/envs/oggm_env/lib/"
    
    # Find all libspatialite files in the directory
    if Sys.isapple()
        lib_files = filter(f -> startswith(f, "libspatialite") && (endswith(f, ".dylib") || contains(f, ".dylib.")), readdir(lib_dir))
    elseif Sys.islinux()
        lib_files = filter(f -> startswith(f, "libspatialite") && (endswith(f, ".so") || contains(f, ".so.")), readdir(lib_dir))
    else
        error("Unsupported operating system")
    end

    if isempty(lib_files)
        println("No libspatialite files found in $lib_dir")
        return
    end
    
    for lib_file in lib_files
        lib_path = joinpath(lib_dir, lib_file)
        try
            dlopen(lib_path)
            println("Opened $lib_path")
        catch e
            println("Failed to load $lib_path: $e")
        end
    end
end

function clean()
    atexit() do
        run(`$(Base.julia_cmd())`)
    end
    exit()
end

function enable_multiprocessing(procs::Int)
    if procs > 0 
        if nprocs() < procs
            @eval begin
            addprocs($procs - nprocs(); exeflags="--project")
            println("Number of cores: ", nprocs())
            println("Number of workers: ", nworkers())
            @everywhere using Sleipnir
            end # @eval
        elseif nprocs() != procs && procs == 1
            @eval begin
            rmprocs(workers(), waitfor=0)
            println("Number of cores: ", nprocs())
            println("Number of workers: ", nworkers())
            end # @eval
        end
    end
    return nworkers()
end
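
One thing that stands out in `load_libxml` and `load_spatialite` is the hard-coded `lib_dir`, which only exists on one machine. A hedged Python sketch of the same file-matching logic, resolved against the prefix of whichever environment is running instead of a fixed path: the helper name `shared_libraries` is hypothetical, and unlike the Julia functions it accepts both `.dylib` and `.so` names regardless of the operating system.

```python
import sys
from pathlib import Path


def shared_libraries(prefix, stem):
    """Return sorted paths in <prefix>/lib whose names start with `stem` and look like shared libraries."""
    directory = Path(prefix) / "lib"
    if not directory.is_dir():
        return []
    markers = (".dylib", ".so")
    return sorted(
        str(entry)
        for entry in directory.iterdir()
        if entry.name.startswith(stem)
        and any(entry.name.endswith(m) or f"{m}." in entry.name for m in markers)
    )


for stem in ("libxml", "libspatialite"):
    print(stem, shared_libraries(sys.prefix, stem))
```

Run with the environment's own interpreter, this shows which libxml and libspatialite files that environment ships, which can then be compared with the files the Julia side is trying to `dlopen`.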

I tried a series of options around this but still cannot make it work... I don't want to give up on this. @cjdoris @asinghvi17 @mkitti, do you have any last suggestions? :innocent:

As explained in the previous message, I don't know if the problem still comes from SSL or from something different now.