I can't use the intake package with PyCall

Hello! I have the following code in Python that is working, but I can’t reproduce it with PyCall.

# Python code
import intake
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat = intake.open_catalog(catalog_url)

Out[5]: main:
  args:
    path: https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml
  description: Master Data Catalog
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}

In Julia, with PyCall (PyCall is using the same Python environment):

using PyCall
intake = pyimport("intake")
catalog_url = "https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml"
cat = intake.open_catalog(catalog_url)

ERROR: PyError ($(Expr(:escape, :(ccall(#= /gpfs/home/dl2594/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'FileNotFoundError'>
FileNotFoundError('https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml')
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/__init__.py", line 168, in open_catalog
    return registry[driver](uri, **kwargs)
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/catalog/local.py", line 579, in __init__
    super(YAMLFileCatalog, self).__init__(**kwargs)
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/catalog/base.py", line 110, in __init__
    self.force_reload()
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/catalog/base.py", line 168, in force_reload
    self._load()
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/catalog/local.py", line 609, in _load
    with file_open as f:
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/core.py", line 103, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/spec.py", line 1106, in open
    f = self._open(
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/implementations/http.py", line 346, in _open
    size = size or self.info(path, **kwargs)["size"]
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/asyn.py", line 113, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/asyn.py", line 98, in sync
    raise return_result
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/implementations/http.py", line 420, in _info
    raise FileNotFoundError(url) from exc

Stacktrace:
  [1] pyerr_check
    @ ~/.julia/packages/PyCall/ygXW2/src/exception.jl:62 [inlined]
  [2] pyerr_check
    @ ~/.julia/packages/PyCall/ygXW2/src/exception.jl:66 [inlined]
  [3] _handle_error(msg::String)
    @ PyCall ~/.julia/packages/PyCall/ygXW2/src/exception.jl:83
  [4] macro expansion
    @ ~/.julia/packages/PyCall/ygXW2/src/exception.jl:97 [inlined]
  [5] #107
    @ ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 [inlined]
  [6] disable_sigint
    @ ./c.jl:458 [inlined]
  [7] __pycall!
    @ ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:42 [inlined]
  [8] _pycall!(ret::PyObject, o::PyObject, args::Tuple{String}, nargs::Int64, kw::Ptr{Nothing})
    @ PyCall ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:29
  [9] _pycall!
    @ ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:11 [inlined]
 [10] #_#114
    @ ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:86 [inlined]
 [11] (::PyObject)(args::String)
    @ PyCall ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:86
 [12] top-level scope
    @ REPL[8]:2

I also tried the following inside Julia, without success (same error):

py"""
import intake
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat = intake.open_catalog(catalog_url)
"""

Any information or solution would be welcome!

If you’re sure PyCall is using the same Python and intake version, the only thing I can think of is some kind of firewall or proxy issue. Does your system not give Julia access to the internet for some reason, or are there proxy environment variables you’ve set up for Python but not for Julia?
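
To check the proxy idea, it may help to print what each Python process actually sees. Here is a minimal sketch using only the standard library; `proxy_report` is a made-up helper name, not something from PyCall or intake. The idea would be to run it once in plain Python and once inside a `py"""..."""` block in Julia, then compare the two outputs.

```python
import os
import urllib.request

# Common proxy-related variable names (compared case-insensitively).
PROXY_NAMES = ("http_proxy", "https_proxy", "no_proxy", "all_proxy")


def proxy_report(environ=None):
    """Return the proxy-related environment variables visible to this process."""
    environ = os.environ if environ is None else environ
    return {key: value for key, value in environ.items()
            if key.lower() in PROXY_NAMES}


# Variables set in the environment of the current process.
print(proxy_report())
# What urllib resolves (environment, or system settings on some platforms).
print(urllib.request.getproxies())
```

If the two runs print different values, the environment Julia was started from is the likely culprit rather than intake or fsspec.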

Thanks!

I checked the versions, and both Python and intake are the same.

Good idea about the proxy, I will ask the admins. As far as I can tell, I can use Julia to access files and packages on the internet without problems.

What happens if you try this:

py"""
import fsspec
fsspec.open("https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml")
"""

Thanks!

It is able to connect, I think:

fsspec = pyimport("fsspec")
catalogue = fsspec.open("https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml")

PyObject <OpenFile 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'>

Not sure what I should do with this object though?

Edit: trying to open the PyObject returns the same error:

# julia
catalogue.open()
PyError ($(Expr(:escape, :(ccall(#= /gpfs/home/dl2594/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'FileNotFoundError'>
FileNotFoundError('https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml')

But it still works in Python:

# python
catalogue = fsspec.open("https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml")

catalogue
<OpenFile 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'>

catalogue.open()
<File-like object HTTPFileSystem, https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml>

I looked at the local proxy definitions and couldn't find any active ones (they were commented out). Still waiting for the admins' answer.

I tested at home and can confirm it works there. So, yeah, probably some sort of proxy problem.

Cheers!

Some more info.

I was able to use the intake package from the REPL. The problem seems related to Jupyter Lab and a potential proxy problem.
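
Since the REPL works and Jupyter Lab does not, one way to narrow this down might be to compare the environment variables the two sessions see. A rough sketch, standard library only; `snapshot_env`, `load_env`, and `diff_env` are names made up for illustration, and the file paths are up to you.

```python
import json
import os


def snapshot_env(path):
    """Write the current process environment to a JSON file."""
    with open(path, "w") as fh:
        json.dump(dict(os.environ), fh, indent=2, sort_keys=True)


def load_env(path):
    """Read back an environment snapshot written by snapshot_env."""
    with open(path) as fh:
        return json.load(fh)


def diff_env(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs.

    A variable missing on one side shows up as None on that side.
    """
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}
```

Calling `snapshot_env` once from the REPL session and once from a Jupyter Lab cell (each with its own file name), then passing the two loaded snapshots to `diff_env`, should show whether proxy-related variables differ between them.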