I can't use the intake package with PyCall

Hello! I have the following Python code that works, but I can’t reproduce it with PyCall.

# Python code
import intake
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat = intake.open_catalog(catalog_url)

Out[5]: main:
  args:
    path: https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml
  description: Master Data Catalog
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}

In Julia, with PyCall (which uses the same Python environment):

using PyCall
intake = pyimport("intake")
catalog_url = "https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml"
cat = intake.open_catalog(catalog_url)

ERROR: PyError ($(Expr(:escape, :(ccall(#= /gpfs/home/dl2594/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'FileNotFoundError'>
FileNotFoundError('https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml')
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/__init__.py", line 168, in open_catalog
    return registry[driver](uri, **kwargs)
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/catalog/local.py", line 579, in __init__
    super(YAMLFileCatalog, self).__init__(**kwargs)
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/catalog/base.py", line 110, in __init__
    self.force_reload()
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/catalog/base.py", line 168, in force_reload
    self._load()
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/intake/catalog/local.py", line 609, in _load
    with file_open as f:
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/core.py", line 103, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/spec.py", line 1106, in open
    f = self._open(
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/implementations/http.py", line 346, in _open
    size = size or self.info(path, **kwargs)["size"]
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/asyn.py", line 113, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/asyn.py", line 98, in sync
    raise return_result
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/gpfs/home/dl2594/miniconda3/lib/python3.9/site-packages/fsspec/implementations/http.py", line 420, in _info
    raise FileNotFoundError(url) from exc

Stacktrace:
  [1] pyerr_check
    @ ~/.julia/packages/PyCall/ygXW2/src/exception.jl:62 [inlined]
  [2] pyerr_check
    @ ~/.julia/packages/PyCall/ygXW2/src/exception.jl:66 [inlined]
  [3] _handle_error(msg::String)
    @ PyCall ~/.julia/packages/PyCall/ygXW2/src/exception.jl:83
  [4] macro expansion
    @ ~/.julia/packages/PyCall/ygXW2/src/exception.jl:97 [inlined]
  [5] #107
    @ ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 [inlined]
  [6] disable_sigint
    @ ./c.jl:458 [inlined]
  [7] __pycall!
    @ ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:42 [inlined]
  [8] _pycall!(ret::PyObject, o::PyObject, args::Tuple{String}, nargs::Int64, kw::Ptr{Nothing})
    @ PyCall ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:29
  [9] _pycall!
    @ ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:11 [inlined]
 [10] #_#114
    @ ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:86 [inlined]
 [11] (::PyObject)(args::String)
    @ PyCall ~/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:86
 [12] top-level scope
    @ REPL[8]:2

I also tried the following inside Julia, without success (same error):

py"""
import intake
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat = intake.open_catalog(catalog_url)
"""

Any information or solution would be welcome!

If you’re sure PyCall is using the same Python and intake versions, the only thing I can think of is some kind of firewall or proxy issue. Is your system preventing Julia from accessing the internet for some reason, or have you set up proxy environment variables for Python but not for Julia?
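A quick way to test the proxy idea is to compare what each interpreter actually sees. The sketch below (`proxy_settings` and `PROXY_VARS` are illustrative names, not part of any library) lists the proxy-related environment variables of the current process and the mapping that `urllib.request.getproxies()` reports. Running it once in plain Python and once pasted into a Julia `py"""..."""` block should show whether the embedded interpreter is missing settings.

```python
import os
import urllib.request

# Proxy variable names, matched case-insensitively (both "https_proxy"
# and "HTTPS_PROXY" spellings are common).
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def proxy_settings():
    """Return the proxy-related environment variables of this process and
    the proxy mapping that urllib.request.getproxies() reports."""
    env = {
        name: value
        for name, value in os.environ.items()
        if name.lower() in PROXY_VARS
    }
    return {"env": env, "urllib": urllib.request.getproxies()}


if __name__ == "__main__":
    settings = proxy_settings()
    print("proxy environment variables:", settings["env"] or "none")
    print("urllib.request.getproxies():", settings["urllib"] or "none")
```

From Julia, after defining the function inside `py"""..."""`, `py"proxy_settings()"` returns the same information as a Julia `Dict`.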


Thanks!

I checked the versions: Python and intake are the same in both.

Good idea about the proxy; I will ask the admins. As far as I can tell, Julia can access files and packages on the internet without problems.

What happens if you try this:

py"""
import fsspec
fsspec.open("https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml")
"""

Thanks!

It seems to be able to connect:

fsspec = pyimport("fsspec")
catalogue = fsspec.open("https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml")

PyObject <OpenFile 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'>

I’m not sure what to do with this object, though.

Edit: trying to open the PyObject returns the same error:

# julia
catalogue.open()
PyError ($(Expr(:escape, :(ccall(#= /gpfs/home/dl2594/.julia/packages/PyCall/ygXW2/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'FileNotFoundError'>
FileNotFoundError('https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml')

But it still works in Python:

# python
catalogue = fsspec.open("https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml")

catalogue
<OpenFile 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'>

catalogue.open()
<File-like object HTTPFileSystem, https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml>

I looked for local proxy definitions and couldn’t find any (they were commented out). Still waiting for the admins’ answers.

I tested at home and can confirm it works there. So it’s probably some sort of proxy problem.

Cheers!

Some more information:

I was able to use the intake package from the REPL. The problem seems related to JupyterLab and a potential proxy issue.
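Since the intake call worked from the REPL but not from JupyterLab, one thing worth comparing is the environment each process was started with; a kernel launched by JupyterLab may not have been started with the same variables as an interactive shell. A sketch, assuming the difference would show up in variables with "proxy" in their name, that records those plus the interpreter path to a JSON file, and a helper that compares two such snapshots (`snapshot_env` and `diff_snapshots` are hypothetical names):

```python
import json
import os
import sys


def snapshot_env(path):
    """Record the interpreter path and all proxy-related environment
    variables (any name containing "proxy", case-insensitive) to `path`."""
    data = {
        "executable": sys.executable,
        "proxy_env": {k: v for k, v in os.environ.items() if "proxy" in k.lower()},
    }
    with open(path, "w") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)
    return data


def diff_snapshots(a, b):
    """Return {name: (value_in_a, value_in_b)} for every entry that differs.

    A variable missing on one side shows up as None on that side.
    """
    diffs = {}
    if a["executable"] != b["executable"]:
        diffs["executable"] = (a["executable"], b["executable"])
    for key in sorted(set(a["proxy_env"]) | set(b["proxy_env"])):
        va, vb = a["proxy_env"].get(key), b["proxy_env"].get(key)
        if va != vb:
            diffs[key] = (va, vb)
    return diffs
```

For example, call `snapshot_env("repl.json")` in the REPL session and `snapshot_env("jupyter.json")` in a notebook (hypothetical file names), then load both files with `json.load` and pass them to `diff_snapshots`.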

Glad I found this thread. I also get this FileNotFoundError intermittently, which prevents me from using intake with PyCall (or PythonCall). Once PyCall fails to read one URL, it fails to read any other URL as well.

intake and xarray work in native Python.

Julia itself can read data from URLs, e.g.:

HTTP.get("https://raw.githubusercontent.com/eurec4a/eurec4a-intake/master/catalog.yml")