PythonCall.jl style regarding type conversion

I’m migrating some code from PyCall.jl to PythonCall.jl. So far I’m loving the integration with CondaPkg.jl for handling the Python environment, but I’m finding the lack of automatic type conversion quite cumbersome. I understand that it has some performance advantages, but lines of code that used to look almost like pure Julia, when in fact they were calling Python, are now flooded with pyconverts and other convoluted expressions to match the types.

I rely heavily on xarray, and the following is a clear example of how much the code has to change to keep working. I may be doing something wrong or being too naive, so please point me to a better solution if that’s the case:

Previous version using PyCall.jl

# `climate` is an xarray Dataset
# `period` is a Julia Date
if any(climate.time[1].dt.date.data[1] > period[1])
     # do something
end

Current version using PythonCall.jl

# `climate` is an xarray Dataset
# `period` is a Julia Date
if any(pyconvert(Date, (pd[].to_datetime(climate.time.data[0]).date())) > period[1])
     # do something
end

The PyCall version clearly reads much better and looks almost like native Julia code. Is there a better way to write this type of code using PythonCall? Thanks in advance!

Can you post a MWE please (i.e. include code to construct climate and period)?

I managed to reduce it even more in this MWE. It looks better, but I guess there’s no way to avoid having to manually convert every Python object I use.

MWE:

using PythonCall
using Dates

np = pyimport("numpy")
xr = pyimport("xarray")
pd = pyimport("pandas")

# Run the Python code block
@py begin

np.random.seed(123)

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear / 365.25 - 0.28))

base = 10 + 15 * np.reshape(annual_cycle, (-1, 1))
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset({"tmin": (("time", "location"), tmin_values),
                 "tmax": (("time", "location"), tmax_values)},
                {"time": times, "location": ["IA", "IN", "IL"]})
end

# Create a dummy date for January 1, 2020
period = Date(2020, 1, 1)

# MWE
any(pyconvert(Date, ds.time.dt.date.data[1]) > period)

That latest code looks like good PythonCall style to me.

The reason why you need to do the conversion yourself (and why PythonCall doesn’t do automatic conversion) is not for performance (although the resulting type stability is nice), it’s because Python objects are just fundamentally different things to Julia ones.

For example if you do some_python_object.some_list.append(3) then it is annoying if the Python list gets converted to a Julia vector, which doesn’t have the append property.
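A minimal sketch of that point, assuming PythonCall is installed with a working Python environment. The names `lst` and `v` are illustrative only and do not come from the posts above:

```julia
using PythonCall

# A Python list, held in Julia as a wrapped `Py` object.
lst = pylist([1, 2])

# Because it is still a Python object, Python methods such as `append` work.
lst.append(3)

# Crossing the boundary is explicit: ask for the Julia type you want.
v = pyconvert(Vector{Int}, lst)

# `v` is an ordinary Julia vector now, so Julia functions apply.
# It is a copy, so this does not change the Python list.
push!(v, 4)
```

If the list had been converted to a `Vector` automatically, the `lst.append(3)` call would have had nothing to call, which is the annoyance described above.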

So the current design makes the boundary between Python and Julia explicit. Indeed in your original post, I don’t know which of the intermediate objects in climate.time[1].dt.date.data[1] are Python or not.

The same logic is why we have pyconvert instead of adding methods to convert - we don’t consider a Python list to be “the same” as a Julia vector.

To reword this a little, PyCall makes Julian code work on (some) autoconverted Python objects and lets you opt into simply wrapped Python objects, while PythonCall instead makes Pythonic code work on wrapped Python objects and lets you convert them for Julian code.

But the instances before and after a convert usually aren’t the same in pure Julia either, and on the surface it does seem possible to move the pyconvert code into convert methods, which would serve the same calls and, as a bonus, give automatic conversion in some circumstances. Could you elaborate on the semantic separation?

Tongue-in-cheek reply: if you want a more magical and surprising mashup of Julia and Python syntax maybe Python.jl is for you:

Yes, this makes sense. I was not aware of this difference in philosophy before starting the move. I mainly did it because of the implicit use of CondaPkg.jl.

My workflow mixes a lot of Julia and Python code, and PyCall.jl let me use xarray in a really transparent way. I was acting on xarray objects, but I could handle the matrices and everything else with Julia syntax and Julia performance, which is really convenient.