I’m migrating some code from PyCall.jl to PythonCall.jl. While so far I’m loving the integration with CondaPkg.jl to handle the Python environment, I’m finding the lack of automatic type conversion quite cumbersome. I understand that it has some performance advantages, but lines of code that previously looked almost like pure Julia, when in fact they were Python, are now flooded with pyconverts and other convoluted expressions to match the types.
I’m relying heavily on xarray, and the snippet below is a clear example of how much the code needs to change for this to work. It might be that I’m doing something wrong or being too naive, so please point me to a better solution if that’s the case:
Previous version using PyCall.jl
# `climate` is an xarray Dataset
# `period` is a Julia Date
if any(climate.time[1].dt.date.data[1] > period[1])
    # do something
end
Current version using PythonCall.jl
# `climate` is an xarray Dataset
# `period` is a Julia Date
if any(pyconvert(Date, (pd[].to_datetime(climate.time.data[0]).date())) > period[1])
    # do something
end
It is obvious that the version using PyCall reads much, much better, virtually acting like Julia code. Is there a better way to write this type of code using PythonCall? Thanks in advance!
I managed to reduce it even more in this MWE. It looks better, but I guess there’s no way to avoid having to manually convert every Python object I use.
MWE:
using PythonCall
using Dates
np = pyimport("numpy")
xr = pyimport("xarray")
pd = pyimport("pandas")
# Run the Python code block
@py begin
    np.random.seed(123)
    times = pd.date_range("2000-01-01", "2001-12-31", name="time")
    annual_cycle = np.sin(2 * np.pi * (times.dayofyear / 365.25 - 0.28))
    base = 10 + 15 * np.reshape(annual_cycle, (-1, 1))
    tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
    tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)
    ds = xr.Dataset({"tmin": (("time", "location"), tmin_values),
                     "tmax": (("time", "location"), tmax_values)},
                    {"time": times, "location": ["IA", "IN", "IL"]})
end
# Create a dummy date for January 1, 2020
period = Date(2020, 1, 1)
# MWE
any(pyconvert(Date, ds.time.dt.date.data[1]) > period)
That latest code looks like good PythonCall style to me.
The reason you need to do the conversion yourself (and why PythonCall doesn’t do automatic conversion) is not performance (although the resulting type stability is nice); it’s that Python objects are fundamentally different things from Julia ones.
For example, if you do some_python_object.some_list.append(3), it is annoying if the Python list gets auto-converted to a Julia vector, which doesn’t have an append method.
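A minimal sketch of that boundary, using pylist and pyconvert from PythonCall:

```julia
using PythonCall

lst = pylist([1, 2])   # a wrapped Python list, with Python semantics
lst.append(3)          # works: it is still a Python object

v = pyconvert(Vector{Int}, lst)  # explicit crossing of the boundary
push!(v, 4)            # from here on, plain Julia Vector semantics
```

Nothing is converted behind your back: lst stays a Python list until you explicitly ask for a Julia Vector.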
So the current design makes the boundary between Python and Julia explicit. Indeed in your original post, I don’t know which of the intermediate objects in climate.time[1].dt.date.data[1] are Python or not.
The same logic is why we have pyconvert instead of adding methods to convert - we don’t consider a Python list to be “the same” as a Julia vector.
To reword this a little, PyCall makes Julian code work on (some) autoconverted Python objects and lets you opt into simply wrapped Python objects, while PythonCall instead makes Pythonic code work on wrapped Python objects and lets you convert them for Julian code.
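Concretely, the contrast looks something like this (a sketch; PyCall and PythonCall shouldn’t be mixed in one session, so read the two halves as separate sessions):

```julia
# PyCall: results auto-convert where a conversion rule exists
using PyCall
np = pyimport("numpy")
np.ones(3)                       # comes back as a Julia Vector{Float64}

# PythonCall: results stay wrapped until you opt in
using PythonCall
np = pyimport("numpy")
x = np.ones(3)                   # a Py wrapping a numpy array
pyconvert(Vector{Float64}, x)    # explicit conversion when you want Julia semantics
```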
But in pure Julia, the instance before a convert isn’t usually the same object as the one after either, and on the surface it seems possible to move the pyconvert logic into convert methods for the same calls, with bonus auto-conversions in some circumstances. Could you elaborate on the semantic separation?
Yes, this makes sense. I was not aware of this different philosophy before starting the move. I mainly did it due to the implicit use of CondaPkg.jl.
My workflow has a lot of mixed Julia and Python code, and PyCall.jl enabled me to use xarray in a really transparent way. I was acting on xarray objects, but I could still handle the underlying matrices and everything else with Julia syntax and performance, which is really convenient.
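If I understand the docs correctly, PythonCall’s PyArray should recover most of that convenience by wrapping the underlying numpy buffer without copying (a sketch, assuming ds is the Dataset from the MWE above):

```julia
using PythonCall

A = PyArray(ds.tmin.data)        # AbstractArray view of the numpy array, no copy
mean_tmin = sum(A) / length(A)   # ordinary Julia operations and performance
```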