It’s still not clear to me what’s the point. People are entitled to be interested in taking care of whatever projects they feel like, unless they’re employed to maintain a specific package, which is not the case here.
Speaking for myself, I’ve never ever used the Julia to Python interface, because I never had the need to. I very occasionally use the Python to Julia interface for comparing results between Julia and Python packages. I totally understand your experience is different, but I have the feeling I’m also not alone in my experience, especially for people more invested in the Julia ecosystem than the Python one, which might explain why there seems to be a lack of interest.
At the risk of sounding entitled, someone in the Julia organization should be employed to maintain it, even if it’s just 10% of their time. The package seems like such an extremely good return-on-investment that it’s painful, as a Julia user hopeful for wider Julia adoption, to watch the repo collect dust. If I were in charge of some grants to open-source communities, this feels like the highest impact project to support.
As I said, my point is not about who is maintaining what, but why people are not able to join efforts and share these maintenance costs in a common project. Hence, I feel the need to find out why and the need to try to improve the situation.
It is a shame that people from outside have to come here on Discourse to ask questions like the one asked in the title of this post. If PyJulia and PythonCall.jl are parallel ongoing efforts, then the READMEs could clarify why. If one of them is in maintenance-only mode, then the READMEs could mention that. Currently, we have independent efforts, one of which will probably die, and we are failing to communicate that to newcomers. More importantly, we are failing to build teams with common driving forces.
I entirely agree with you. Nobody who uses 100% Julia would have a use-case for this, and if that is who you interact with, it can be hard to see outside the bubble. But most of industry, at least in my area of machine learning, uses Python and C++. A well-established company would never rewrite its software stack from scratch: it will introduce new languages and libraries gradually, relying on interoperability. So, for Julia even to get its foot in the door there, it requires good interop with Python. If it can do that, it will receive the support from any new adopter - hence it is a great return on investment.
No, you just see this in the stage of a classic maintainer pass. I mean, look at pyjulia itself:
It had one maintainer pass before. Now before someone says it’s some weird Julia thing, look around. Here’s libuv.
It’s now passed to its fourth hands, some weird Jameson Nash dude. That’s a core cross-platform asynchronous I/O thing that all languages rely on… seems scary to have
Alright, but what about open source anyone here would know? Here’s OpenBLAS:
It’s passed three hands before, and most people never really noticed and instead called it stable.
I can keep going, but this happens all of the time. It’s nothing to be scared about: it happens with dependencies that everyone uses, and when I say everyone I mean things like the SSL encryption used for the whole web have gone through similar stages. Most open source projects have 1 or 2 people really active on them at a given time, and they tend to pass hands when the first maintainer goes on to do other things. It happens across all languages, domains, and beyond. You can find a lot of nice papers about this that started researching it back in the Heartbleed era of open source.
But okay, we’re greedy programmers, what can we do about it?
Well, if there’s someone willing to pick up a given project, we could look for some infrastructure grant to support them. Finding funds isn’t too hard: there’s NSF CSSI, NASA ROSES, etc., and we could write for one of these calls. I write these for the community and it’s a lot of time to keep putting out an average of a grant every two weeks, so if anyone else could pitch in that would be great. Writing some Python bridge into some grant is relatively easy and can probably just ride as pork into some other SciML thing.
But even after getting the funds, someone would still have to step up to do this work. That was Taka before. After Taka, I’m not sure who else is willing and has the knowhow to do this. If anyone knows anyone who wants to do it though, put them in touch with me and we can figure something out.
Probably the best thing to do is to get in contact with the PythonCall guy and see what his career stage and goals are. He’s of course interested in the topic, but what is he doing and where is he going? What could help him out the most? Did anyone even ask? That’s the first thing to do. While it is always a PITA to have to change some fundamentals like this, maybe he’d be willing to help diffeqpy and PySR swap to PythonCall, and maybe he would like to do it full time if only there was support. That’s a very concrete step.
Maintenance-only mode never really happens on purpose. It’s only identified years after the fact that a hobby you’d like to try is something you never picked up. It’s that next book on your shelf that you intend to read, and have intended to read for the last 5 years. It’s probably a good time to now call it in maintenance mode and someone in this thread should probably make a PR to the README asserting that, but it’s not like there was any intention for someone to disappear without putting it onto the README. When exactly it happened is a blurred line, and only clear that it has happened after the fact. I wouldn’t try to place blame on someone for something like that.
You’re way overgeneralizing here, like, to a crazy extent. If there’s something that has always fallen apart in every language I’ve ever used, it’s language bridges. Why are language bridges in particular so hard to keep open source maintainers for? Well, I finally found out first hand when doing some of the MATLAB.jl maintenance. The problem is that language bridges are generally started and maintained by someone who’s working on projects between two languages, so they need it themselves. But what we’ve seen with the Julia language bridges is that, over time, those maintainers end up with essentially 100% of their projects in Julia and so many of them tend to become core maintainers of the Julia language itself (since they already know some system programming stuff from building language bridges, it’s a natural transition).
So arguably it was easier in 2017 or so to find people building lots of new language bridges in Julia. That’s the peak of MATLAB.jl, mexjulia, juliacall (R), etc. PyJulia had its revival at around this same time, when there were enough libraries that people really started adopting Julia, but not enough to really be standalone. Nowadays, you can just do an entire project in Julia pretty comfortably, and many students are learning it as their only language, so the pool of people who are naturally building and maintaining such bridges shrinks. These days, a lot of the people with write access to MATLAB.jl probably don’t even have a MATLAB license anymore; I know I don’t.
Given that it is such a natural process though, it does make a good argument for a targeted correction to accelerate its development.
Hi @ChrisRackauckas thanks for the reply. Let me follow the same idea with contribution plots.
First there was pyjulia, as you showed it had one maintainer shift:
Then, this maintainer stopped maintaining pyjulia and started contributing to another project called PyCall.jl, where another maintainer shift happened:
This new maintainer then started contributing to a 3rd package called PythonCall.jl where a new maintainer shift happened:
Unlike libuv, why are these efforts happening in completely separate repositories? Why do we have to navigate this history ourselves to find out what is the latest effort?
Maybe we are bumping into gitology issues? People find it easier to start a whole new package in a whole new repo because it is harder to work in branches and merge into a main branch for a breaking release?
Thanks @ChrisRackauckas! Very useful to hear your detailed take on this topic in general and I agree with everything you said. Contacting the PythonCall dev about this topic is probably a great first step, and I would definitely support grants for this. For what it’s worth I can ask around at Simons Foundation (where I currently reside) to see if they also have grants for this type of thing, as I think it would be really valuable for the larger community (to bridge Python/Julia communities in science, but also to foster industry support for Julia).
PythonCall would not have happened in PyJulia. That’s pretty clear because the author, from the very start, posted exactly what is different at the bottom of the README. They very clearly had a different vision for how to do it, and wanted to restart from scratch to achieve a very different goal because they thought that design was better.
That happens all of the time in open source: libuv was not the first asynchronous I/O library: this kind of thing didn’t just start to exist in 2012. Nor will it be the last. It’s the one we have right now because no one has a better idea for how to do it.
I know that the Simons Foundation is interested in supporting language bridges because they have in the past. It would be good to find an appropriate call from them, and then find the right individuals for this. Definitely a good way forward.
The jig is up.
I’m not sure if it was entirely accurate for the PythonCall developer to say that JuliaCall is just “one [Python] file” (pysrc/juliacall) that has to “find Julia and get it to import PythonCall”; the Julia code (src/jlwrap) that wraps Julia types for Python does execute strings of Python code interpolated with computed Julia literals. But that just goes to show that it’s a vastly different approach to PyJulia, which is basically a Python package.
I’ll draw your attention to this comment by one of the developers of PyCall; I have not seen a similar comment from a PyJulia developer. TLDR: PyCall and PyJulia were created in Julia’s volatile package-lacking v0 phase, and a reasonable design overhaul for the different needs of today would be such a disruptive major revision, it warrants a different package.
Julia-Python interop feels like one of the best bridges among all languages: it’s mostly transparent and has few issues, at least when calling Python from Julia. For example, I mostly use matplotlib for plotting, and the Julia code looks basically the same as in Python. It works with Julia arrays and converts the corresponding data types.
Is the opposite direction (call julia from python) much worse?
IMO a game-changer in Python interoperability is/will be StaticCompiler.jl. A feature was just added to it (two days ago!) to compile a Julia shared library. I am about to try it. While StaticCompiler.jl still doesn’t support all of Julia, it should open the door to call Julia code from Python with 0 latency.
This could lead to Julia packages gaining lots and lots of users (given the massive scale of Python). Eventually, some of those users could become contributors to the Julia package as well, in the same (or a better) proportion that they contribute to the C code which underpins most Python packages.
What are the big advantages of using StaticCompiler.jl in Python over running active sessions per language plus PackageCompiler.jl sysimages to reduce latency? I know that per-language sessions have memory overhead (especially Julia’s REPL), but it seems to be the standard approach for bridging dynamic languages like R and Python, and compiled Julia is already at its peak speed.
Well, you can already use basically all of scikit-learn using ScikitLearn.jl and matplotlib using PyPlot.jl, both of which use PyCall under the hood and work quite well in my experience. Though these days I’m mostly using MLJ (which can be used with ScikitLearn.jl models) and Makie for plotting.
This may be true for calling Python from Julia. But it misses the urgent need to do it the other way around (that’s mentioned in this and other threads). Julia would benefit enormously if it were easier to deploy and also to call from Python. This is a big roadblock to adoption in many companies.
StaticCompiler.jl is promising, but is not generally useful at this point. To a certain extent, it allows one to write C-style code in Julia.
There is a big advantage of PyCall over PythonCall as far as I can tell. PyCall converts Julia Arrays to numpy arrays, and vice versa. You can often copy Julia code and it will run. There are some examples in diffeqpy. If you pass a Python list or np.array to a Julia function, PythonCall will wrap it in a Julia type that you can index into and iterate over. And vice versa. So you can sometimes use this without thinking. But the access is super slow, and you may not know why. It’s better to convert to a native type. Also, if you have, say, a Python function that makes many Julia calls, this wrapping means you have to intervene between calls to convert the types. With PyCall very often it just works. There are documented functions in PythonCall to convert, but they are super slow. I’m not sure, but I think they are using the wrapper’s indexing to build the new types. The PythonCall docs say it can do fast non-copying conversion of numeric types. I believe it. But I have never been able to find out how to do this. It looks like PythonCall has better and more flexible wrapper and conversion capabilities than PyCall. I’m eager to see how this works in applications, but I have not so far been able to understand it. There are related issues from others in the PythonCall repo.
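To illustrate why indexing through a wrapper can be so much slower than converting once, here is a toy model in plain Python. It does not use PythonCall or PyCall at all: `ForeignArrayView` and `BoundaryCounter` are hypothetical names, and the "language boundary" is only simulated by a counter. The assumption is that every element access through a wrapper pays one boundary crossing, while a bulk conversion pays one crossing in total.

```python
class BoundaryCounter:
    """Counts simulated crossings of the language boundary (illustrative only)."""

    def __init__(self):
        self.crossings = 0


class ForeignArrayView:
    """Hypothetical stand-in for a wrapper around an array owned by the other language."""

    def __init__(self, data, counter):
        self._data = data
        self._counter = counter

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        # Each element access is modelled as one trip across the boundary.
        self._counter.crossings += 1
        return self._data[i]

    def to_native(self):
        # A bulk conversion is modelled as a single trip that copies everything.
        self._counter.crossings += 1
        return list(self._data)


def sum_elementwise(view):
    """Sum by indexing the wrapper: one crossing per element."""
    total = 0.0
    for i in range(len(view)):
        total += view[i]
    return total


def sum_after_conversion(view):
    """Convert once, then work on the native list: one crossing in total."""
    return sum(view.to_native())
```

With a four-element view, `sum_elementwise` records four crossings and `sum_after_conversion` records one. Real bridges have different cost profiles, but this is the shape of the problem when a wrapped array is used in a hot loop.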
Another thing is that PythonCall is a one-stop shop. It has all elements for using Julia in Python from the highest to the lowest level; so full stack. The API is in a way not flexible, so it is difficult to use only parts of it. PyCall on the other hand offers APIs that let me use it in ways the authors did not envision. A few months ago the author added an environment variable (because he needed it himself) that allows loading juliacall without initializing (which would otherwise automatically download and install several things without asking). This made it possible for me to use it in my projects. But then I have to use PythonCall internals to initialize the way I need to, and this is of course fragile. PyCall’s julia module does not initialize on startup and offers a flexible API for initialization. I can build, store, and load system images the way I want or need to. I can’t do this with PythonCall. Having said all this, the author of PythonCall does have plans to make everything possible: loading system images, combining Julia projects, other stuff. This will be great and very useful. Much more ambitious and capable than PyCall and with an elegant configurable interface. But he wants to control the interface and entry points in a way that makes you buy into the full stack. If I recall, it was in part for the usual reasons: some unsupported way of doing things may become sort of a de facto standard and then limit design choices in the future, and cause more burdensome support. If you give people many ways to do something you invite chaos in the future. So in the future I will be able to load a system image, but not now. Of course, these are compelling reasons, and there is precedent supporting this design choice. It’s a perfectly valid and reasonable design choice.
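For readers who have not hit this, the difference between "initialize on import" and "initialize when I say so" can be sketched in a few lines of plain Python. Everything here is hypothetical: `Bridge`, `load_bridge`, and the `EXAMPLE_BRIDGE_NOINIT` variable are made-up names for illustration, not the actual juliacall or PyJulia API or their real environment variable.

```python
import os


class Bridge:
    """Hypothetical bridge runtime; not PythonCall's or PyJulia's real interface."""

    def __init__(self):
        self.initialized = False
        self.sysimage = None

    def init(self, sysimage=None):
        # Explicit initialization lets the caller choose options such as a system image.
        if self.initialized:
            raise RuntimeError("bridge already initialized")
        self.sysimage = sysimage
        self.initialized = True

    def call(self, name):
        if not self.initialized:
            raise RuntimeError("call init() before using the bridge")
        return f"called {name} with sysimage={self.sysimage}"


def load_bridge(environ=None):
    """Mimic package import: initialize with defaults unless the opt-out variable is set."""
    env = os.environ if environ is None else environ
    bridge = Bridge()
    # EXAMPLE_BRIDGE_NOINIT is an invented variable name used only in this sketch.
    if env.get("EXAMPLE_BRIDGE_NOINIT", "") != "1":
        bridge.init()
    return bridge
```

With the opt-out set, the caller gets an uninitialized bridge and can call `init(sysimage=...)` with whatever setup they need. Without it, the defaults are already locked in by the time the import returns, which is the inflexibility described above.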
I think I understand the vision of PythonCall, and trust that he can get there. It could be that with more docs and a bit more development, PythonCall will fulfill most needs. But at the moment I’d like to see a more modular Python/Julia bridge system that serves a broader range of needs. Separating types and conversion from package management, for instance. That’s what I meant by an active dev community in dialog with users driving design. Maybe it involves PyCall and/or PythonCall, or something else. OTOH, PythonCall is still developing, so I wouldn’t be surprised if I change my opinion in 6 or 8 months and find it has everything.
The last release of pyjulia was 0.5.7 on October 25th, 2021. The prior release was 0.5.6 on September 13th, 2020.
The way I would describe its current state is being under community maintenance.
The good news is that pyjulia is part of the JuliaPy GitHub organization, which means that you could join this organization and become part of the team that is managing it. Upon doing so, you can be assigned merge permissions. Currently there are 15 people in the JuliaPy organization; seven of them are “public”. I am one of the 15.
Am I actively looking at the pyjulia repository? No. Will I take a look at your issue or pull request if you point it out to me? Yes.
I guess the main point is that the shared library gives you effectively zero latency, while you still have to initialize Julia to use a custom sysimage via PyJulia (which, for example, takes a solid 13 secs on my old laptop). This could be important in some use cases, like microservices, or for impatient people ;).
Using shared libraries is also a more modular approach than a massive runtime, I guess, that could enable a Python user to rely on several Julia packages created by different authors more conveniently.
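For anyone wondering what "call a shared library from Python" looks like, here is a minimal sketch with the standard `ctypes` module. I don't have a Julia-compiled library to show, so the system C math library stands in for it; the assumption is that a library compiled from Julia exports plain C-ABI functions, in which case you would load it by its file path and declare argument and return types the same way. This is written for Unix-like systems.

```python
import ctypes
import ctypes.util


def load_math_library():
    """Load the C math library, used here as a stand-in for a Julia-compiled library."""
    path = ctypes.util.find_library("m")
    # If the library cannot be located by name, fall back to the symbols
    # already loaded into the current process (works on Unix-like systems).
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


def bind_cos(lib):
    """Declare the C signature `double cos(double)` so ctypes converts values correctly."""
    fn = lib.cos
    fn.argtypes = [ctypes.c_double]
    fn.restype = ctypes.c_double
    return fn


cos = bind_cos(load_math_library())
```

Loading the library is just a dynamic-linker call, with no runtime to boot, which is where the near-zero startup latency comes from. The cost is that you have to declare signatures by hand and you only get C-compatible types across the boundary.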
It is a work in progress, sure, but I wouldn’t dismiss it so quickly. For example, StaticCompiler.jl is already compatible with LoopVectorization.jl, which consists of a series of lisp-style macros to produce SIMD accelerated code, so I would say it is already past mere C-style code.
I still need to test it more, though, but StaticCompiler.jl already looks like a great addition to the ecosystem. I’m sure it makes Python interoperability a lot better, when applicable.
Just to make this concrete, if anyone wants an invite to JuliaPy DM me and I’ll figure out how to make that happen.
If you want to draw my attention to a specific pull request or issue, link it here.