And that is where we get lost in the specifics of the example, @StefanKarpinski. We are not pointing to CSV.jl as the problem. It is a general problem with the ecosystem: lots of duplication and lots of packages developed in parallel because of a lack of transparency regarding similar efforts.
We can do better to (1) help developers identify a common cause and (2) help users find the most stable and maintained option.
Those are articles benchmarking performance. Not basic usage.
The difference is that people only look around for alternatives if things aren't fast enough, and most of the time it doesn't matter. If you are just loading something into pandas and learning the language, you don't even think about it or care. Only advanced users would even need to google csv loading, since all basic usage just says import pandas and then nobody thinks twice about looking for other packages.
But it is for basic stuff! Try to do interpolation, plotting, loading files, manipulating data (let alone things like neural networks), etc. in matlab or python vs. julia. In the other languages you just don't have to think or worry about compatibilities. You don't even shop around for packages at all. You import numpy, scipy, pandas, matplotlib and you have almost everything you need. Maybe you import json or something like that, but the "baseline" package is obvious. In matlab you don't even need to figure out the numpy/scipy/pandas/matplotlib combo because it is built in.
After you get past the basics, then I agree that the languages are not all that different for searching for packages.
Figuring out how to automatically determine which packages are related to each other would certainly be cool and useful. It seems like a moderately challenging data mining/NLP problem though. An initial approach would be to do some clustering on tf-idf between README documents. As someone who used to do this kind of data science to help people find things on marketplaces (Etsy, specifically), I do suspect that this is harder to get right than one might naively guess.
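To make the tf-idf idea concrete, here is a minimal sketch of the first step only: ranking packages by README similarity. It is not a clustering pipeline and not anything JuliaHub does. The package names and README snippets are made up for illustration, and the smoothed idf formula is just one common choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} vector."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many READMEs does each term appear?
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    vectors = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        vectors[name] = {
            # Smoothed idf, so terms present in every document keep weight 1.
            term: (count / len(toks)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(name, vectors):
    """Rank every other document by similarity to `name`, best first."""
    scores = [(other, cosine(vectors[name], vec))
              for other, vec in vectors.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical package names and README text, for illustration only.
readmes = {
    "CSVReaderA": "fast csv file parsing and reading delimited text files into tables",
    "CSVReaderB": "read and write csv delimited files with a table interface",
    "PlotLib": "plotting library for line plots scatter plots and histograms",
}

vectors = tfidf_vectors(readmes)
for other, score in most_similar("CSVReaderA", vectors):
    print(f"{other}: {score:.3f}")
```

On real READMEs the hard parts are the ones this sketch skips: boilerplate sections (install instructions, badges), very short READMEs, and deciding where "similar" ends, which is where a clustering step and human review would come in.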
Yes, this would just be used as a warmup. The real feature we need in JuliaHub is the ability to update these links as a community. If someone goes to the CSV.jl package on JuliaHub, we could show a dropdown menu for logged-in users to select similar efforts. These would then be reviewed and accepted/rejected.
I think part of the issue here is that in an extremely well-established language like Python, there are large organizations with deep pockets that pour resources into a few high-profile packages in key problem areas. It's not that there is less duplication (how many re-implementations of numpy are we up to by now?), it's simply that it is easier to identify a well-resourced solution.
And yes, "Julia is not as well established as Python, does not have as much corporate and institutional support, and does not have as many developers" is a perfectly valid criticism! ... but this observation does not lead to much productive discussion.
Are there ways we could improve things like search tools? Yes! Are there areas where it would be useful for people to write more documentation? Definitely!
But "be more like Python" == "get more resources" is not really actionable. (Nor is the monolithic Matlab model of "bundle 'official' versions of everything" a productive approach for decentralized free/open-source development.)
I apologize if the following is a little ranty, but I really dislike discussions about "someone should do a thing" when it's not at all obvious who that should be.
Maybe I'm a different kind of programmer due to not being "formally trained" as a data scientist, but none of those are obvious to me. I literally only know they're the go-to package because someone told me to use them in a course I once took. They're also not easy to use - I remember being very frustrated with having to use (to me at the time) arcane syntax and having to fit my code to the @ sprinkled paradigm, just so I could have my code finish in time. It's not obvious AT ALL that I have to do this to make it work.
Like, I get that it's easy to say "look! python has all this nice ecosystem that everyone is using!", but did it occur to you that this has been a deliberate effort by the python community to make it so? And that this has not happened overnight and has had development behind it for YEARS by now? I'm not joking with this, the original python philosophy was in part inspired by UNIX philosophy after all, to have one obvious way of doing things.
I do understand that you want something like that in the julia ecosystem, but going on and on about how you want it to be a thing and complaining that it isn't does not help you in achieving that goal of yours.
Who, exactly, is supposed to review that? JuliaHub is a commercial offering. Do you want to have unpaid volunteers do that review work? Why should I volunteer for this?
This, to me, is the core problem with all this discussion about "something should be done about X!" - it completely ignores that someone actually has to do that thing and this may not be something that people are willing to volunteer for, because it's hard work that people would very much like to get paid for. You're free to donate your free time to do that, instead of arguing that someone else should do it for you. I for one won't - I'm much happier fixing little bugs & doc issues I notice in the julialang repo.
You're presuming that someone already knows pandas and how to load a CSV file in Python. If they already know that, then they don't need to search for how to load a CSV file. My example shows that for that simple example that keeps getting mentioned here, (1) the search results for Python actively lead you down the wrong path (csv library) and (2) the Python ecosystem is fractured and confusing (csv vs pandas vs PyArrow), whereas the search results for Julia lead you directly to the right approach with a clear and working tutorial of very high quality. There are no benchmarks in any of the google search results in either language, so I'm not sure what that's in reference to.
Perhaps CSV parsing is not a good example and there are other things that are hard to find the right way to do in Julia. If so, some concrete examples would be helpful, since otherwise this discussion is kind of abstract. What, specifically, have people had a hard time finding? As another common example, I just googled "julia plotting" and the top result was the Plots tutorial, which seems like a great result. The tutorial code works flawlessly and installing all of the Xorg graphics stack took less than a minute, and worked perfectly. I was able to install matplotlib using pip3, but I have not figured out how to get it to actually show a window with a plot in it.
I would happily review as someone who is part of the community and wants it to succeed. This is usually a single scan of the README followed by a button press to accept/reject.
There is no problem with updating a commercial offering, we are not coding anything, we are just providing feedback to an already implemented system. Fixing the database when it is missing something.
To clarify, we are asking for a simple feature: a dropdown menu where logged-in users could select similar packages whenever they are reading a package page.
The work afterwards: logged-in users could propose links with the menu and other users could review the links to accept/reject them. This is not that much work if you imagine that the community will be willing to help.
100% agree. The scipy/numpy/matplotlib/pandas combination as a low-bug default emerged through massive investment, and they didn't start out as the "good enough" standards from the beginning.
This entirely explains why the experience is so different, but not what to do about it.
I agree, but at this point there are still a lot of people who don't seem to recognize that it is even different at all - which means it is hard to fix. I assume it is because they are using Julia mostly on its own, or they use both languages in different ways where they don't run into the same basic usage issues. A Julia-specific solution to the problem needs to start by agreeing there is a problem in the first place, and then whether it is worth addressing.
Exactly. Julia cannot forge the same path (you can't magically make organizations with deep pockets appear), but if people agree that there is an issue then they can decide whether it is worth addressing.
I don't think that package discovery is the entirety of the problem. Even if you can discover the packages, their features often don't overlap enough, so you end up having to choose not just on the goal (e.g. interpolation) but also on the specific features (e.g., regular vs. irregular grid, dimensionality, etc.).
The best approach I have seen so far for dealing with this in a decentralized language is the SciML approach. Get everyone using wrapper packages as the default "no thinking" baseline, then those packages can do integration testing with downstream packages and decide when they are sufficiently solid to wrap. If one downstream dependency is buggy or has incomplete feature coverage they can swap it out. Otherwise, people can use the direct packages as they evolve and experiment as they wish, but intro users can keep things simple.
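As a rough illustration of the wrapper idea, here is a toy sketch of one user-facing function with swappable backends. It is not how SciML is implemented. Every name here (`register_backend`, `interpolate`, the two backends) is hypothetical, and the interpolation routines are deliberately naive.

```python
from typing import Callable, Dict, Optional, Sequence

# A backend takes sample points xs, values ys, and a query point x.
Interpolator = Callable[[Sequence[float], Sequence[float], float], float]

_BACKENDS: Dict[str, Interpolator] = {}
DEFAULT_BACKEND = "linear"  # the "no thinking" choice; maintainers can change it


def register_backend(name: str, fn: Interpolator) -> None:
    """Make an implementation available behind the common interface."""
    _BACKENDS[name] = fn


def linear_backend(xs, ys, x):
    """Piecewise-linear interpolation; assumes xs is strictly increasing."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the data range")


def nearest_backend(xs, ys, x):
    """Nearest-neighbour interpolation: return the value at the closest sample."""
    i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
    return ys[i]


register_backend("linear", linear_backend)
register_backend("nearest", nearest_backend)


def interpolate(xs, ys, x, backend: Optional[str] = None):
    """User-facing entry point: beginners never need to name a backend."""
    return _BACKENDS[backend or DEFAULT_BACKEND](xs, ys, x)


xs, ys = [0.0, 1.0, 2.0], [0.0, 10.0, 20.0]
print(interpolate(xs, ys, 0.5))                     # default backend
print(interpolate(xs, ys, 0.4, backend="nearest"))  # explicit opt-in
```

The point of the indirection is that the wrapper's maintainers can repoint `DEFAULT_BACKEND` after integration testing without intro users changing a line, while anyone who wants a specific implementation can still ask for it by name or use it directly.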
That is how everyone learns it. But in python the information conveyed is: import pandas, numpy, matplotlib, scipy and you have everything you need.
It isn't possible to get to that point for julia, nor is it necessarily a goal, but it is important to recognize it is different.
I have dedicated an enormous amount of personal effort in education, evangelizing the language and packages, and have funded open-source projects and summer of code students for years to try to contribute to this issue.
Rant away, but do so understanding what others have tried to do here.
Wrappers have downsides as well, because they tend to enforce a "lowest-common-denominator" API which can be limiting, especially as the problem domain becomes more complex. It also kind of puts the cart before the horse: it is much easier to develop a good wrapper API after you have multiple high-quality competing implementations.
For example, if we had somehow decreed 5 years ago that all finite-element packages in Julia should follow a common "FEMwrapper" API (e.g. based on JuliaFEM), that would have prohibited innovations like Gridap.jl.
50% agreed. It's more that there are some domains that are better for wrapping, and other domains where it's much harder. Linear solvers, optimization, differential equations, etc.: getting all of those uniform, using the same keyword arguments, and all being efficient on the same interface isn't bad at all. PDEs in general... yeah, that's hard to put a single interface to, so we welcome the fact that people will build all sorts of PDE solvers and we'll try to incorporate them all into one really smart symbolic system.
I think the ecosystem would benefit from making this kind of resource more visible, e.g. by hosting it or linking it on JuliaHub. R's CRAN has a link to their task views on the home page.
It's not a problem with a single package, no, but it sure doesn't scale. What if the package only has a sparse README? What if you are not a domain expert and can't decide whether a package is "good" for its field? Would you feel comfortable recommending a package about something outside of your field of expertise or outside of what you personally use? As a concrete example, would you feel comfortable recommending & linking packages related to cryptography? I think we're going to run into "I've reviewed all packages I care/feel knowledgeable about" much sooner, rather than later.
The pond is much bigger than you think.
Who is to update that? Who is to maintain the quality of recommendations? The links between packages? What happens when a package is no longer maintained, but still highly recommended? How much work do you estimate it would be to periodically scan the registered list of packages for packages that were once, but are no longer, recommended? You're speaking in very abstract terms about something neither you nor I have internal knowledge of how to actually pull off and support long term. All I can bring to the table in terms of relevant experience is moderating a few small-scale forums (think less than 50 people) and an accompanying wiki, but that was already a support nightmare and just sucked up waaay more time & resources than was practical.
I don't have any stats on that, so admittedly I'm talking out of my subjective POV, but I'd wager there aren't actually that many people logged into juliahub, compared to the number of general search "users". Even if that's the case, the vast majority probably do not want to spend their spare time reviewing random packages. So we're now speculating about whether the community would be willing to help - with what concrete action? Who from the community is to do that work? Are we all expected to donate, say, 2 hours each week to this effort? What if I don't want to do that - am I now shunned because obviously I'm not participating in this particular effort to curate a commercial offering made available to the community by a company? I do find my time well spent actually improving documentation & fixing issues, thank you very much.
You do realize that from my (and lots of other python users') POV, that ecosystem is a niche? Further, what's to stop us from doing the exact same thing, by writing blog posts about how people should use X in julia? In fact, that is exactly what is happening!
Why not? To a lay person in that field like me, the SciML ecosystem seems to be exactly what I'd want for that, and that's just from what I picked up through osmosis on this forum and slack.
I do understand it, I really do. I myself am known to be the "julia evangelizer" in my circle. I recognize that you have done the funding and evangelizing as well, which is great! But please also recognize that we're not all able to do it to the same level, be it due to financial or other reasons. I'm in no position to fund GSoC students - just two years ago, I applied myself after all!
However, none of that takes away from the fact that talking about a thing does not make that thing happen. Saying that someone should do X does not bring X closer to reality. Saying that X should be changed to do Y does not change X to do Y, nor does it say anything about how it can be sustained long term. Encouraging people to step up and do work for free is not a sustainable model - there are enough case studies in open source to prove that.
I'm really trying to understand what exactly should be done and most importantly, WHO should do that. The general vibe I've been getting from this and the other recent discussions has never been a concrete "I want X to be a thing, so I'm making it a thing". It's always been getting someone else to do a thing, seemingly without considering what it would require from them to do that thing and continue to support it (which is actually the hard part, as I'm sure you can attest to as well). Maybe I'm wrong about that, but that's the impression I got.
For sure. For some you can slowly build up coverage, where people deviate to use the raw interface as necessary. The big concern is that you need to think ahead in the interface design if you want to slow-roll features (e.g. if you start with unconstrained optimization and slowly add in box constraints, nonlinear constraints, and complementarity, will that break existing interfaces?). But breaking interfaces is probably better than the alternative.
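One way to picture the "think ahead" concern is a toy 1-D solver whose box constraints arrive later as optional keyword arguments, so calls written against the unconstrained version keep working. This is a hypothetical `minimize` function written only for illustration, not any real package's API, and the method (projected gradient descent with a numerical derivative) is chosen for brevity.

```python
def minimize(f, x0, *, lower=None, upper=None, step=0.1, iters=500, h=1e-6):
    """Minimize a 1-D function by (projected) gradient descent.

    `lower` and `upper` were "added later": their defaults mean "no constraint",
    so callers of the original unconstrained interface are unaffected.
    """
    x = float(x0)
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2 * h)  # central-difference derivative
        x -= step * grad
        # Project back onto the box, if one was given.
        if lower is not None:
            x = max(x, lower)
        if upper is not None:
            x = min(x, upper)
    return x


def objective(x):
    return (x - 3.0) ** 2


print(minimize(objective, 0.0))             # original unconstrained call still works
print(minimize(objective, 0.0, upper=2.0))  # newer call using a box constraint
```

Box constraints fit neatly into optional keywords. Nonlinear constraints or complementarity conditions usually change what a "problem" is, which is where a keyword-only extension stops being enough and an interface break becomes more likely.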
I agree. I think some are more amenable than others, and anything DSL-y will end in failure.
But part of this is a community thing. I worked with QuantEcon to fund some of the work on the consolidation of wrappers (and SciML people did the vast majority afterwards) but it hasn't yet become a rallying cry in the community. If it did, and people saw it as part of the solution they could contribute to, then it could make progress very quickly.
I should be clear here in that I'm not suggesting that we shouldn't ask for JuliaHub to support something like this. I just don't think it's sustainable, AT ALL, to ask community members to freely donate their time to make a commercial offering by a company better, even if that company is founded solely for furthering julia & selling julia-based products. Hence, to me at least, the idea of a "community curated set of packages" immediately falls flat on its face, especially once you start looking beyond the "numpy, scipy, etc" equivalent-in-julia bubble and start moving into (to you) niche topics.