How to know if a package is good?

ChrisRackauckas · June 3, 2022, 6:47pm

I don’t totally agree with the latter part. There are some domains where flexibility is key. I don’t think you can do a good non-DSL interface for general PDEs. I think it has to be symbolic in nature, hence building it on ModelingToolkit. Gridap, Trixi, MethodOfLines, NeuralPDE, etc. are all too different, and the way each needs its functions specified differs too much, for a single conventional interface to be optimal. Using all of those from one interface needs to be symbolic in some form, like ModelingToolkit, or lower level, like FEniCS. Also, acausal modeling of DAEs needs something DSL-y because many times the best formulation of a DAE to solve is not something the user would write down, so you need to do code transformations automatically (and doing this in the compiler is hard, to say the least).

But there are a lot of very fundamental and well-defined numerical routines that can have some clear interfaces, like SciPy covers (i.e. SciML), but also a lot of statistics.

jlperla · June 3, 2022, 6:50pm

I think you are missing context on existing investments, resources spent, etc… To me:

Step 1: admit that there is a problem.
Step 2: decide if it is worth solving.
Step 3: see if there is a feasible solution.

I still see a whole bunch of people who haven’t made it past step 1, which is why I feel the need to respond here. Maybe it is a lost cause and I am giving up on that now since I have said my piece.

As for step 3: personally, I have funded or secured funding for undergrads and grad students to try to contribute in ways that I think are important, but it hasn’t been enough and doesn’t resolve issues with continued maintenance. The only idea I have is to follow Chris’s lead with SciML, but maybe there are others.

So my suggestion to you is this: if you move past step 2, I bet there are a whole bunch of places where wrappers could be built up, if the community decides that is a decent approach to make a dent in the issue.

Personally, I am pretty skeptical of JuliaHub as a solution through package discovery, but I could be wrong. The first-order issue is that the packages are buggy or have incomplete coverage of features, so you can’t easily choose a simple, good-enough baseline. Better search tools won’t fix that.

juliohm · June 3, 2022, 6:52pm

@Sukera regarding the comments you left above, I think you are confusing the requested feature. We were discussing a feature where the community could suggest links between packages. Most of your argument is discussing a recommendation system.

juliohm · June 3, 2022, 6:56pm

I am 100% with you @jlperla , it is difficult to see that (1) is not recognized as a real issue even when multiple people who deal with students on a daily basis share a similar experience.

Let me emphasize here that not everyone is teaching at MIT, where most students are above average. We are talking about institutions all over the world where any guidance you give to students to facilitate their learning experience is welcome.

lmiq · June 3, 2022, 7:17pm

The MIT course “Introduction to computational thinking” was taught here at Unicamp and went very well.

I have taught courses four times here for chemistry students with no programming experience and it was also fine.

I don’t know what class of people is expected to find the packages and solutions you deal with, but I have never seen any course work without a good tutorial, in programming or anything else.

I don’t think that any other magical tool exists. So I’m on the side of the ones that don’t really see a problem.

Sukera · June 3, 2022, 7:23pm

No, I agree with all of this. I’m spending way more time than is good for me (& the things I should be doing instead) answering very basic questions on here, on slack and on zulip. I’m just as frustrated as you are when it comes to bad introductory documentation, the mangle of “tutorial & technical documentation”-style main julia docs we have now, the lack of official intro to the language other than “read the docs, it’s very readable” and the lack of domain specific introductions to the respective ecosystems. I feel that pain with you, because I have to listen to the same complaints every time I suggest that someone could do X in julia more efficient/performant/faster. It’s very hard to evangelize against ecosystems as mature as python’s are and I personally do not feel like I can just up & rewrite whole sections of the documentation.

I just don’t think that “asking the community to step up” is a feasible solution, as you put it - not for recommending packages, not for linking related packages. The way to improve the situation is to fix things, make documentation PRs etc. To me, “Not knowing about related packages” because the main one is not quite the thing you want is a symptom of a lack of documentation & polish, not a direct issue to be fixed.

I agree with all of that! I’m extremely frustrated by the lack of documentation in e.g. packages, and the seeming lack of interest in improving that front from core devs (the various discussions about making it better & easier to mark what is considered API in Base comes to mind - or, for a more concrete example, the still non-existent documentation about per-field atomics, which literally only have links to a technical design document).

I could/should take my own advice of course and just make a PR to make it easy to improve that situation - were it not to hinge on the fact that the people already working on a package/Base would have to start using that tool voluntarily or write the docs themselves, without having to wait on someone from the community to “step up”, figure out how it works and then write the docs post-fact. I can’t (and don’t want to) force them to use it. To me, that’s the core issue here, that we can’t solve by complaining, I don’t think. If you want to complain, complain about (or fix!) specific issues, but please don’t just say “the general sentiment is bad”, because I think quite a lot of people agree on that front, they just don’t think it’s worth saying that out loud.

No, I don’t think so. Be it a recommendation system or “just” linking packages (which are the exact same thing, just with different UI), at the end of the day someone has to sit there and go through the submitted recommendations/linkages and review them. If you don’t, you may end up with bogus links or recommendations (or outdated ones, which you have to review periodically). There’s no way around that, and that sort of “boring” work takes continued support and investment - it’s not work you can offload to “the community”, practically speaking.

Neither am I! I’m not MIT based, I’m not even US based. I have done my fair share of handholding of students through their very first semester of programming, teaching Java. I’ve assisted in teaching a python based data-science course, where people ran into the most common foot guns in python as well. Of course I got frustrated with java & python in these situations, because year after year they were the same exact fundamental issues. I too complain at every opportunity about the lack of teaching resources for introductory courses and I too hate the fact that on some level, it’s a “do or die” world way too soon. None of this changes, though, that complaining about the state of things doesn’t make the state of things better (otherwise I would actually have caused things to change in the time at my faculty, which is oh-so-resistant to change).

jar1 · June 3, 2022, 7:24pm

Any chance the package search website could be made open source and more community controlled?

It could have lots more information about packages, such as

maintenance statistics like pull-request time-to-merge
package recommendations

Sukera · June 3, 2022, 7:31pm

Sadly, here we run into the classic issue of open source - who pays for the hosting? At the moment, JuliaComputing just does that as part of their commercial offering of their products, as far as I’m aware. Replacing that with a community controlled offering is non-trivial…

jar1 · June 3, 2022, 7:42pm

I’m not sure if anybody is actually willing to put in the effort to add more useful features to the website, but if people are, it would be up to JC to decide if it can afford to share some control of the package discovery tooling as they’ve done with other parts of the ecosystem.

Albert_Zevelev · June 3, 2022, 8:00pm

@juliohm Is it possible to change the title to:

How do I know if a Julia Package or Julia Tutorial is good?

Or is it better to create a separate issue for tutorials?

When I was learning Julia I mostly used Google search. Some tutorials (& other content) were great & updated, many were not. It was also often hard to find the good tutorials.

I think the Julia website (Get started with Julia) tries to point users to a “curated” set of tutorials.
I’ve seen so many other great tutorials that are not on the Julia main website.

Also, I can’t find easy links to various Julia Cheat sheets: 1, 2, 3, 4 …

juliohm · June 3, 2022, 8:08pm

@Albert_Zevelev the post was split off from a thread, I didn’t pick the title. Will try to update it now.

mbauman · June 3, 2022, 8:09pm

Isn’t this entire thread about packages? Let’s please keep the issue focused. Yes, tutorials may form a part of the solution to package discoverability, but it’s about discovering packages, no?

juliohm · June 3, 2022, 8:14pm

Makes sense @mbauman now that you mention it. I will rename it back to packages to avoid more confusion. @Albert_Zevelev feel free to start a separate thread and link to this one.

Benny · June 3, 2022, 9:05pm

All due respect, there are a couple disconnects here:

“Basic” isn’t a predictable standard, it’s a reflection of community demand in particular contexts. There just happens to be a large demand for scientific computing and data science in academia and especially users of those languages. The “basic” stuff for game developers or embedded programmers is completely different.
It is actually harder to find packages in Julia than the other data-sciency languages, just not with respect to browsing and searching. It’s the fact that Julia can easily be more composable, so it’s easier for developers to work independently yet build on each other’s work. This is a good thing, it fosters innovation and you see awesome tools show up with impressive speed. On the other hand, people do spend some time crawling forums to pick up their set of tools.

SciPy was brought up as a counter example, but there’s a bit of nuance. Yes, it’s easier to say “SciPy” than to list 7 Julia packages. A notable difference from Julia: SciPy is a superpackage, so import scipy doesn’t import all its subpackages. You still need to read sections of the SciPy documentation’s User Guide to figure out what tools to import, which serves the same purpose as the package ecosystem review blog I imagined earlier but on a SciPy-contained scale. To use an analogy, it may be easier to pick up a Swiss Army knife than to assemble your own toolbag, but either way, you need to learn and practice with each tool, which can respectively be facilitated by a sectioned Swiss Army knife manual or an organized bundle of separate manuals.

The thing about a Swiss Army knife is that it’s hard to add or remove tools to it. The only reason SciPy is even 1 thing is how much less composable practical Python is. Sure, other libraries can use NumPy arrays and SciPy functions, but they can’t extend important internal workings and need their data structures to be convertible to NumPy arrays. When you had to play by SciPy’s rules, it made a lot of sense to contribute to SciPy. Interestingly, NumPy has been developing multimethod dispatch (to name a few good readings: NEP 13, 18, 35, 37), which non-NumPy array libraries like the CUDA-based CuPy have been using. But it’s not nearly as neat as in Julia, and there are far fewer generic interfaces and much less code reuse. To be fair, Julia’s reusability has its limits, e.g. GPU arrays needed their own GPUArrays.jl interface, but in NumPy it’s as if each method were annotated only with concrete types.

rikh · June 3, 2022, 9:56pm

Let me answer the original question. How to know if a package is good:

Look at the number of stars
Look at the amount of recent activity while taking into account the fact that some packages need less maintenance
Look at the number of serious open issues
Read docs and see if the documentation matches what you need
Look at the core contributors

This is basically the same process as in any other language. You struggle a few weeks in base R and go to tidyverse because people say that will solve all your problems (and then you still struggle because it’s R; oh wait I’m digressing). It’s healthy to have some options and competition IMO.
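The checklist above could be roughed out as a small triage script. This is only a sketch: the `PackageStats` fields, the example package, and every threshold are made-up assumptions for illustration, not data from any real repository or an agreed standard. Reading the docs yourself is still the step no script replaces.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageStats:
    """Hypothetical snapshot of a repository; all fields are illustrative."""
    name: str
    stars: int
    last_commit: date
    serious_open_issues: int
    has_docs: bool
    core_contributors: int


def checklist(pkg: PackageStats, today: date) -> dict[str, bool]:
    """Return one pass/fail flag per checklist item (thresholds are arbitrary)."""
    days_idle = (today - pkg.last_commit).days
    return {
        "stars": pkg.stars >= 50,
        # Some packages need little maintenance, so keep this window generous.
        "recent_activity": days_idle <= 365,
        "few_serious_issues": pkg.serious_open_issues <= 5,
        "has_docs": pkg.has_docs,
        "multiple_core_contributors": pkg.core_contributors >= 2,
    }


# Invented example values, not taken from a real package.
example = PackageStats("ExamplePkg", 120, date(2022, 3, 1), 2, True, 1)
flags = checklist(example, today=date(2022, 6, 3))
passed = sum(flags.values())

print(f"{example.name}: {passed}/{len(flags)} checks passed")
for item, ok in flags.items():
    print(f"  {'ok   ' if ok else 'CHECK'} {item}")
```

A failed flag is a prompt to look closer, not a verdict: a single-maintainer package can still be excellent.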

johnmyleswhite · June 3, 2022, 9:59pm

I would find it quite helpful if this thread focused on specific changes that could be made to Julia Packages or other parts of that website. Is the desire to have a certification process? To have more categories?

jar1 · June 3, 2022, 10:02pm

JuliaHub has a more informative search results page

compared to

It could also include all of these things:

I also think issue-response latency and PR-merge latency are important metrics, which could be displayed.

Testing and documentation metrics could be displayed prominently.
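As a rough idea of what such a latency metric might look like, here is a minimal sketch that takes (opened, merged) timestamp pairs and reports the median. The timestamps are hypothetical, and fetching them from a repository host is left out on purpose; the same function would work for issue-response latency given (opened, first reply) pairs.

```python
from datetime import datetime
from statistics import median


def median_days(pairs):
    """Median elapsed days between (start, end) ISO-8601 timestamp pairs."""
    deltas = [
        (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
        / 86400
        for start, end in pairs
    ]
    return median(deltas)


# Hypothetical (opened, merged) timestamps for three pull requests.
merged_prs = [
    ("2022-05-01T12:00:00", "2022-05-02T12:00:00"),  # 1 day
    ("2022-05-10T08:00:00", "2022-05-13T08:00:00"),  # 3 days
    ("2022-05-20T00:00:00", "2022-05-30T00:00:00"),  # 10 days
]

pr_merge_latency = median_days(merged_prs)
print(f"median PR time-to-merge: {pr_merge_latency:.1f} days")
```

The median is used rather than the mean so that one long-stalled pull request does not dominate the figure.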

StefanKarpinski · June 3, 2022, 10:34pm

Please stop acting as though other people are dismissing this or don’t care about newcomers having a good experience. People here are engaging and discussing in good faith; that attitude is wholly unwarranted. I do happen to disagree that for core data science tasks it’s hard to find what Julia packages to use or how to use them. I’ve demonstrated that by doing what a student would do when trying to accomplish a few tasks: googling CSV reading and plotting and trying the code that I find in the top results. I was actually pleasantly surprised by how well that went. The top results were good (better than for Python!) and the code all worked. I did not cherry pick those tasks, they are the exact tasks mentioned here repeatedly.

Apparently interpolation may be an issue because the most prominent Interpolations package doesn’t do all the things one might need? I don’t know about that partly because interpolation is something I’ve hardly ever had to use. The Interpolations package does, however, link prominently to several alternative interpolation packages. Maybe something else can be improved there.

Painting anyone disagreeing with you as not caring about the newcomer experience is pretty disingenuous and no way to have a productive discussion. No one is taking that position. You’re claiming things are bad and hard to use. I’m disagreeing and presenting evidence to back up my position. But everyone thinks that making things better is a good idea, even if they’re already pretty good.

So what can be done to make the newcomer experience better? Since everyone agrees that it’s a good idea to do so. Several things have been proposed here. One is that Julia Computing should open source its JuliaHub platform. That’s not going to happen for reasons I feel like I shouldn’t have to justify. It also wouldn’t be good for the language since that commercial platform is how a huge amount of open source Julia work is funded.

It has also been proposed that features be added to JuliaHub search, specifically linking to related packages. That’s certainly possible, although I have to confess that it’s a rather low product priority. It would be easier if someone wants to build an open source system that computes package similarity. Integrating the output of that into the search results would certainly be doable. I suggest that you build that and then we can see how good the results are. If they’re good, they can be added.
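One possible starting point for such a similarity tool, sketched under heavy assumptions: represent each package by a set of keywords (or dependencies) and rank the others by Jaccard similarity. The package names and keyword sets below are invented for illustration, and nothing here reflects how JuliaHub search actually works.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(name: str, catalog: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank every other package in the catalog by similarity to `name`."""
    target = catalog[name]
    scores = [
        (other, jaccard(target, keywords))
        for other, keywords in catalog.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical packages with made-up keyword sets.
packages = {
    "InterpA": {"interpolation", "splines", "grids"},
    "InterpB": {"interpolation", "splines", "scattered-data"},
    "CsvReader": {"csv", "io", "tables"},
}

ranking = most_similar("InterpA", packages)
for other, score in ranking:
    print(f"{other}: {score:.2f}")
```

Whether keyword or dependency overlap gives useful "related package" links is exactly the open question; the results would still need the kind of review discussed earlier in the thread.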

StefanKarpinski · June 3, 2022, 10:36pm

If someone wants to develop open source tools to compute these metrics we’d be happy to include them in JuliaHub search results.

jar1 · June 3, 2022, 11:00pm

I wasn’t asking about open sourcing the JuliaHub platform, just the package search and discovery website.

That said, as was pointed out, Julia Packages already exists, and can be improved by community effort, though it seems a bit of wasted effort to have two parallel websites on the same task.
