It’s a bit of a pity that, as far as I know, as of now, none of those models is in idiomatic Julia …
You make a very valid point, but the registry could just as well include information about moving.
E.g., if the user tries to instantiate a package with a given UUID, the package manager could inform the user that this package has been relocated to the registry GeneralArchives, which should be added (this should be offered automatically). I see this as just an interface issue that can be handled.
We don’t need to sacrifice reproducibility at all, it’s not like we are trying to erase a package from history. It is just moved to another registry, a hallowed burial ground of packages that will remain available forever for code archeology.
Would it be possible to introduce some automated checks for newly registered packages, verifying that:
- Package has a test suite.
- Package has some documentation. Maybe relate the extent of required documentation to the code size. At least a description of what the package does should be provided in all cases.
- Each exported name has docstrings.
I don’t think it is asking too much from the package authors, but if some of it is difficult to implement (e.g. tests for interactive packages), then it can be resolved by a human.
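As a rough illustration of what such checks could look like, here is a minimal sketch. The function names, the README length threshold, and the use of README.md as a stand-in for "some documentation" are all my assumptions, not anything the registry does today. The docstring check leans on `Base.Docs.meta` and `Base.Docs.Binding`, which are internals and may change between Julia versions.

```julia
# Hypothetical registration checks; file layout conventions are assumed.

"Return a list of human-readable problems found in the package directory `pkgdir`."
function registration_problems(pkgdir::AbstractString; min_readme_chars::Int = 80)
    problems = String[]

    # Check 1: a test suite exists at the conventional location.
    isfile(joinpath(pkgdir, "test", "runtests.jl")) ||
        push!(problems, "missing test/runtests.jl")

    # Check 2: some documentation exists (README.md used as a stand-in).
    readme = joinpath(pkgdir, "README.md")
    if !isfile(readme)
        push!(problems, "missing README.md")
    elseif length(strip(read(readme, String))) < min_readme_chars
        push!(problems, "README.md is too short to describe the package")
    end

    return problems
end

"Return the exported names of `mod` that have no docstring attached."
function undocumented_exports(mod::Module)
    docs = Base.Docs.meta(mod)  # internal: docstring table of `mod`
    return [name for name in names(mod)
            if name != nameof(mod) && !haskey(docs, Base.Docs.Binding(mod, name))]
end

# Tiny demo module: one documented export, one undocumented export.
module Demo
export documented, undocumented
"Add one to `x`."
documented(x) = x + 1
undocumented(x) = x - 1
end

undocumented_exports(Demo)  # lists the exports that still need docstrings
```

Anything these checks flag (e.g. an interactive package that is hard to test) could still be waved through by a human reviewer.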
Sure, the transition has to look like this: General is renamed to “uncurated”, and current levels of arxiv / AUR-style review are initially maintained.
A new subset with stronger requirements / governance is created. Possibly the same mechanism is used for stdlib, with even stronger requirements.
Then, in due time, we elevate lots of widely used existing packages to “curated”. Importantly, “curated” packages should not be able to depend on uncurated packages.
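The "curated must not depend on uncurated" rule is mechanically checkable. A minimal sketch on toy data, where the package names, the `deps` dictionary layout, and the function names are all hypothetical:

```julia
# Toy dependency graph; package names are made up for illustration.
deps = Dict(
    "A" => ["B", "C"],
    "B" => ["C"],
    "C" => String[],
    "D" => ["A", "E"],
    "E" => String[],
)

"Collect all transitive dependencies of `pkg` via depth-first traversal."
function transitive_deps(deps::Dict{String,Vector{String}}, pkg::String)
    seen = Set{String}()
    stack = copy(get(deps, pkg, String[]))
    while !isempty(stack)
        dep = pop!(stack)
        dep in seen && continue          # already visited
        push!(seen, dep)
        append!(stack, get(deps, dep, String[]))
    end
    return seen
end

"Return the uncurated (transitive) dependencies that block `pkg` from being curated."
function curation_blockers(deps, curated::Set{String}, pkg::String)
    return sort!([d for d in transitive_deps(deps, pkg) if !(d in curated)])
end

curated = Set(["A", "B", "C"])
curation_blockers(deps, curated, "D")  # "D" cannot be elevated until its blockers are
```

An empty result means the package's whole dependency closure is already curated, so it is eligible for elevation.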
Ideally, we soon reach a tipping point: Most packages most people use in their day-to-day lives (including transitive deps!) are curated.
Most people sometimes need to use a few uncurated packages, just like ordinary linux or mac users sometimes need to use AUR-equivalents or manually build software from source because their distro doesn’t ship it in a centralized way (or download some RPM / binary from a commercial vendor).
This is not an indictment of the uncurated package. For example, it is advisable on archlinux to skip the distro-provided builds of julia, due to different philosophies on use of almost-compatible system libraries vs vendoring with julia-specific patches.
Similar to arxiv / journals: it is totally fine to trust and cite arxiv results if you read and checked the paper, or know the authors, or if the paper/author/institution has name recognition; but an arxiv-only world without any journals or formalized institution of peer review simply has worse scaling behavior.
And similar to arxiv / journals, one of the big differences between “curated” and “uncurated” would be where the buck stops: In a curated registry, the curator ultimately owns all package names and can fork from upstream as deemed necessary.
This does not fix the other issue that a split registry addresses:
There is value in having convenient access to somewhat crappy packages.
There is also value in having a curated selection of packages that you know are not crappy without having to review the code or having to know the author’s reputation.
These two goals are in conflict. Instead of some uneasy compromise that achieves neither, we can just have both.
This would be a very good requirement.
When reviewing a human-written thing, one at least knows that one makes the world better by giving good advice that the author can learn from, and there is an implicit social contract / economic equilibrium (you spent a lot of effort writing that, so I can spend a little effort reviewing it).
On the other hand, giving a human AI-slop to review without disclosure is an extremely hostile act. It breaks a social contract, and it breaks the old economic / game-theoretic equilibrium. It attacks the human dignity of the reviewer.
It is like spam: The correct answer to spam is not to faithfully engage, but rather to scorched-earth block the spammer and exclude them from the community. (unless you have too much free time, then 419eater-style responding in bad faith is even better at shifting the economic / game-theoretic equilibrium back)
That requires new code. How will a Julia 1.0.5 user get this UI?
Note this “user” may be a docker container or archive or reproduction script (with manifest) in a paper or …
How would that address legal issues?
It occurs to me that there is one major feature missing from the Julia package manager that others (conda, pip, etc.) have, and that is a built-in search. How do people discover packages currently? Well, I go to the juliapackages.com website and search. There are even sorting options. What if users could do that with Pkg?
Something like
pkg> search tensor # search all package names/readme files for the word "tensor"
# it could even add a sort feature like the website!
pkg> search tensor --sort github_stars
If a hypothetical pkg search feature defaulted to sorting by GitHub stars (or number of downloads), then users would (hopefully) be steered towards higher-quality packages, and the need for a curated registry may be mitigated somewhat. Also, vibe-coded packages would automatically appear at the bottom until enough users gave them stars/downloads, which (hopefully) would show they are decent.
Also, this functionality already exists on the juliapackages.com website! So maybe it wouldn’t be terribly difficult to get it working in Pkg (he says knowing nothing about Pkg internals)? The best part of this as a potential solution is that it is purely a code solution and does not require an individual judgement call on a particular package.
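To make the idea concrete, here is a minimal sketch of the search-and-sort logic itself, run against a toy in-memory index. Nothing here is an existing Pkg API: the `PackageInfo` record, `search_packages`, and the package names are all hypothetical, and a real implementation would need a metadata source for stars or download counts.

```julia
# Hypothetical package metadata record (not a Pkg type).
struct PackageInfo
    name::String
    description::String
    stars::Int
end

"Case-insensitive search over names and descriptions, most-starred first."
function search_packages(index::Vector{PackageInfo}, term::AbstractString)
    needle = lowercase(term)
    hits = filter(index) do p
        occursin(needle, lowercase(p.name)) ||
            occursin(needle, lowercase(p.description))
    end
    return sort(hits; by = p -> p.stars, rev = true)
end

# Toy index; names and star counts are invented for illustration.
index = [
    PackageInfo("TensorToys", "Small tensor utilities", 12),
    PackageInfo("BigTensors", "Tensor networks", 340),
    PackageInfo("Plotter", "Plotting library", 900),
]

for p in search_packages(index, "tensor")
    println(rpad(p.name, 12), p.stars, " stars")
end
```

The ranking key is the only policy decision here; swapping `p.stars` for a download count would be a one-line change.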
Sorry, I don’t understand. Can you elaborate?
I don’t think we should change General substantially… maybe add a little bit more quality control (see below), but otherwise, I think it should stay the same. I don’t think a complete “free-for-all” is very in line with the much more cohesive community that Julia has compared to other languages.
It might make sense to add a free-for-all “testing registry” (like PyPI has as well). It would also make sense to add additional “highly curated” registries. Of course, both of these things have overhead in maintenance, tooling, and setting everything up. So probably something that’s a bit farther down the road.
I’d very much support that.
Package has some documentation
Very difficult to check automatically, but actually very easy to check manually. We’re doing pretty well with that already.
@mbauman I feel like somewhere this thread should be split along the line of “do we want more registries than just General”. We’re diverging a bit from the original topic, even though this is an interesting and important discussion. But it’s sprawling.
Since people can already add a package from a git url, is there actually a need for an additional “anything goes” type registry? Vibe coded/low quality projects would probably still be submitted to the normal Julia registry even if there was a second free-for-all registry, just like they are now.
Yeah, I looked but it’s really hard to detangle in a way that retrospectively leads to two sensible conversations. It’s somewhat natural that a discussion proposing a specific curation rule would end up discussing the pros/cons of more curation more generally and potential alternatives.
For me, personally, the most valuable “curation” to have wouldn’t be about code quality or docs or utility or provenance at all. It’s about the fact that it’s open source with an invitation for collaboration. And even more importantly, with someone continuing to maintain the package post-registration. All of the other things are signals in favor of a well-stewarded package, but it’s the stewarding itself that’s most valuable — and most costly!
In other words, the trouble I have with a vibe-coded package is that it’s (to me) a fairly strong signal that the stewarding is already weak to the point of being problematic. That’s not necessarily true, but it’s a signal.
Regarding using a second registry as a curation signal: I think we could do a lot better instead with a tags system, where packages can be flagged with various tags at registration time, with implementation to surface them in Pkg.
Why? Let’s think about if we wanted to express more than 1 thing, not just curated vs not, but also say permissively licensed vs copyleft. You could imagine wanting curated permissively licensed packages. But registries only really make sense as unions, not intersections, so to express that we would need 4 registries, one for each leaf combination. Totally unscalable.
Instead, I think this metadata would make more sense as a list of string tags attached to each package. Then Pkg could be taught to handle queries with tags (which packages are available matching such and such tags), and the Preferences system could be used to say I always want all my packages to be resolved from the curated tag, for example.
Also, cargo has a tags system, which is generally a good sign for it being a well-implementable feature, but I haven’t looked in detail how it works or what it affords. Potentially it’s for a different use-case than I have in mind here.
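A minimal sketch of the difference, assuming a hypothetical per-package set of string tags (the tag names, package names, and `packages_with_tags` are made up, and none of this is an existing Pkg feature). With tags, "curated and permissively licensed" is a plain intersection query, whereas separate registries need one registry per combination of properties:

```julia
# Hypothetical tag metadata attached to each package.
tags = Dict(
    "PkgA" => Set(["curated", "permissive"]),
    "PkgB" => Set(["curated", "copyleft"]),
    "PkgC" => Set(["permissive"]),
)

"Return the packages carrying every tag in `required` (an intersection query)."
function packages_with_tags(tags::Dict{String,Set{String}}, required)
    return sort!([name for (name, t) in tags if issubset(required, t)])
end

"Registries needed to cover every combination of `n` independent yes/no properties."
registries_needed(n::Integer) = 2^n

packages_with_tags(tags, ["curated", "permissive"])  # packages matching both tags
registries_needed(2)                                 # the two-property case from above
```

Adding a third yes/no property doubles the registry count again, while the tag query only gains one more string.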
That sounds like a great solution!
This could actually also be a good solution for this issue:
Sounds good, and I guess those tags would be version-specific too. A package could change in nature.
If I might chime in, not to bring up old posts, but I think I’ve effectively brought up a similar concern (regarding registries) in the past. Namely, that it’s not very clear to me what packages are “good” or provide “sufficient coverage”. For example in this comment on a mega-post I lamented not really knowing how to find packages that provide near-equivalent functionality of base Matlab, and suggested that it might be nice if one could simply “add” an Organization (such as JuliaMath). We continued the discussion in a separate thread.
I suppose my thought is that, perhaps, we could certify well-managed organizations and have the curated registry essentially be a union of these organizations where each organization maintains its own “gold” and “general” registries, and the curated Julia registry is the union of each org’s gold registry. The process to getting a package into Julia’s curated registry, then, would be to have a certified organization accept your package into their gold registry. In the case that there isn’t a suitable organization or a reputable group (e.g., NVidia, university research lab) wishes to maintain their own organization they could apply for certification, etc.
I don’t understand the law so I can’t say much more than the paragraph in my first comment you quoted. Maybe I could imagine an example? Say a company threatens a lawsuit because it discovers that a General-registered package plagiarized its proprietary code (stolen, vibe-coded, however). Complete removal and sacrificing reproducibility seem to be the only acceptable move; I don’t see how moving things to an archival registry would help.
Thanks, I get it now, I was missing the context. Of course reproducibility is not guaranteed when legal reasons make it impossible.
To me it seems it might be beneficial to reject “low quality” or “low effort” packages. That would make it easier to justify, because it focuses on the result and not on how the code was created (because maybe we can’t know that).
I think writing a package without reviewing the code will probably result in a low-quality final product: the code might look nice, but maybe it conceptually doesn’t make sense, or similar. If an LLM writes a high-quality package, however, I would certainly not be against using it.
The other idea might be to focus on effort, i.e. the effort that a human spent to develop the package. If the AI code is unreviewed and just works out of the box, it might be so easily recreated that a package for it is not needed. (I would not want an “is-even” package for Julia.)
So I would rather focus on effort or quality, in a manner that can be easily reviewed.
One case where fully LLM-generated packages might be viable would be translating a solver from Fortran or another language in order to make a Julia implementation available.