This was discussed previously on another thread, but I can’t locate where the discussion took place.
The current package manager was getting very slow due to the sheer number of registered packages. So if you have a bunch of related features, it makes sense to put them in a single package rather than inflate the package count unnecessarily.
If you have a set of coherent functionality whose pieces depend on each other to provide an API, make a single package out of it.
You probably don’t need to split it into multiple packages unless you have a particularly large body of functionality. If you really do have that much related functionality, as DifferentialEquations.jl does, then splitting it into separate packages makes sense: many of those components can live independently, each offering a full set of related functionality that belongs together while remaining independent of the entire DifferentialEquations ecosystem.
But if you have, say, 10 related functions that help accomplish some task in a specific area, it makes sense to keep them together; 10 functions is not a critical mass of functionality. However, if your 10 functions are completely unrelated to each other, then it doesn’t make sense to package them together.
If your functionality consists of a single function with, say, 10 lines of code, it might not make sense to register it as a package at all, since it is not really an API. In that case it might make more sense to write a blog post or a discussion post about it, or publish a Jupyter notebook.
There probably isn’t a maximum size for a package, but a good guideline may be the Unix philosophy: write programs that do one thing and do it well.
However, I would prefer not seeing lots of tiny packages with single functions in them.
The whole NPM disaster is an example of not properly following the Unix philosophy.
Consider this quote from Einstein:
Make everything as simple as possible, but not simpler.
Not everything is meant to be a package; some snippets of code are better treated as examples in blog posts, discussion threads, Jupyter notebooks, or gists.
So it’s a fine balance. Use good judgment and wisdom.
As I said, this has been discussed before, but it’s buried in some other thread somewhere.
Making everything as simple as possible, but not simpler: for example, if you have a bunch of related functions that work together, it is simpler to make them a single package. If you have a large body of related functionality that is very complex and parts of it can live independently, it is simpler to split it up.
I’ll add some perspective as an active maintainer of the HTTP.jl package. In Julia’s early days (circa 2012), there was a Hacker School project to put basic web functionality together in the form of the HttpServer, HttpCommon, Requests, and URIParser packages. Due to the transient, short-lived nature of Hacker School, these packages came out w/ an initial “bang” of functionality and usefulness, and then were hardly touched for years. Functionality was slowly duplicated across them as one-off contributors tried to fix particular issues, and duplicate issues were filed across the repos because users had a hard time knowing which package was the actual cause of their problem.
HTTP.jl was born w/ the goal of modernizing the foundational web-stack code in Julia and providing a cleaner/easier path forward in terms of maintenance. It began, literally, by merging the git histories/repos of the packages mentioned above, after which consolidation and enhancements followed. In this case, merging the packages has led to cleaner code organization overall, a great reduction in duplicated “utils” functions, and an easier “one-stop shop” for users when they need web functionality or have web-related issues. It’s also much easier to maintain, since there is a single package’s tests to run with each enhancement or improvement, and a single package to tag and release.
Now, there are obviously pieces of HTTP.jl that would be safe/nice to split off into dedicated packages. The HTTP.URIs module, for example, has fairly mature code and a straightforward interface, and you would expect very little in terms of needed enhancements or issues. The same goes for the HTTP.Nitrogen module, which provides all the server functionality; it’s not as tightly coupled w/ the rest of the package, and there are plenty of user use-cases that involve making requests without needing server functionality.
Anyway, for the moment, this has been a great solution that has kept basic web functionality active and maintained for Julia, even if it goes against traditional “unix” philosophy.
It is impossible to draw a clear line, but I think “abstraction level” is a key factor in determining whether a piece of code should be a package or not. Abstraction is arguably the most important concept in any programming language: it makes concrete procedures abstract and frees users from the details.
If your package abstracts some procedure at a high level, I think it is worth packaging and registering it as a public package. For instance, consider an imaginary package, Sorting.jl, which offers a sort function to sort the elements of an array (of course, we know there is a sort function in Base, but here we assume there was no such function). This is a high-level abstraction: there are many sorting algorithms, and various ways to implement each of them. But once we abstract it as the Sorting.jl package, we don’t need to care about its internals, and our productivity benefits. At the opposite extreme, if we create SumOfSecondAndThirdElementsOfAnArray.jl, the implementation would be straightforward and there is no abstraction at all.
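To make the point concrete, here is a minimal sketch of that imaginary Sorting.jl. The internal algorithm (insertion sort here, picked arbitrarily) is exactly the detail the abstraction hides from the caller:

```julia
# Sketch of the imaginary Sorting.jl described above.
module Sorting

# Return a sorted copy of the input vector. The caller never needs to know
# which algorithm runs underneath (insertion sort, in this sketch).
function sort(xs::AbstractVector)
    ys = copy(xs)
    for i in 2:length(ys)
        j = i
        while j > 1 && ys[j] < ys[j-1]
            ys[j], ys[j-1] = ys[j-1], ys[j]
            j -= 1
        end
    end
    return ys
end

end # module

Sorting.sort([3, 1, 2])  # → [1, 2, 3]
```

Swapping insertion sort for quicksort would change nothing for users of the package; that invisibility of internals is the productivity win the abstraction buys.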
In practice, my lower bound for packaging something is that

- I use it in multiple places, and
- it benefits from unit testing and CI.
So besides code reuse, a major benefit of packages for me is that I can set up CI for them. The upper bound (when to break code up into smaller packages) is even fuzzier; the split should make sense conceptually and provide clean APIs.
Julia packages are really lightweight. It takes a few minutes to create one (with CI and code-coverage tools set up automatically), which makes the fixed cost trivial. So a lot of small packages are to be expected. I think this is good, even for one-liners that one could replace with equivalent code.
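As a sketch of how low that fixed cost is: `Pkg.generate` lays down a package skeleton in one call. (The package name below is made up; note that CI and coverage templates come from tooling such as PkgTemplates.jl, not from `Pkg.generate` itself.)

```julia
using Pkg

# Create a throwaway directory and generate a package skeleton in it.
# "TinyUtils" is a hypothetical name for illustration.
pkgroot = mktempdir()
Pkg.generate(joinpath(pkgroot, "TinyUtils"))

# Pkg.generate creates a Project.toml and a src/TinyUtils.jl stub.
readdir(joinpath(pkgroot, "TinyUtils"))  # → ["Project.toml", "src"]
```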
It is not that I am saving about 20 characters; it is that the intent is communicated more clearly. That is totally fine as a small package which does one thing and does it well. If it were buried in SomeCollectionofUtilities.jl, I might not bother importing all of that. I like modularity.
One of those often-cited quotes for which there seems to be no direct evidence. It may have been said verbally, though it is also speculated to be a paraphrase of:
It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.
I really think this recent trend of registering one-function packages is harmful for the Julia package ecosystem in the long run. Right now, it seems not too uncommon for a registered package (one that actually provides valuable functionality) to also provide 2–3 tiny spin-off packages.
These spin-off packages often contain only a single utility function (typically a Base function specialized on some arguments) and have almost no use outside the original package they were created in.
This will make it harder to find packages that actually do valuable things; it will bloat dependency lists and make it less clear what a package depends on; it will make diff lists larger when upgrading packages; it will create more overhead for reviewing, tagging, and CI; and it will make things harder for new contributors, who have to get an overview of the whole dependency chain and how everything fits together.
Keep your utility functions inside your packages. Only split something out into its own package if it provides significant value on its own and will be developed independently. Don’t split something out because you think it will become a large independent thing in the future; wait until that has actually happened through development inside the main package. That is my opinion.
I think this is the right approach, but do you think the problem you describe is actually happening in practice? With the registered packages I use, I have not seen this trend; if anything, the opposite. For example, almost embarrassingly, sometimes I use Lazy.jl just for @forward (and I am fine with that; no need for a separate package).
From reviewing METADATA? Yes, I see quite a few packages on the borderline of being too small, so I leave it to someone else to decide what to do with them. I think this happens a lot. It’s just that these are the packages people don’t tend to use…
IMO the problem is that Pluck isn’t really a package; it’s a single one-line utility function that you define locally if you need it. If you need more serious sampling, you have to resort to more comprehensive packages like StatsBase anyway. Also, loading code shouldn’t be a problem: we all happily load Base every time we start Julia, and most users don’t use everything in there either.
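For context, the kind of one-liner under discussion might look like this (`pluck` here is a hypothetical local definition; for an array, Base’s `rand(xs)` already does the same thing):

```julia
# Hypothetical one-line utility: return a random element of a collection.
# For an AbstractArray, `rand(xs)` in Base does exactly this already.
pluck(xs) = xs[rand(1:length(xs))]

pluck(['a', 'b', 'c'])  # returns one of 'a', 'b', or 'c'
```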
The trade-off here seems to be between computer time (loading that package) and programmer time (a single package vs. breaking it up into smaller packages while keeping them in sync and aiming for a well-designed API). Given that
julia> @time using StatsBase
0.090097 seconds (36.51 k allocations: 2.486 MiB, 63.90% gc time)
wasting programmer time instead of computer time on this may not be justified.
A single random item is of course not challenging. Multiple random items, with or without replacement, possibly with weights, are trickier.
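A sketch of those trickier cases, with the caveat that `StatsBase.sample` is what you would actually use in practice (the function names below are made up for illustration):

```julia
using Random

# n items without replacement: shuffle a copy and take the first n.
sample_norep(xs, n) = shuffle(collect(xs))[1:n]

# One weighted draw: inverse-CDF lookup on the cumulative weights.
function sample_weighted(xs, w)
    c = cumsum(w)
    r = rand() * c[end]
    return xs[searchsortedfirst(c, r)]
end

sample_norep(1:5, 3)                     # e.g. [4, 1, 5]
sample_weighted(["a", "b"], [0.9, 0.1])  # "a" roughly 90% of the time
```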