Proposal: ability to add entire organizations (ecosystems) with a single command

As I mentioned in one of my comments over in the “What steps” megathread, it would be nice if there were simple means of adding all “production-level” packages from an organization such as JuliaMath. The reasoning for this is that there are many basic functionalities, such as “special functions” like \Gamma(x) and \mathrm{erf}(x), polynomial evaluations, iterative solvers, fast Fourier transforms, interpolation, root-finding, quadrature, etc., that new users of Julia might expect to be in a standard library (as they are in Matlab) or within a single monolith module like NumPy or SciPy. For these users it might be preferable to tell them to install the JuliaMath organization so that they don’t have to Pkg.add( ["a", "very", "long", "list", "of", "packages", "did", "I", "get", "them", "all"] ).

Then Julia’s documentation could include language to the tune of:

If you’re looking for a Matlab or NumPy like experience and just want a bunch of high-quality, vetted mathematical functionalities, run: Pkg.add_organization( "JuliaMath" ). To learn more about the different packages included with JuliaMath, see the organization’s GitHub page here.

What are people’s thoughts regarding something like this? Is the method name ok? I assume (based on previous replies) that we’d need organizations to do some legwork to support this if we enabled it (e.g., write a wrapper?).

3 Likes

I think this can be very nice, and it’s pretty easy to implement: they’d just have to create a JuliaMath meta-package.

Incidentally, that’s pretty much what I do with QuantumControl.jl. Or at least originally, that package was intended purely as a meta-package around all the packages in the JuliaQuantumControl org. Since then, I decided to also put high-level utility functions directly into the package. But the point remains: it’s pretty easy to make a package that simply pulls in an entire organization.

5 Likes

The core idea of providing a seamless experience is very nice, but being able to download the code is only half the battle: users still have to search around on the Internet to find exactly which packages within that organization to load with using, in order to access the functionality they need. Having to blindly search over all packages (as we currently do for packages from most orgs), and then having to check which results fall within the organization you’ve downloaded, would make this a not-so-seamless experience again.

Solutions to this can come in the form of (1) every organization (or at least the major ones we expect to be most useful) having its own MultiDocumenter-based website that would allow search within that org’s docs, or (2) JuliaHub’s documentation search allowing filtering by organization. In either case, we could give the user this command Pkg.add_organization( "JuliaMath" ), point them to a URL, and give clear instructions on how to use this as if it’s a pre-assembled product like Matlab or NumPy.

Another concern is that this would require that all the packages within an org move in lockstep, or at least something close to it - if any of the packages lag behind in terms of dependency versions, that would hold back every other package when a user installs the whole organization like this.

In one sense, this is a good thing - this would be an incentive to make sure that all packages in an org are given attention, and updated over time (No Package Left Behind).

On the other hand, this would also increase the maintenance burden on the organization members. As I understand it, currently the main point of organizations is to make sure that someone on the Julia side has admin access to a repo even if the original creator is no longer active. Right now, maintaining a package in an org mostly means accepting the occasional PR or fixing major bugs if they come up. If every additional package instead has to actively keep up with breaking changes in other packages (so it can use their latest versions) and change its code accordingly, then that becomes a disincentive to add new packages to the organization and potentially increases the maintenance burden by a lot.

It would of course be nice if we could actively maintain every package in every org like this, but Julia already has a big scarcity of developer-hours, so adding to that may not turn out great. A Matlab/NumPy like experience should include being able to use reasonably up-to-date versions of the code, and we wouldn’t really be able to recommend add_organization as a convenience feature if it commonly installed old and outdated packages.

Why not have an uber-package that depends on all the packages of the org and re-exports all the needed symbols?

2 Likes

This will break on any breaking release of any dependency, if one wants the latest versions. The maintenance burden seems considerable.

Although, perhaps one may simply export the modules, instead of the symbols, which should considerably reduce the breakages.
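To make the distinction concrete, here is a minimal sketch of the two styles. `Lib`, `foo`, `UberSymbols` and `UberModules` are hypothetical stand-ins defined inline so the snippet runs on its own; a real uber-package would load registered packages instead.

```julia
# Stand-in for a real library; `Lib` and `foo` are hypothetical names.
module Lib
export foo
foo(x) = x + 1
end

# Variant 1: re-export individual symbols. The uber-package names `foo`
# explicitly, so its interface is tied to that particular symbol.
module UberSymbols
using ..Lib: foo
export foo
end

# Variant 2: re-export only the module. The uber-package never mentions
# `foo`; callers reach it through the module name.
module UberModules
import ..Lib
export Lib
end

UberSymbols.foo(1)       # symbol-level access
UberModules.Lib.foo(1)   # module-level access
```

In the second variant the uber-package's own export list does not change when `Lib` changes its interface, although code that calls `Lib.foo` is still exposed to such a change.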

3 Likes

Why? The Uber-package can just release any compat upper bounds.

Yes, exactly. Then it does not need to place any upper bounds on compat.

So imagine the situation where you develop the App package, which depends on Uber (with compat bound 1), which in turn depends on – and reexports – Lib (with no compat upper bound).

At one point in time, you resolve your environment and get Uber v1.0.0 as a direct dependency, which pulls Lib v1.0.0 as an indirect dependency.
Taking advantage of the fact that Uber reexports Lib, your src/App.jl might look like:

module App
using Uber # get Lib "for free"

do_something() = Lib.foo(1)
end # module App

A few weeks later, Lib removes foo from its interface, and signals this breaking change by releasing the new version as v2.0.0. But under your rule, this isn’t considered to be breaking for Uber: since Uber itself has no actual code, there is nothing that could actually be broken.

The problem is: now you update your App environment, and nothing prevents you from getting Uber v1.0.0 (as a direct dependency) and Lib v2.0.0 (pulled as an indirect dependency via Uber). Boom: your code is broken, since Uber.Lib.foo does not exist any more.

So you can’t really safely use indirect dependencies. And if any package exports any of its own dependencies as part of its own API, then its compat bounds have to be very strict, so that any breaking change in any exported dependency can be considered breaking in the reexporting package.
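As a rough illustration of what a strict bound buys you, the sketch below checks which Lib versions a compat entry would admit. It assumes the internal helper `Pkg.Types.semver_spec` (not public API, so it may change between Julia versions), and the compat strings are hypothetical entries that Uber might declare for Lib.

```julia
using Pkg

# NOTE: `Pkg.Types.semver_spec` is an internal Pkg helper, used here only
# to illustrate how compat entries are interpreted.
uber_v1_compat = Pkg.Types.semver_spec("1")  # hypothetical `Lib = "1"` in Uber v1
uber_v2_compat = Pkg.Types.semver_spec("2")  # hypothetical `Lib = "2"` in Uber v2

v"1.4.0" in uber_v1_compat   # true:  non-breaking Lib releases are admitted
v"2.0.0" in uber_v1_compat   # false: Uber v1 can never be paired with Lib v2
v"2.0.0" in uber_v2_compat   # true:  Lib v2 only arrives together with Uber v2
```

With such a bound in place, App's own compat bound on Uber (1) is enough to keep Lib v2.0.0 out of its environment.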

2 Likes

I have two thoughts:

  1. I was originally just intending this to be able to quickly add all production packages from an organization, but users would still do individual package imports. But this at least gets users a bunch of useful packages, that are precompiled so that when they eventually want to import QuadGK, they don’t have to add & precompile. And simplifies doing a Pkg.add( [<dozens of packages>]).

  2. I do kind of like the idea, though, of saying that Uber could be the equivalent of, say, a monolith package in Python (such as NumPy). In this case, I think the goal should be for these Uber packages to only include high-quality, production, long-term stable subpackages. I mean, are we really expecting that JuliaMath is going to abandon or rename SpecialFunctions? Or that SpecialFunctions would decide to remove or rename gamma()?

Note that in either case I would not include, for example, Interpolations.jl, Bessel.jl, ChangesOfVariables.jl in the Uber package since they aren’t yet “production” (have SemVer versions of 0.X)

2 Likes

@ffevotte Yes, that’s a good point. I would think that this Uber-package idea would be mostly useful only for REPL usage. I would never use this in a package. But in any case it’s probably a bad idea to register an Uber-package like this.

Alternatively, the Uber-package could place compat upper-bounds, capping breaking releases of its dependencies. And the Uber-package would have major new versions for any breaking release of any of its dependencies. Yes, that’s a lot of maintenance… but …

But if indeed the dependencies are very stable, then breaking releases of the Uber-package would be rare. And I would say that only in this situation it would make sense to do this anyway.

1 Like

I’d like to point out that umbrella packages which mostly reexport other packages do exist, e.g. Images.jl (JuliaImages/Images.jl on GitHub), an image library for Julia.

While these are great for interactive use they also have a drawback in that they tend to be used as dependencies in other packages, rather than directly depending on one or a few of the reexported packages. As a result you get unnecessary transitive dependencies, with consequences to precompile/compile/load times and exposure to breaking release transitions.

Agreed. Until last year, not all packages included in R’s Tidyverse uberpackage were at least 1.x and the R community didn’t have much of a problem with it.

It would be nice to borrow the syntax of pip install and be able to install those uberpackages with extra packages.

I think if you reexport something, you are still responsible for its breakage. So in your example, you would need to set a compat bound and release a breaking version if you update that to include a breaking release of something you export. It would not be so important if there is a delay, because users of the uberpackage are probably looking more for convenience and stability rather than having the most up-to-date version of every package at all times.

1 Like

Alternatively, one could provide a Manifest.toml and a Project.toml with the intended and pinned package configuration (tested for compatibility). Then provide a start script which downloads them into the current working directory, and use pkg"instantiate". If you want all packages to be loaded, simply load them in the start script.

One thing to take care of with this approach is not to overwrite existing *.toml files, but to update them (if it applies) so that other packages manually added by a user remain in them.

For the user everything is a one-liner then, namely, including the start script.
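A minimal sketch of such a start script might look like the following. The function names are hypothetical, and for simplicity it copies the two files from a local directory rather than downloading them, and it refuses to touch existing files instead of merging them.

```julia
using Pkg

# Hypothetical helper: copy a curated Project.toml / Manifest.toml pair from
# `src_dir` into `dest_dir`. Simplification: if either file already exists we
# stop instead of merging, so nothing the user added by hand gets lost.
function copy_env_files(src_dir::AbstractString, dest_dir::AbstractString)
    files = ("Project.toml", "Manifest.toml")
    for file in files
        dest = joinpath(dest_dir, file)
        isfile(dest) && error("$dest already exists; merge it by hand instead")
    end
    for file in files
        cp(joinpath(src_dir, file), joinpath(dest_dir, file))
    end
    return dest_dir
end

# Hypothetical entry point: set up the curated environment in `dest_dir`.
function start_curated_env(src_dir::AbstractString, dest_dir::AbstractString = pwd())
    copy_env_files(src_dir, dest_dir)
    Pkg.activate(dest_dir)
    Pkg.instantiate()   # installs the versions recorded in Manifest.toml
end
```

A downloading variant would fetch the two files first (for example with the Downloads standard library) and then call the same two steps.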

The CuratedSystemImages idea avoids some of the problems mentioned here, and in my opinion is a better way to go about solving the same fundamental issue.

1 Like

Here’s an ugly little monstrosity that you can put in your startup.jl file:

import Pkg, JSON3
add_org(org::AbstractString) = add_org(String(org))
function add_org(org::String)
    ctx = Pkg.Types.Context()
    Pkg.Registry.download_default_registries(ctx.io)
    Pkg.Operations.update_registries(ctx)
    with_uuid(name::String) = (name, Pkg.Types.registered_uuid(ctx.registries, name))
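    # Ask the GitHub API for the org's repositories (only the first page of results) and skip archived ones
    listing = JSON3.read(download("https://api.github.com/orgs/$org/repos?type=all"))
    repo_names = basename.([pkg.html_url for pkg in listing if !pkg.archived])
    # Keep repositories named like Julia packages ("Foo.jl") and strip the extension
    pkg_names = first.(splitext.(filter!(endswith(".jl"), repo_names)))
    # Keep only the names that resolve to a UUID in the registry
    repos = [x for x in with_uuid.(pkg_names) if x[2] isa Base.UUID]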
    Pkg.add(first.(repos))
    function create_meta_package(org::String, packages::Vector)
        code = "module $org\nusing Reexport\n@reexport using $(join(first.(packages), ", "))\nend\n"
        deps = join((first(x) * " = \"" * string(x[2]) * "\"" for x in packages), "\n")
        project = "\nname = \"$(org)\"\nuuid = \"$(Base.UUID(0x90e0578367142669d09f4608e371542e + hash(org)))\"\n\n[deps]\nReexport = \"189a3867-3050-52da-a836-e630ba90ab69\"\n$deps\n"
        path = joinpath(first(DEPOT_PATH), "dev", org)
        check(path, content) = isfile(path) && read(path, String) == content
        check(joinpath(path, "src", "$org.jl"), code) && check(joinpath(path, "Project.toml"), project) && return path
        isdir(path) && (path *= "_h7BmQ7oFrHCUN9UjUYVxxUeO2TBgJp")
        check(joinpath(path, "src", "$org.jl"), code) && check(joinpath(path, "Project.toml"), project) && return path
        println("Creating $org.jl metapackage in $path")
        mkpath(joinpath(path, "src"))
        open(f -> write(f, code), joinpath(path, "src", "$org.jl"), "w")
        open(f -> write(f, project), joinpath(path, "Project.toml"), "w")
        path
    end
    Pkg.develop(path=create_meta_package(org, repos))
end

Usage:

(@v1.10) pkg> activate --temp
  Activating new project at `/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_hD1R8R`

julia> add_org("JuliaMath")
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
    Updating `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_hD1R8R/Project.toml`
  [49dc2e85] + Calculus v0.5.1
  [861a8166] + Combinatorics v1.0.2
  [667455a9] + Cubature v1.5.1
[...]

(jl_hD1R8R) pkg> st
Status `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_hD1R8R/Project.toml`
  [49dc2e85] Calculus v0.5.1
  [861a8166] Combinatorics v1.0.2
  [667455a9] Cubature v1.5.1
  [55939f99] DecFP v1.3.2
  [abce61dc] Decimals v0.4.1
  [53c48c17] FixedPointNumbers v0.8.4
  [b21f74c0] FunctionZeros v0.2.0
  [92c85e6c] GSL v1.0.1
  [4a05ff16] Hadamard v1.5.0
  [c8ce9da6] IntelVectorMath v0.5.1
  [a98d9a8b] Interpolations v0.14.7
  [8197267c] IntervalSets v0.7.7
  [e24f45a5] InverseLaplace v0.3.2
  [90e05783] JuliaMath v0.0.0 `~/.julia/dev/JuliaMath`
  [984bce1d] LambertW v0.4.6
  [9c257583] MittagLeffler v0.2.0
  [efe261a4] NFFT v0.13.3
  [77ba4419] NaNMath v1.0.2
  [f27b6e38] Polynomials v4.0.2
  [27ebfcd6] Primes v0.5.4
⌃ [2576dda1] RandomMatrices v0.5.0
⌅ [f2b01f46] Roots v1.2.0
  [ed01d8cd] Sobol v1.5.0
⌅ [276daf66] SpecialFunctions v0.10.3
  [c544e3c2] Tau v2.0.0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

julia> gamma
ERROR: UndefVarError: `gamma` not defined

julia> using SpecialFunctions

julia> gamma
gamma (generic function with 9 methods)

julia> isprime
ERROR: UndefVarError: `isprime` not defined

julia> using JuliaMath

julia> isprime
isprime (generic function with 4 methods)
4 Likes

I can see the value of being able to automatically add an entire org if someone knows they will need to use a variety of packages. But I think there is a simpler (in theory) solution. If someone attempts to use a package that isn’t installed from the REPL (or Pluto notebooks), they are already prompted on whether or not to install it. So if I’m a new user following a SciML tutorial and type using Symbolics into the REPL, I will be asked y/n to install it if I haven’t already. But I would be asked for every uninstalled package, which may not be ideal. Also, if I use the VS Code SHIFT+ENTER keybinding to run the using Symbolics line, it errors instead of asking for the install.

What if there was a Pkg setting to just automatically install all packages a script is using? The setting could then be permanent, or just turned on for a session. This would not add a bunch of unused packages and would not require metapackages. And even if an entire org’s packages are installed, the user would still need to figure out which one contains the functionality they need, at which point they could just add that one package.
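As a sketch of how that could work (this is not an existing Pkg feature, and the helper names below are hypothetical), one could parse a script, collect the top-level package names from its using/import statements, and check which of them cannot be found in the current environment:

```julia
# Hypothetical helper: collect the root package names that `code` loads via
# `using` / `import`. Relative imports such as `using .Local` are skipped.
function used_packages(code::AbstractString)
    found = Set{Symbol}()
    collect_names!(found, Meta.parseall(code))
    return sort!(collect(found))
end

function collect_names!(found, ex)
    ex isa Expr || return found
    if ex.head in (:using, :import)
        for arg in ex.args
            # `using A: b, c` wraps the module path in a `:` expression
            path = arg.head === :(:) ? arg.args[1] : arg
            # `import A as B` wraps the module path in an `as` expression
            path.head === :as && (path = path.args[1])
            root = path.args[1]
            root === :. || push!(found, root)
        end
    else
        foreach(a -> collect_names!(found, a), ex.args)
    end
    return found
end

# Packages that the script wants but that cannot be located right now.
missing_packages(code) =
    [n for n in used_packages(code) if Base.find_package(string(n)) === nothing]

used_packages("using A, B.C\nimport D: e\nusing .Local")   # → [:A, :B, :D]
# An auto-install setting could then run something like:
#   Pkg.add(string.(missing_packages(read("script.jl", String))))
```

This only sees statically written using/import lines, so anything loaded dynamically (e.g. through `include` of other files) would need extra handling.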

2 Likes

This could be a good VSCode popup if not in Pkg. One thing I’ve loved about using R with RStudio in the past is that, when I open a new script, it gives me an option at the top to install all loaded libraries - very handy.

The context for this question was the mega-thread of how to increase popularity of Julia (i.e., attract and keep new users). In this context it’s more trying to give a similar user experience to, say, Python with NumPy and SciPy or Matlab with its base library + toolboxes. In these cases the users may not know which specific packages they want but they want to have a broad set of capabilities “out of the box”. I personally think that using using is a horrible habit to teach users, and for many of the new users we’re targeting Julia would be their first foray into programming – thus a reliance on VS Code, VS Code extensions, keybindings, etc., is throwing complexity upon complexity.

I think the goal should be for new / infrequent users of Julia to have an “easy-to-remember” idea in their head: “if I’m doing math in Julia I should add JuliaMath, bioinformatics → BioJulia, differential equations / ML → SciML, etc.” And then as they grow into Julia they can learn to “use project functionality and only add the bare minimum packages from SciML to my project on an as-needed basis”, effectively weaning them from “new user mode” into “advanced user mode”.

4 Likes

First, I’ll say, I’m not necessarily against the add Ecosystem idea, I’m more just thinking about it in terms of “what is the simplest way to get the same user experience”. I can see conceptually how being able to add an entire GitHub org’s packages at once is easy, but it doesn’t necessarily solve the problem. I have SciPy installed, but I can’t tell you for sure which module to use for 2D interpolations even though I used it within the last couple of months (something like from scipy.interpolations import interp2d or some such probably). So whether it’s one mega package (NumPy, SciPy) or adding an entire org, the user still has to do the same thing: figure out which module they need. If a user asks, “how do I create a Boltzmann distribution in Julia?” having an entire org’s packages doesn’t answer the question even if some of the packages are installed.

On the other hand, brand new users are likely to be following tutorials for a specific thing. If the REPL prompts for installation (the current status quo) or auto installs packages, then that seems to be the easier solution since Pkg and using are already most of the way there. It would also have fewer ramifications for package compatibility, at least as I understand the compat bounds issues mentioned above.

1 Like