Yes, I’ve thought before that optional “glue” code should go in separate packages. Linux distros have long done this, for example with packages like python-foo
for python bindings to library foo. Then all that’s needed is a convenience feature that automatically loads `A_B_Glue` if `A` and `B` are both loaded.
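For illustration only, here is a rough sketch of what such a hook could look like if the package manager or Base offered one. The `GLUE_RULES` table and the `maybe_load_glue` helper are hypothetical, not an existing API, and the glue package would still need to be installed:

```julia
# Purely illustrative sketch (no such hook exists today): automatically
# load a glue package once both of the packages it bridges are loaded.
const GLUE_RULES = Dict((:A, :B) => :A_B_Glue)

function maybe_load_glue()
    for ((a, b), glue) in GLUE_RULES
        if isdefined(Main, a) && isdefined(Main, b) && !isdefined(Main, glue)
            @eval Main using $glue    # pull in the A/B glue code
        end
    end
end

# something like this would have to run after every `using`/`import`
maybe_load_glue()
```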
I’d agree with you if it weren’t for the issue that it’s not easy to merge interfaces/APIs across modules (exports don’t merge). If Plots exports a `Plots.plot`, the `plot` function cannot be used/exported anywhere else without qualification. You’re right that with a clear dependency tree this could be avoided, but then clarifying the tree is (again) on the package developers.
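A concrete illustration of the clash, with throwaway module names (Julia 1.x syntax):

```julia
module PlotsLike
export plot
plot(x) = "PlotsLike.plot($x)"
end

module OtherLib
export plot
plot(x) = "OtherLib.plot($x)"
end

using .PlotsLike, .OtherLib
# plot(1) now fails: both modules export `plot`, so unqualified uses
# must be written as PlotsLike.plot(1) or OtherLib.plot(1) instead.
PlotsLike.plot(1)
```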
That’s also an important issue, but I think it’s largely separate. It would come up even without optional dependencies.
What’s interesting to note is the question of whether “Package == Module”. Because another model would be to have:
using Plots
using Plots.PyPlot
This would allow modules to be enabled selectively.
I think in past discussions it’s been broadly agreed that something like conditional modules is the right long-term approach, and that things like versioning can work with that. The only issue is that designing it completely is a huge project, when something much simpler could do a lot to solve the immediate problem.
The perfect is the enemy of the good here; it will be a shame if we have to have another three years of relying on hacks because Base will only accept a solution that satisfies absolutely every need up front.
DifferentialEquations.jl/JuliaDiffEq is a prime example here, since it started with a Plots.jl-like optional dependency structure and switched to a modular library structure. There are ups and downs, but at this point I very much agree that developing everything as smaller packages and glue packages, and putting it all together in a large metapackage, is the way to go. Not only are there no precompilation problems, but you can also separate your tests and lower CI loads by doing so. And you can more easily “isolate a problem”: version-bounding a specific bad library update. Lastly, contributors seem to find it much easier to dive into little bites rather than the whole library.
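On the “isolate a problem” point: a leaf package’s REQUIRE can pin away a bad release of a core package with an upper bound. The package name and version numbers here are made up:

```
julia 0.6
# pin SomeCorePkg below the breaking 0.4 series until this package catches up
SomeCorePkg 0.3.0 0.4.0
```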
That said, test breakage can be an issue, and you can end up with chicken-and-egg situations. For example, say you developed a feature in a specific package, but it has grown into something useful elsewhere, so you want to move an abstract type from a leaf package to a core/base package. If both packages are not on master, tests will fail. Since our package setup tests against released dependencies, there are situations where tests are guaranteed to fail until tags go through or you check out masters together locally. Also, changes to a base/core package can require changes in the leaves: adding a new type parameter can force many leaf packages to need an upgrade.
A more common problem for me is that I like to have a lot of integration tests. The base package has test dependencies on some of the leaf packages, because they are used to make sure everything still works. But this leads to a chicken-and-egg test problem with larger base changes (though it’s just a test problem, not an actual failure). I could see Plots having this kind of issue, since you’d probably want to test PyPlotPlots against Plots.jl to make sure a bunch of features work, but then changing Plots can require a PyPlotPlots update for tests to pass.
The issue here is that these things lead to upper bounds when doing updates, and Pkg cannot handle those well. That’s the root of this issue:
That said, these are mostly tooling problems, and many (most?) software ecosystems probably wouldn’t have them. For example, Bio.jl is already very modular within its own package, so it could split and probably not have many overlapping tests. If Pkg and the CI setups were made to handle these kinds of changes more easily, I think this is clearly the preferred path. Putting all of the packages together in an org also makes it easy to share privileges and manage them as a group.
So after commenting for a long time asking for conditional modules, I don’t really think they are as necessary, since Julia does so well with glue packages. I just think the testing and tagging structure needs to be updated to accommodate working like this in more complicated setups.
Thanks @ChrisRackauckas, that is valuable experience.
For my part, I’m not so much interested in satisfying every need as in getting the core abstractions “right”. For example, if we had separate module and conditional-module features, that could create an ongoing maintenance burden, where we have to consider both features when making any changes. I want the core system to be as simple as possible, so that new needs and use cases can be addressed with tooling.
I think putting shim code into a different package has two problems. The first is usability: the user needs to explicitly install the shim, since none of the base packages will depend on it. The second is that it can lead to a combinatorial explosion in the number of small packages. IIRC there was a recent debate about how packages can extend the Juno show/display methods without depending on Juno themselves.
I intentionally left version resolution for optional packages out for now, since that falls squarely into the realm of Pkg3. I was thinking more along the lines of optional/conditional modules and making them play nicely with precompilation. As Tony noted, for optional packages one could build on top of the static method to also take the package version into account:
@dependson Pkgname v"0.1" begin
...
end
Of course, once you actually start using versioned dependencies like this, you would want to invalidate your cache if that version becomes available/unavailable. That makes the invalidation and consistency logic a lot more complicated and dependent on Pkg.
The error with WeakRefStrings looks like an existing bug https://github.com/JuliaLang/julia/issues/21266 where precompilation tries to load everything that gets stored in the .ji file, even if it’s not a direct dependency and might no longer be present. That should be fixable independently of conditional dependencies, by changing the order in which dependencies are checked for staleness versus loaded when using a precompiled package.
Yes, I think there’s a big difference between “within-ecosystem conditional dependencies” and “outside conditional dependencies”. Plots, DiffEq, Bio: these have a lot of components which together form the package, but most users will probably only use one small piece. So the modular-package + metapackage approach is great for this kind of conditional dependence, because it’s about making leaner developer targets (point users to the metapackage because it’s easier to use, while developers can pull in just the functionality they need).
Juno.jl, RecipesBase.jl, etc. are all solving a different problem, where you want to plug into some other ecosystem with as few dependencies as possible. This does create oddly small glue packages like UnitfulPlots.jl, which undermines their utility, since they are so small that most users probably don’t know they exist. If your package ecosystem is large enough, like DiffEq, then you just lump all of these into the metapackage and it’s invisible to users that they got another small dependency. But for single packages like PlotlyJS with Juno.jl, where the choice is between adding a dependency and making a glue package, the glue-package option is obscure to users.
These “outside conditional dependencies” are good candidates for conditional dependencies of this form, while “within-ecosystem conditional dependencies” are better approached through small packages + base packages, so I think you can focus on solving only one of these cases. Right now, most setups seem to choose one method for handling both of these issues, which ends up making things a little insane.
In that sense, I see approach two (static conditional dep) as the clearly better way to tackle this issue. Essentially, it would “automate” the glue-package usage. The way I would read this logic in pseudocode is “if this user is a user of Juno, then add progress-bar support” or “if this user is a user of Plots, add recipes”. I would want this to be done at compile time, with the compile-time check being essentially “is this person a user of ___?”.
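Reusing the `@dependson` notation proposed above (still only a proposal; the bodies here are placeholders, not real Juno or Plots calls), that reading might look like:

```julia
# Sketch only: "if this user is a user of Juno, then add progress-bar support"
@dependson Juno begin
    # glue against Juno's progress API would go here; the block is only
    # compiled in when the check "is this person a user of Juno?" passes
end

# and likewise: "if this user is a user of Plots, add recipes"
@dependson Plots begin
    # recipe definitions for Plots users would go here
end
```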
I definitely vote for the second, but it needs to throw a clear warning. Essentially, just because some package hasn’t updated its Juno.jl progress-meter bindings and threw up an upper bound doesn’t mean that there should be an error for Juno as a whole. Instead, the package’s link to the Juno-verse should be severed (with a clear warning to the user; I can foresee a lot of bug reports which are difficult to understand, only to find out that the optional-dependency mechanism was just disabling the code).
The issue usually isn’t about enabling modules; it’s about selectively installing dependencies. Usually binary dependencies are the issue. If you’re able to install everything, at that point you don’t really care.
Let me give some DiffEq-verse examples. JLD doesn’t always build right on HPCs, since some of them use ancient CentOS versions and may have old HDF5 modules, which also cause difficulties. This will throw a build error, and Pkg ends up not playing nicely with you. To get around this, you want to be in a position where Pkg never adds or tries to build JLD at all. Some plotting libraries have these difficulties too. So what you want to do is grab the minimal setup of the package (usually a Julia-only version) which will work on the cluster, while Pkg is trying to throw everything else at you and erroring, even though you’re looking to ignore all of the JLD stuff anyway. This is a problem because one dependency that cannot build/precompile means that the whole package won’t work (without fussing with the source) in the current setup.
This is another reason why I think of it as “if the user has X installed, add functions; otherwise delete this part”. REQUIRE should just check whether it is installed, and if it is, use it and add the functionality, precompiled and all. What we’re trying to avoid is crippling large packages which offer a lot but cannot be used because some small dependency is causing a problem (especially when the extra functionality isn’t something used by the majority of users). Other people may care more about “purity in the number of true dependencies”, but I care about this because required dependencies can cause the entire package to fail to precompile.
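Today that tends to be hand-rolled at the top level of a package, roughly like this. This is a common Julia 0.5/0.6-era workaround (with a made-up `save_solution` helper), not the mechanism being proposed:

```julia
# Only wire up JLD-based saving if the user happens to have JLD installed;
# otherwise provide a stub so the rest of the package still loads.
if Pkg.installed("JLD") != nothing
    import JLD
    save_solution(filename, sol) = JLD.save(filename, "sol", sol)
else
    save_solution(filename, sol) =
        error("Install JLD.jl to enable saving solutions to disk.")
end
```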
That was my point/question: would it be possible to have, within a package, a submodule that is only loaded when explicitly requested? Then the package might still be functional even when some dependencies are not installed. The point is that IMHO it is much easier to group these things into modules than to handle optional dependencies at the function level.
In fact, @tbreloff has initiated a reorganization of Plots along these lines (it is on the `reorg` branch of the repo), and it already works with a helper package PlotsGR as a proof of concept (the PlotsGR package is loaded when you activate the backend in Plots, so there is still only the one manual `using` call). The purpose is exactly to ameliorate the precompilation issues. It’s been a while since it saw development, but AFAIK the plan is still to transition to this sometime during julia-0.6.
@tobias.knopp: That was my point/question: would it be possible to have, within a package, a submodule that is only loaded when explicitly requested? […]
I have been thinking of a similar idea for InspectDR.jl (plotting “backend”):
- Needs Cairo.jl to draw base plots.
- Optionally requires Gtk.jl to provide either a Gtk plot widget and/or a full plot window/application.
I have had issues on JuliaBox where, because the graphical Gtk.jl layer cannot be installed, InspectDR.jl cannot even fall back to using only Cairo.jl for inline plots: Build issue with JuliaBox · Issue #279 · JuliaGraphics/Gtk.jl · GitHub
In this particular case, I would not mind separating the two/three layers into separate modules (at least in theory).
In practice, I have found that the biggest hurdle to separating out submodules is that each module has to be its own Git repository (AFAIK). Though the multiple-module solution forces one to write cleaner, more modular/layered code, it is quite annoying to keep 3-4+ tightly coupled modules in sync across multiple Git repositories.
Questions
- Is there a reason that each module has to be its own Git repository?
- Can we not provide multi-module “packages” in Julia?
I know this does not solve the dependency issue directly, but maybe submodules could somehow control their own “REQUIRE” file - instead of relying on a single file for the entire (multi-module) “package”.
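For concreteness, a purely hypothetical layout of such a multi-module repository might look like this (names invented for the sake of the example):

```
MyPackage/                      # a single git repository
├── REQUIRE                     # hard dependencies of the core module
├── src/MyPackage.jl
├── MyPackageGtk/
│   ├── REQUIRE                 # adds Gtk.jl, needed only by this submodule
│   └── src/MyPackageGtk.jl
└── MyPackageJLD/
    ├── REQUIRE                 # adds JLD.jl
    └── src/MyPackageJLD.jl
```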
I absolutely agree we should have some kind of multi-module packages, and/or multi-package repos. Needing to make a new git repo is too high a barrier for introducing more modules.
I totally agree with @MA_Laforge and would love to see this, in particular if it allows separate precompilation of the individual modules within one package. Is this already planned for Pkg3?
Not just that, but my biggest feature request: allow the tests for different modules to be separated, so that some tests only run when specific modules are modified.
Thanks to all here for the interesting suggestions to improve the lazy-loading framework of Plots, which does cause some issues but is also the reason Plots can so successfully create a shared infrastructure for plotting in Julia’s diverse plotting landscape. If you have any specific advice on Plots, we’ve opened an issue to discuss this further here: https://github.com/JuliaPlots/Plots.jl/issues/918 .
Thanks!