A few of us (@piever, @sdanisch) have been discussing the issue of optional dependencies again, unsatisfied with the current solutions (having to invert dependency chains, Requires.jl performance issues and clunkiness, etc.). One idea that has recently come up is a SharedFunctions.jl package:
it would be restricted to empty generic function definitions (e.g. function foo end); this is important for keeping the package small and performant, and for making it a very low-risk dependency
each generic function would have a sole "owner" package, in charge of providing documentation and maintenance, and the only package allowed to provide generic fallback definitions of the shared function (i.e. allowed piracy)
once a function was "registered" in SharedFunctions.jl, it could not be removed, to prevent any form of breakage (possibly this could be relaxed if it could be proven that all extenders of the function had already removed their extensions of it).
A group of packages can obviously coordinate such an "interface" package themselves, and some have (IteratorInterfaceExtensions, StatsBase to some extent, etc.), but I feel there would be value in a community-driven "solution" with well-documented practices, particularly for scenarios where only a few generic functions need to be shared by a group of packages. (Obviously, if a group of packages needed to share/overload hundreds or thousands of functions, it would hurt the performance of SharedFunctions.jl, so they'd be better off making their own interface package.) I feel like it could provide a nice solution: a shared place people could go, make a PR for a new shared function, and not have to worry about registering their own interface package or coordinating things.
Thoughts? Good idea? Unexpected downsides I'm not considering? I feel like I'd like to get the ball rolling on something like this.
I think it's a really good idea. I have considered this somewhat myself, and have on occasion implemented my own very limited versions of this.
A major comment that comes to mind is that I have a feeling that for this to see widespread use it would need to be somewhat domain specific. For example, there might be a JuliaOpt/OptExtensions.jl, JuliaStats/StatsExtensions.jl, JuliaData/DataExtensions.jl. That's less for technical reasons than it is for community organization reasons. I'm just having a hard time imagining what the process would look like for deciding which functions to extend or export, or how to name functions, if the function extensions repo were too general. It would likely be much easier to organize within specific sub-domains, since in many of those cases there are already some very widely used functions from the "core" packages.
If you are not defining an interface in any generic sense, isn't this just an opt-in shared global namespace for functions? (BTW, I have no problem with that whatsoever, as it could make things much easier for package interoperability.) Moreover, if it is just that then I don't see the point in having a whole bunch of domain specific ones. Domains are not that cleanly separated.
I do believe that this issue ought to be solved in base, though. There should be a way to define a LightGraphs.cartesian_product method without importing LightGraphs.
Yes, this is essentially an opt-in global namespace for shared functions. I also agree that domains tend not to be cleanly separated, and that because of the overhead and lack of a strict process, these kinds of domain-specific packages haven't happened/succeeded very often. That is my main motivation for proposing a package with strict guidelines to ensure a clean solution for packages sharing methods without having to depend on each other directly.
In my mind, it would be a straightforward process: hey, my package A and this other package B have a shared function that does the same thing for our respective types. It doesn't really make sense for my package to depend on B, or for B to depend on A, so it's awkward: how else do we get these functions merged? With the solution proposed here, you'd make a PR to SharedFunctions.jl with function foo end, along with docs on the shared meaning/concept of the function, and designate one of the packages as "owner", let's say package A in this case. Then packages A and B take a dependency on SharedFunctions, change their definitions to function SharedFunctions.foo(...), and go along their way. Package A would also have the option of defining a generic fallback definition for foo if applicable.
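For concreteness, a minimal sketch of what that could look like (foo, package A, and package B are made up for illustration):

# SharedFunctions/src/SharedFunctions.jl: nothing but empty generic function stubs
module SharedFunctions
"""
    foo(x)

Shared concept of foo; documented and owned by package A.
"""
function foo end
end

# In package A (the designated owner):
import SharedFunctions
struct AType end
SharedFunctions.foo(x::AType) = 0
SharedFunctions.foo(x) = nothing   # generic fallback, allowed only for the owner

# In package B:
import SharedFunctions
struct BType end
SharedFunctions.foo(x::BType) = 1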
One additional idea would be to include submodules within SharedFunctions that generic function stubs could live in; possible domain-related submodules like Data, Stats, etc. Or perhaps submodules of the name of the owning package?
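Hypothetically, a domain-submodule layout could look something like this (the submodule names and stub functions are just examples):

module SharedFunctions

module Data
function describe end   # owned by, say, DataFrames
end

module Stats
function fit end        # owned by, say, StatsBase
end

end

# A package would then extend e.g. SharedFunctions.Stats.fit for its own types.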
Perhaps this approach is more permissive than I was initially imagining. I was thinking of myself writing some functions and not really knowing which it would be good to extend, so I either don't extend much of anything or I just go nuts and extend everything. As I think about it more, maybe this just wouldn't be a problem. In the former case, if there's something you really ought to be extending you'll probably know it; in the latter case, it probably just doesn't matter as long as you always use types from your own package.
So perhaps the usual no-type-piracy rule is sufficient.
I think the name of the owning package is a really good idea for organizing this package into submodules, and I generally like this model of optional dependencies much better than anything used in the Julia ecosystem currently. Big +1 from me.
As far as names, I think something a bit more descriptive of the problem being solved would be better: FunctionStubs, OptionalFunctions, etc.
I think as a first prototype, a single package for this makes the most sense, but maybe as we see how things naturally fall out, it might make sense to split off into domain-specific packages after a major version bump of this package.
Have you thought about what using those function stubs will look like in packages? Will you have to check for the existence of the method you're interested in each time before calling a function from SharedFunctions? Or would any module using a SharedFunctions function be required to define a fallback for their specific use-case?
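To illustrate the two options being asked about (a guarded call versus a required fallback; foo and the caller are hypothetical):

# Option 1: guard each call site on whether a method exists for this type
import SharedFunctions
function summarize(x)
    if hasmethod(SharedFunctions.foo, Tuple{typeof(x)})
        return SharedFunctions.foo(x)
    end
    return nothing
end

# Option 2: the owning package ships a generic fallback once,
# so downstream callers can always just call SharedFunctions.foo(x)
SharedFunctions.foo(x) = nothing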
As a note on performance, I just tried generating a package with 1, 100, 1_000, 10_000, and 100_000 generic function stubs to see what it does to precompile/load time. Up to 1_000 generic function stubs, there's almost no difference in timing (~0.028s to load the precompiled package). Past that it starts to increase roughly linearly (~0.2s for 10_000, and ~20s for 100_000). That is promising since, in my mind at least, the idea for this package would be a few dozen functions at most; even up to 1_000 stubs there's essentially no performance impact.
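A rough sketch of how such a test could be set up (the package and stub names are made up, and this is not necessarily how the timings above were produced):

# write N empty generic function stubs into a scratch package
# (assumes a StubTest package skeleton with a Project.toml already exists)
N = 1_000
mkpath("StubTest/src")
open("StubTest/src/StubTest.jl", "w") do io
    println(io, "module StubTest")
    for i in 1:N
        println(io, "function stub_", i, " end")
    end
    println(io, "end")
end
# then, in a fresh session with StubTest on the load path:
# julia> @time using StubTest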
So the alternative proposal is that each package that wants to make itself optionally loadable creates its own dependency-less package where its function stubs are defined. e.g. Gadfly would depend on GadflyStubs, DataFrames would depend on DataFramesStubs, etc. I think if we had good tooling for developing multiple packages in the same git repo and releasing both as a single, automatic step, this could be a lot cleaner and nicer than a single global package.
Something like a git repo where each top-level folder is a Julia package with a full Project.toml, src, test and everything, with the one shared thing, the version number of all the packages, defined in the top-level folder.
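A hypothetical layout along those lines, using Gadfly as the example (the stub function is made up):

# Gadfly.jl repo:
#   VERSION              shared version number for every package below
#   GadflyStubs/
#     Project.toml
#     src/GadflyStubs.jl
#   Gadfly/
#     Project.toml       depends on GadflyStubs
#     src/Gadfly.jl

# GadflyStubs/src/GadflyStubs.jl contains only empty stubs:
module GadflyStubs
function render end   # hypothetical; Gadfly provides the actual methods
end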
I liked the name SharedFunctions.jl since, IMO, it's descriptive of what's going on: here's a function foo that neither package can cleanly "own" from a hard-dependency perspective, yet both desire to "share" the generic function definition and have their methods merged.
Hmmm… this could maybe work, but it sounds like a lot of boilerplate, not only to generate all these files and work out their hierarchy/loading, but also because, as an "implementation" package, I could imagine ending up with dozens of *Stubs dependencies, which could get really annoying to maintain. We also have a consistency problem: how do I know all these packages are keeping their *Stubs files clean, or not removing generic function stubs arbitrarily (causing breakage)?
I have proposed exactly this as an alternative. Basically, we want to be able to have both
module A
struct AType end
f(x::AType) = 0
end

module B
struct BType end
f(x::BType) = 1
end
but tell the system "if both of these modules are loaded, then A.f and B.f are the same function and should be merged". That way you don't have to load a package to extend its functions. It's also much easier than trying to automatically merge all same-named functions, since the package authors have opted in to getting errors if methods are duplicated or ambiguous.
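Purely as illustration (no such mechanism exists in Julia today, and this syntax is invented), the opt-in might read something like:

# hypothetical declaration, not valid Julia:
# @merge_functions A.f B.f
#
# after which the two method tables would behave as one:
# A.f(A.AType())  # == 0
# A.f(B.BType())  # == 1, via the merged method from B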
To me, the idea of SharedFunctions.jl is to just throw away namespaces entirely. And it's true, having a single global namespace can be very convenient. You don't have to think about where things come from. And yes, it can be hard to draw lines between different domains. But here, we'd be drawing a line between those who think there should be a single global namespace, and those who think there shouldn't. The SharedFunctions meaning for a function f would just be whatever meaning is preferred by the first to make a PR. And SharedFunctions.jl itself would be a random list of unrelated names. I don't see it making sense to maintain such a list together.
One possible way forward could be to spec a (tiny) subset of the Julia language: header modules and header packages.
Header modules would be included by a new import_header_module keyword.
The only permissible contents of header modules would be:
import_header something_upstream
Function stubs: function fun_name end
Abstract type declarations: abstract type foo{T1,T2} <: bar end
For the sake of convenience, definitions of constants, e.g. const SOME_CONST = .... The set of admissible types of constant definitions would be severely restricted (strings, integers, floats, symbols, ...).
Source-code comments.
In particular: no macros, no initialization code, no executable code.
The goal of these restrictions would be that people could import_header malicious files without causing direct problems. Hence, we would need an ironclad verifier / normalizer that runs on import_header (verifying that the file is well-formed according to some extremely restrictive grammar before permitting the Julia parser to touch it).
With such an infrastructure in place, optional dependencies could be managed by having non-optional dependencies on header packages (that could possibly live in a different registry). These could be thrown around like candy, and could even be implicitly installed, with some size limits (installing and importing a header cannot compromise a machine, nor can it compromise code that uses it for extending methods).
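Under those restrictions, a hypothetical header package might contain nothing more than this (names borrowed from the LightGraphs example above, purely for illustration):

# LightGraphsHeader/src/LightGraphsHeader.jl: stubs, abstract types, and
# restricted constants only; no executable code of any kind
module LightGraphsHeader

abstract type AbstractGraph end

function cartesian_product end
function neighbors end

const DEFAULT_WEIGHT = 1.0

end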
edit: permitting function aliases is a bad idea, because a change of header could turn a benign unused function extension from an optional reverse dependency into a compromise (user piratically extends some_header.some_fun, and updated header version has e.g. const some_fun = Base.other_fun leading to hilarity).
I think common base packages like AbstractLattices are great. Those are fully compatible with the notion of namespaces, since they let you say e.g. "wedge in the lattice sense". I would contrast that with having a single base package for everything, which doesn't make as much sense to me.
Yeah, it seems the consensus is building around more limited-scope "XBase" packages that still give a notion of namespace for the functions being shared, which is fine. The only issue I have there is the overhead and inconsistency I've personally noticed: these XBase packages getting too heavy, or package developers reaching for solutions like Requires.jl because of the onerous process of going through the whole flow of creating a new package. It's much easier to slap on a new dependency and keep developing than to abstract a few stub functions out into a new XBase package, set up a new Project.toml, CI, and tests, get it registered, wait several days, etc.
I agree; I don't think adding XBase packages handles every case. Those only make sense when multiple packages want to share a common vocabulary in certain ways. There are other cases where you just want to extend a function that happens to exist in another package, without factoring out an XBase. Of course, one of the motivating examples here was DataFrames extending StatsBase.describe. We don't want to factor out StatsBaseBase, and DataFrames is not a "stats" package in the sense that it wants to define lots of statistical methods. For those cases I think "manual merging" would be the solution.
Is this something already implementable in "user-land" without changing Julia itself? Revise already seems to be doing a lot of magic to edit the method table.