Proposal for SharedFunctions.jl package for optional dependency management

quinnj · April 25, 2019, 5:18pm

A few of us (@piever, @sdanisch) have been discussing the issue of optional dependencies again, unsatisfied w/ current solutions (having to invert dependency chains, Requires.jl performance issues + clunkiness, etc.). One idea that has recently been brought up is the idea of a SharedFunctions.jl package:

it would be restricted to empty generic function definitions (e.g. function foo end); this is important for keeping the package small & performant, as well as a very low-risk dependency
each generic function would have a sole “owner” package, in charge of providing documentation/maintenance, as well as being the sole package allowed to provide generic fallback definitions of the generic shared function (i.e. allowed piracy)
once a function was “registered” in SharedFunctions.jl, it couldn’t be removed, to prevent any form of breakage (possibly this could be relaxed if it could be proven that all extenders of the function had already removed their extensions of it).

A group of packages could obviously coordinate such an “interface” package and some have (IteratorInterfaceExtensions, StatsBase to some extent, etc.), but I feel there would be value in having a community-driven “solution” w/ well-documented practices, in particular for scenarios where only a few generic functions need to be shared by a group of packages. (Obviously, if a group of packages needed to share/overload hundreds/thousands of functions, it would hurt perf of SharedFunctions.jl, so they’d be better off making their own interface package.) I feel like it could provide a nice solution: a shared place people could go, make a PR for a new shared function, and not have to worry about registering their own interface package or coordinating things.

Thoughts? Good idea? Unexpected downsides I’m not considering? I feel like I’d like to get the ball rolling on something like this.

ExpandingMan · April 25, 2019, 5:26pm

I think it’s a really good idea. I have considered this somewhat myself, and have on occasion implemented my own very limited versions of this.

A major comment that comes to mind is that I have a feeling that for this to see widespread use it would need to be somewhat domain specific. For example, there might be a JuliaOpt/OptExtensions.jl, JuliaStats/StatsExtensions.jl, JuliaData/DataExtensions.jl. That’s less for technical reasons than it is for community organization reasons. I’m just having a hard time imagining what the process would look like for deciding which functions to extend or export, or how to name functions if the function extensions repo was too general. It would likely be much easier to organize within specific sub-domains, since in many of those cases there are already some very widely used functions from the “core” packages.

jlperla · April 25, 2019, 6:02pm

If you are not defining an interface in any generic sense, isn’t this just an opt-in shared global namespace for functions? (BTW, I have no problem with that whatsoever, as it could make things much easier for package interoperability.) Moreover, if it is just that then I don’t see the point in having a whole bunch of domain specific ones. Domains are not that cleanly separated.

ExpandingMan · April 25, 2019, 6:06pm

Maybe I’m wrong, like I said I’m just having a hard time imagining how it would be organized.

cstjean · April 25, 2019, 6:31pm

I like the idea, and this rule especially:

I do believe that this issue ought to be solved in base, though. There should be a way to define a LightGraphs.cartesian_product method without importing LightGraphs.

quinnj · April 25, 2019, 7:16pm

Yes, this is essentially an opt-in global namespace for shared functions. I also agree that domains tend to not be that cleanly separated, and because of the overhead and lack of strict process, these kinds of domain-specific packages haven’t happened/succeeded very much. That is my main motivation in proposing a package with strict guidelines, to ensure a clean solution to packages sharing methods w/o having to directly depend on each other.

quinnj · April 25, 2019, 7:20pm

In my mind, it would be a straightforward process: hey, my package A and this other package B have a shared function that does the same thing, but for our own types respectively. It doesn’t really make sense for my package to depend on B, or for B to depend on A, so it’s awkward, because how else do we get these functions merged? With the solution proposed here, you’d make a PR to SharedFunctions.jl with function foo end, along w/ docs around the shared meaning/concept of the function, and designate one of the packages as “owner”, let’s say package A in this case. Then packages A & B take a dependency on SharedFunctions, change their definitions to function SharedFunctions.foo(...) and go along their way. Package A would also have the option of defining a generic fallback definition for foo if applicable.
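The flow described above could look roughly like the following. This is only a sketch: the three modules are local stand-ins for the proposed SharedFunctions.jl and for packages A and B, and AType, BType and the return strings are made up for illustration.

```julia
# Hypothetical stand-in for the proposed SharedFunctions.jl: stubs and docs only.
module SharedFunctions
"""
    foo(x)

Shared stub. Per the proposal, package A is the designated owner and documents it.
"""
function foo end
end

# Stand-in for package A (the "owner").
module A
import ..SharedFunctions
struct AType end
SharedFunctions.foo(::AType) = "foo from A"
end

# Stand-in for package B, which never depends on A.
module B
import ..SharedFunctions
struct BType end
SharedFunctions.foo(::BType) = "foo from B"
end

# Both methods land on the same generic function.
results = (SharedFunctions.foo(A.AType()), SharedFunctions.foo(B.BType()))
```

Neither A nor B imports the other; the only common dependency is the stub module.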

One additional idea would be to include submodules within SharedFunctions that generic function stubs could live in; possible domain-related submodules like Data, Stats, etc. Or perhaps submodules of the name of the owning package?
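A minimal sketch of the submodule idea, assuming domain-named submodules. The stub names rows and summarize and the type MyTable are hypothetical, chosen only to show how a package would extend a nested stub.

```julia
# Hypothetical layout: stubs grouped into domain submodules.
module SharedFunctions
    module Data
        function rows end        # hypothetical stub name
    end
    module Stats
        function summarize end   # hypothetical stub name
    end
end

# A downstream package extends the nested stubs for its own type.
struct MyTable
    values::Vector{Int}
end

SharedFunctions.Data.rows(t::MyTable) = length(t.values)
SharedFunctions.Stats.summarize(t::MyTable) =
    (min = minimum(t.values), max = maximum(t.values))

t = MyTable([3, 1, 2])
nrows = SharedFunctions.Data.rows(t)
summary_nt = SharedFunctions.Stats.summarize(t)
```

The qualified name then carries a bit of the meaning ("rows in the Data sense"), at the cost of one more level of nesting.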

ExpandingMan · April 25, 2019, 7:31pm

Perhaps this approach is more permissive than I was initially imagining. I was thinking of myself writing some functions and not really knowing which it would be good to extend, so I either don’t extend much of anything or I just go nuts and extend everything. As I think of it more, maybe this just wouldn’t be a problem. In the former case, if there’s something you really ought to be extending you’ll probably know it; in the latter case, it probably just doesn’t matter as long as you always use types from your package.

So perhaps the usual no-type-piracy rule is sufficient.

non-Jedi · April 25, 2019, 7:38pm

I think the name of the owning package is a really good idea for organizing this package into submodules, and I generally like this model of optional dependencies much better than anything used in the Julia ecosystem currently. Big +1 from me.

As far as names, I think something a bit more descriptive of the problem being solved would be better: FunctionStubs, OptionalFunctions, etc.

I think as a first prototype, a single package for this makes the most sense, but maybe as we see how things naturally fall out, it might make sense to split off into domain-specific packages after a major version bump of this package.

Have you thought about what using those function stubs will look like in packages? Will you have to do a check of the existence of the method you’re interested in each time before calling a function from SharedFunctions? Or would any module using a SharedFunctions function be required to define a fallback for their specific use-case?
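To make the question concrete, here is a sketch of the "check before calling" option using Base's applicable. The module Stubs, the stub render, and the types Fancy and Plain are hypothetical names for illustration.

```julia
# Hypothetical stub module standing in for SharedFunctions.
module Stubs
function render end
end

struct Fancy end   # some package has provided a method for this type
struct Plain end   # nobody has provided a method for this type

Stubs.render(::Fancy) = "fancy output"

# Check whether a method applies before calling the stub.
function try_render(x)
    if applicable(Stubs.render, x)
        return Stubs.render(x)
    else
        return "no renderer loaded"
    end
end

with_method = try_render(Fancy())
without_method = try_render(Plain())
```

Without such a check, calling Stubs.render(Plain()) throws a MethodError, which may be acceptable if the caller treats a missing method as a usage error.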

quinnj · April 25, 2019, 7:46pm

As a note on performance, I just tried generating a package w/ 1, 100, 1_000, 10_000, and 100_000 generic function stubs to see what it does to precompile/loading time. Up to 1_000 generic function stubs, there’s almost no difference in timing (~0.028s to load the precompiled package). From 10_000 generic function stubs, it started to increase linearly (~0.2s for 10_000, and ~20s for 100_000). Which is promising since, in my mind at least, the idea for this package would be a few dozen functions at most. But even up to 1_000, there’s no performance impact.
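For anyone wanting to try something similar, a rough sketch follows. It is not the script used for the numbers above: it generates the source of a module with n stubs and times evaluating it in the current session, which is a different measurement from loading a precompiled package. The names stub_source and Stubs1000 are made up.

```julia
# Build the source text of a module containing n empty generic function stubs.
function stub_source(name::AbstractString, n::Integer)
    io = IOBuffer()
    println(io, "module ", name)
    for i in 1:n
        println(io, "function f", i, " end")
    end
    println(io, "end")
    return String(take!(io))
end

src = stub_source("Stubs1000", 1_000)

# Time evaluating the generated module; include_string returns the new module.
elapsed = @elapsed stubs_mod = include_string(@__MODULE__, src)
println("evaluated 1_000 stubs in ", elapsed, " s")
```

To approximate the original experiment more closely, the generated source would have to be written into a real package and timed with a fresh `using` after precompilation.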

non-Jedi · April 25, 2019, 7:54pm

So the alternative proposal is that each package that wants to make itself optionally loadable creates its own dependency-less package where its function stubs are defined. e.g. Gadfly would depend on GadflyStubs, DataFrames would depend on DataFramesStubs, etc. I think if we had good tooling for developing multiple packages in the same git repo and releasing both as a single, automatic step, this could be a lot cleaner and nicer than a single global package.

Something like a git repo where each top-level folder is a Julia package with a full Project.toml, src, test and everything, with the only shared thing being the version number for all packages, defined in the top-level folder.

quinnj · April 25, 2019, 7:55pm

I liked the name of SharedFunctions.jl since, IMO, it’s descriptive of what’s going on: here’s a function foo that neither package can cleanly “own” from a hard dependency perspective, yet both desire to “share” the generic function definition and have their methods merged.

quinnj · April 25, 2019, 8:00pm

Hmmm…this could maybe work, but sounds like a lot of boilerplate, not only to generate all these files and work out their hierarchy/loading, but as an “implementation” package, I could imagine then having dozens of *Stubs dependencies which could get really annoying to maintain. We also have the consistency problem: how do I know all these packages are keeping their *Stubs files clean, or not removing generic function stubs arbitrarily (causing breakage)?

jeff.bezanson · April 25, 2019, 8:35pm

I have proposed exactly this as an alternative. Basically, we want to be able to have both

module A
struct AType end   # placeholder type owned by A
f(x::AType) = 0
end

module B
struct BType end   # placeholder type owned by B
f(x::BType) = 1
end

but tell the system “if both of these modules are loaded, then A.f and B.f are the same function and should be merged”. That way you don’t have to load a package to extend its functions. It’s also much easier than trying to automatically merge all same-named functions, since the package authors have opted in to getting errors if methods are duplicated or ambiguous.
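For contrast, this is what happens today without any merge declaration: the two same-named functions are unrelated objects. The sketch repeats the two modules with placeholder struct definitions so it runs on its own. The merge mechanism itself is only proposed and is not shown.

```julia
module A
struct AType end
f(x::AType) = 0
end

module B
struct BType end
f(x::BType) = 1
end

# A.f and B.f are distinct generic functions with one method each.
same_function = A.f === B.f

# Calling one with the other module's type fails with a MethodError.
err = try
    A.f(B.BType())
    nothing
catch e
    e
end
```

With the proposed opt-in merge, the last call would instead dispatch to the method defined in B.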

To me, the idea of SharedFunctions.jl is to just throw away namespaces entirely. And it’s true, having a single global namespace can be very convenient. You don’t have to think about where things come from. And yes, it can be hard to draw lines between different domains. But here, we’d be drawing a line between those who think there should be a single global namespace, and those who think there shouldn’t. The SharedFunctions meaning for a function f would just be whatever meaning is preferred by the first to make a PR. And SharedFunctions.jl itself would be a random list of unrelated names. I don’t see it making sense to maintain such a list together.

This has also been proposed before. See Common pool for methods as a way to solve common names in different packages · Issue #2327 · JuliaLang/julia · GitHub

chakravala · April 25, 2019, 8:48pm

An example of these “shared function” packages I am involved with is from @scheinerman

The AbstractLattices package is for sharing the \vee and \wedge method symbols

scheinerman/AbstractLattices.jl/blob/master/src/AbstractLattices.jl

module AbstractLattices

export ∧, ∨, dist, wedge, vee

function wedge end
function vee end

const ∧ = wedge
const ∨ = vee

wedge(x) = x
vee(x) = x

function dist end

end

Already, there are a whole bunch of (at least 6) registered packages that depend on it.

Therefore, in practice I am already in agreement with having shared function packages.
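As an illustration of how a dependent package uses such stubs, here is a sketch with a local stand-in module (LatticeStubs, reduced to the wedge/vee part of the listing above) and a hypothetical Divisor type ordered by divisibility, where meet is gcd and join is lcm.

```julia
# Local stand-in for the stub package, reduced to wedge/vee and their aliases.
module LatticeStubs
export ∧, ∨, wedge, vee
function wedge end
function vee end
const ∧ = wedge
const ∨ = vee
end

using .LatticeStubs

# Hypothetical downstream type: positive integers ordered by divisibility.
struct Divisor
    n::Int
end

LatticeStubs.wedge(a::Divisor, b::Divisor) = Divisor(gcd(a.n, b.n))  # meet
LatticeStubs.vee(a::Divisor, b::Divisor) = Divisor(lcm(a.n, b.n))    # join

lower = Divisor(4) ∧ Divisor(6)
upper = Divisor(4) ∨ Divisor(6)
```

Any other package extending the same stubs for its own types shares the ∧ and ∨ symbols without depending on this one.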

foobar_lv2 · April 25, 2019, 9:11pm

One possible way forward could be to spec a (tiny) subset of the julia language: header-modules and header-packages.

Header modules would be included by a new import_header_module keyword.

The only permissible subset of header-modules would be:

import_header something_upstream
function stubs: function fun_name end;
Abstract type declarations: abstract type foo{T1,T2} <: bar end
For the sake of convenience: Definitions of constants. Either const SOME_CONST = .... The set of admissible types of constant definitions would be severely restricted (strings, integers, floats, symbols, …).
Source-code comments.

Especially no macros, no initialization code, no executable code.

The goal of these restrictions would be that people could import_header malicious files without causing direct problems. Hence, we would need an ironclad verifier / normalizer that runs on import_header (verify that it is well-formed according to some extremely restrictive grammar before permitting the julia parser to touch it).

With such an infrastructure in place, optional dependencies could be managed by having non-optional dependencies on header packages (that could possibly live in a different registry). These could be thrown around like candy, and could even be implicitly installed, with some size limits (installing and importing a header cannot compromise a machine, nor can it compromise code that uses it for extending methods).

edit: permitting function aliases is a bad idea, because a change of header could turn a benign unused function extension from an optional reverse dependency into a compromise (user piratically extends some_header.some_fun, and updated header version has e.g. const some_fun = Base.other_fun leading to hilarity).
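A rough sketch of the kind of whitelist such a verifier might apply, one statement at a time. Note the big caveat: for brevity it uses Julia's own parser (Meta.parse), which is exactly what the proposal wants to avoid for untrusted input, so this only illustrates the allowed subset, not a safe implementation. All function names here are made up.

```julia
# A type name is a bare symbol or a symbol with plain symbolic parameters, e.g. foo{T1,T2}.
is_typename(x) = x isa Symbol ||
    (x isa Expr && x.head === :curly && all(a -> a isa Symbol, x.args))

# Only simple literals are allowed as constant values.
is_literal(x) = x isa Union{AbstractString, Integer, AbstractFloat} ||
    (x isa QuoteNode && x.value isa Symbol)

# Accept only: function stubs, abstract type declarations, literal constants.
function header_statement_ok(ex)
    ex isa Expr || return false
    if ex.head === :function
        # `function name end` parses to a :function node with a single symbol
        return length(ex.args) == 1 && ex.args[1] isa Symbol
    elseif ex.head === :abstract
        decl = ex.args[1]
        return is_typename(decl) ||
            (decl isa Expr && decl.head === :<: && all(is_typename, decl.args))
    elseif ex.head === :const
        asg = ex.args[1]
        return asg isa Expr && asg.head === :(=) &&
            asg.args[1] isa Symbol && is_literal(asg.args[2])
    end
    return false
end

header_line_ok(line::AbstractString) = header_statement_ok(Meta.parse(line))
```

In line with the edit above, a constant whose value is another function (an alias) is rejected because its right-hand side is not a literal.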

jeff.bezanson · April 25, 2019, 9:48pm

I think common base packages like AbstractLattices are great. Those are fully compatible with the notion of namespaces, since it lets you say e.g. “wedge in the lattice sense”. I would contrast that with having a single base package for everything, which doesn’t make as much sense to me.

quinnj · April 25, 2019, 10:02pm

Yeah, it seems the consensus is building around more limited scope “XBase” packages that still give a notion of namespace for the functions being shared, which is fine. The only issue I have there is the overhead and inconsistency I’ve personally noticed; these XBase packages getting too heavy, or package developers reaching for solutions like Requires.jl because of the onerous process of going through the whole flow of creating a new package. It’s much easier to slap on a new dependency and move on developing, rather than having to abstract a few stub functions out to a new XBase package, setup a new Project.toml, CI, tests, get it registered, wait several days, etc, etc.

jeff.bezanson · April 25, 2019, 10:07pm

I agree — I don’t think adding XBase packages handles every case. Those only make sense when multiple packages want to share a common vocabulary in certain ways. There are other cases where you just want to extend a function that happens to exist in another package, without factoring out an XBase. Of course one of the motivating examples here was DataFrames extending StatsBase.describe. We don’t want to factor out StatsBaseBase, and DataFrames is not a “stats” package in the sense that it wants to define lots of statistical methods. For those cases I think “manual merging” would be the solution.

tkf · April 25, 2019, 10:46pm

Is it something already implementable in “user-land” without changing Julia itself? Revise already seems to be doing a lot of magic to edit the method table.
