Proposal for SharedFunctions.jl package for optional dependency management

I completely agree with you here. The whole discussion just reflects a flaw in the language, and now would be a good time to fix it. Your solution with merge_using seems like a possible one; perhaps there are others. Having a SharedFunctions package is basically equivalent to putting all these shared names in Base, which was one of the solutions discussed in the long thread everybody mentions.

Please, language designers, step in and solve this problem once and for all!

In theory we could almost make it automatic, right? That is, have a GitHub bot that generates a PR which extracts all exported function names, abstract types and abstract docstrings into an abstract header package, and rewires the old package to require the header package and to import, extend and reexport its functions.

In theory, the creation of lightweight AbstractFoo / HeaderFoo packages from some Foo package should not require a lot of human thought or intervention (corner case: functions and types declared and exported by a macro that takes the environment into account). If this could be done with close to zero work, and if these lightweight interface packages could be enforced to stay lightweight dependencies, then I think this would get a lot more traction.
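A minimal sketch of what such a header-package split could look like, written as nested modules so it runs in one file. The names HeaderDemo, AbstractFoo, Foo, Bar, FooThing and BarThing are hypothetical; in practice each inner module would be its own package.

```julia
module HeaderDemo

# Hypothetical lightweight "header" package: names and docstrings only, no methods.
module AbstractFoo
    export describe
    """
        describe(x)

    Return a short description of `x`.
    """
    function describe end
end

# The original package imports the stub, extends it, and reexports it.
module Foo
    import ..AbstractFoo: describe
    export describe, FooThing
    struct FooThing end
    describe(::FooThing) = "a FooThing"
end

# An unrelated package extends the same function without depending on Foo.
module Bar
    import ..AbstractFoo: describe
    struct BarThing end
    describe(::BarThing) = "a BarThing"
end

end # module HeaderDemo

# Both methods live on the single function owned by the header module.
HeaderDemo.AbstractFoo.describe(HeaderDemo.Foo.FooThing())  # "a FooThing"
HeaderDemo.AbstractFoo.describe(HeaderDemo.Bar.BarThing())  # "a BarThing"
```

The point of the layout is that Bar only needs the header module, so it stays cheap to depend on as long as the header carries no implementation.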

This doesn’t work at all. It is at the point where a method is defined that the author needs to decide which function it belongs to. Currently this can only be done by extending a function from a loaded package, but that is not an inherent limitation. You could, for example, envision writing something (loosely) like

@extend StatsBase function describe(...)
   function body
end

Which would also “extend” the StatsBase function without having to load it.

Again, there needs to be some way to tell the system which function you are extending, and the current way of having to load the package to do so might be limiting.

The fact that there is no automatic method merging based on name is fundamental to the ability to write generic code.
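To make that concrete, here is a small sketch (module names NameDemo, A and B are made up for illustration) showing that two functions sharing a name in different modules stay separate functions unless one explicitly extends the other:

```julia
module NameDemo

module A
    describe(x::Number) = "A: the number $x"
end

module B
    # Same name, but a brand-new function: nothing is imported from A.
    describe(x::AbstractString) = "B: the string $x"
end

end # module NameDemo

NameDemo.A.describe === NameDemo.B.describe     # false
hasmethod(NameDemo.A.describe, Tuple{String})   # false: B's method was not merged in
```

Generic code that calls A.describe can therefore rely on A's meaning of describe, regardless of which other packages happen to be loaded.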

I think there maybe could be a use case for use-side merging, but there should be no need for it in cases where package authors already know the functions should be merged, and in fact are doing it today. So we should first make it really easy for package authors to specify the merging when they already know it’s the right thing. Then there might be cases that fall through the cracks, and at some point we might need to add use-side merging, but I see that as being farther down the road.

This would be a great step forward. For package writers, that would help with the immediate proliferation of Base packages which don’t actually share any types. Of course, if they share concrete or abstract types, that is a different story and they need a shared base package.

If you mean “automatic” in the sense that it is done without any choice or control, then I agree with you. But if you mean that it is fundamental to generic programming to have only one “active” concept for any particular function at any particular point in time without namespace disambiguation, then I disagree completely. Other languages have handled that, including both single-dispatch languages and generic ones. Leave it to the users to determine whether they want concurrent generic concepts that involve the same function names, but ensure that package writers don’t have to worry about that stuff. With an @extend and a @merge using, we could have it all.

I think I found a super simple solution.

The key observations are:

  1. Each package has a UUID
  2. isbitstype(UUID) hence a UUID can be used as a type parameter
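A quick check of the second observation, as a sketch: Tagged is a hypothetical placeholder type, and the second UUID is made up purely to have something to compare against.

```julia
using UUIDs

# Hypothetical tag type, used only to put a UUID in a type-parameter position.
struct Tagged{uuid} end

const id_a = UUID("332e404b-d707-4859-b48f-328b8b3632c0")
const id_b = UUID("00000000-0000-0000-0000-000000000001")  # made-up UUID

isbitstype(UUID)                                                        # true
Tagged{id_a} === Tagged{UUID("332e404b-d707-4859-b48f-328b8b3632c0")}   # true: same UUID, same type
Tagged{id_a} === Tagged{id_b}                                           # false: different UUID, different type
```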

So, the idea is to define a “universal entry point function”:

module IndirectImports
    struct IndirectFunction{uuid, name} end
end

which can be used to refer to a function in a package without importing it. An example usage is:

module Upstream
    using UUIDs
    using ..IndirectImports: IndirectFunction
    const upstream_uuid = UUID("332e404b-d707-4859-b48f-328b8b3632c0")
    const fun = IndirectFunction{upstream_uuid, :fun}
end # module

module Downstream
    using UUIDs
    using ..IndirectImports: IndirectFunction
    const upstream_uuid = UUID("332e404b-d707-4859-b48f-328b8b3632c0")
    const fun = IndirectFunction{upstream_uuid, :fun}

    struct DownstreamType end
    fun(::DownstreamType) = "hello from Downstream"
end # module

@show Upstream.fun(Downstream.DownstreamType())

where the Downstream package defines a “function” in the Upstream package without importing Upstream.

(The fact that IndirectFunction{uuid, name}(...) does not return an IndirectFunction is kind of bad but it’s not like this is forbidden…)

Does it work? I feel like I’m missing something as this is so simple. Maybe it is too much of a burden on the Julia compiler to manage a possibly huge list of methods for IndirectFunction? Or maybe not?

Cool idea! A few macros would make usage fairly painless.

One issue is that typeof(Upstream.fun) === DataType. This might be a problem with code that expects Function.
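A small sketch of that issue, re-declaring the type-based design inside a throwaway module (TypeofDemo and call_twice are hypothetical names, not part of any package):

```julia
module TypeofDemo
    using UUIDs
    struct IndirectFunction{uuid, name} end
    # As in the proposal above: `fun` is bound to a type, not to a function.
    const fun = IndirectFunction{UUID("332e404b-d707-4859-b48f-328b8b3632c0"), :fun}
end

typeof(TypeofDemo.fun) === DataType   # true
TypeofDemo.fun isa Function           # false

# A hypothetical API that insists on a ::Function argument rejects it.
call_twice(f::Function, x) = f(f(x))
hasmethod(call_twice, Tuple{typeof(sin), Float64})          # true
hasmethod(call_twice, Tuple{typeof(TypeofDemo.fun), Int})   # false
```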

Note that there is another layer: functions are in modules, which are then available in packages. It is, of course, a convention in Julia to have the package name and its main module coincide.

Requires.jl is also used to define functionality conditional on having another package loaded, i.e., if the package is not loaded then some code is ignored entirely. Since this effectively requires introspection of a state (the loader), it would be best to have a syntax and implementation that is part of the language.

OK so I put things together in a package. I wrote some tests and it seems to be working as I expected:

@Per Good point! I actually ended up using an instance of IndirectFunction as a callable, rather than IndirectFunction itself. I think it’s an important property that each “function” has a unique type, and I don’t want to break that assumption.
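A rough sketch of that instance-based variant follows. It is not the package's actual code: the subtyping of Function and the wrapper module CallableDemo are assumptions made for illustration.

```julia
module CallableDemo

module IndirectImports
    # Assumption: a singleton type that subtypes Function, so its instance
    # passes `isa Function` checks and has a type unique to (uuid, name).
    struct IndirectFunction{uuid, name} <: Function end
end

module Upstream
    using UUIDs
    using ..IndirectImports: IndirectFunction
    const upstream_uuid = UUID("332e404b-d707-4859-b48f-328b8b3632c0")
    const fun = IndirectFunction{upstream_uuid, :fun}()   # an instance, not the type
end

module Downstream
    using UUIDs
    using ..IndirectImports: IndirectFunction
    const upstream_uuid = UUID("332e404b-d707-4859-b48f-328b8b3632c0")
    const fun = IndirectFunction{upstream_uuid, :fun}()

    struct DownstreamType end
    # Make the singleton instance callable on DownstreamType.
    (::typeof(fun))(::DownstreamType) = "hello from Downstream"
end

end # module CallableDemo

CallableDemo.Upstream.fun(CallableDemo.Downstream.DownstreamType())  # "hello from Downstream"
CallableDemo.Upstream.fun isa Function                               # true
```

Because the struct has no fields, the two `fun` constants are the same singleton object, so a method added through one is visible through the other.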

@Tamas_Papp Yes, at the moment I only support the “top-level” module of a package. It is not difficult to support sub-modules. But as this is for cross-package communication, I’m not sure supporting sub-modules is important.

A related problem to shared protocols is peer interfaces. Suppose that I’m trying to provide conversion from my type A to someone else’s type B… today, right now. Sometimes it’s just not worth making a distinct protocol that everyone implements; I just want to be able to build in interoperability in an “optional” block that is only activated if that other package is loaded. Think of it as conditional compilation/inclusion. Perhaps it’s even three-way? Imagine you could provide a list of project UUIDs so that the code is only activated if all of the mentioned UUIDs are loaded. This way you could make independent “glue” projects that are neither in A nor B. Julia could track all of those “glue” blocks and, as their dependencies are expressly loaded by the user, activate the relevant ones.

I have done just such a thing for the TensorAlgebra{V} abstract type

https://github.com/chakravala/AbstractTensors.jl

AbstractTensors.jl provides the abstract interoperability between tensor algebras having differing VectorSpace parameters. The great thing about it is that the VectorSpace unions and intersections are handled separately in a different package and the actual tensor implementations are handled separately also. This enables anyone who wishes to be interoperable with TensorAlgebra to build their own subtypes in their own separate package with interoperability automatically possible between it all, provided the guidelines are followed.

The key to making the whole interoperability work is that each TensorAlgebra subtype shares a VectorSpace parameter (with all isbitstype parameters), which contains all the info needed at compile time to make decisions about conversions. So other packages need only use the vector space information to decide on how to convert based on the implementation of a type. If external methods are needed, they can be loaded by Requires when making a separate package with TensorAlgebra interoperability.

It sounds like Requires.jl would still be the best approach in such a case. My approach could be beneficial if the glue project itself is large and you want to re-route the dependencies within it.

There is a flaw in the language: the global method table for multiple dispatch.
In the past I pointed out the solution to this flaw, namely “context dispatch” or, if you like, a per-module method table.
I called for an open discussion about this direction, its merits, and its drawbacks. At the time I encountered mostly resistance; maybe it is time to reopen the issue.

This is a solution not only to the shared naming problem but also to the latency and the first time responsiveness problem.

Isn’t it equivalent to my proposal? A global method table guaranteed to be “salted” by unique data is (isomorphic to) a local method table, right? Do you foresee additional benefits from building it into the language?

Your proposal is aimed at solving a different problem … conditional loading of modules.
Let me see if I understand correctly.

Let’s say I have a module Plots which handles plotting of Vectors and Matrices, and modules DataTables_1 and DataTables_2 which handle tabular data, each in its own internal format. DT1_PlotInterface and DT2_PlotInterface are two more modules that define the transformation from the internal representation to a representation that is plottable.

You would like to automatically load DT1_PlotInterface in a module if both Plots and DataTables_1 are loaded.

I think that eventually this kind of automation just causes problems. It is best for a module to be immutable and not to change its state (names and types; values can change), even between different invocations of the REPL.

IMHO

using DataFrames
using Plots
using DataFrame_Plots

is preferred to having a @require in Plots or in DataFrames. At the very least, this automation should load whole modules instead of altering the state of some previously loaded module.
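As a sketch of that explicit-glue layout, using stand-in modules rather than the real Plots or any real table package (every name inside GlueDemo is hypothetical):

```julia
module GlueDemo

# Stand-in for a plotting package that only knows about vectors.
module Plots
    plot(v::Vector) = "plot of $(length(v)) points"
end

# Stand-in for a table package with its own internal format.
module DataTables_1
    struct Table
        cols::Dict{Symbol, Vector{Float64}}
    end
end

# Separate glue module: depends on both, and is loaded explicitly by the user.
module DT1_PlotInterface
    import ..Plots
    using ..DataTables_1: Table
    # Convert the internal representation to something plottable.
    Plots.plot(t::Table, col::Symbol) = Plots.plot(t.cols[col])
end

end # module GlueDemo

t = GlueDemo.DataTables_1.Table(Dict(:x => [1.0, 2.0, 3.0]))
GlueDemo.Plots.plot(t, :x)   # "plot of 3 points"
```

Neither Plots nor DataTables_1 is modified or made aware of the other; the extra method only exists once the glue module has been loaded. Note that the glue module owns neither the function nor the type it dispatches on.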

Can’t this be done already without any magic? (Unless you just can’t live with type piracy.)

I am trying to understand what the OP actually wants to solve, and wondered if the cleaner thing would be something like

using DataFrames
using Plots
using DataFrames.Plots

I.e., there is some Plots specific stuff in DataFrames but it is not loaded by default. Only when explicitly loading Plots and DataFrames.Plots one gets access to that. Since DataFrames.Plots is a submodule of DataFrames it can remain in the same package, which seems to be important to not split the plotting code from the core package code.

I’m not sure what exactly the OP has in mind, but I know that there are plenty of packages that would like to provide GPU support (which often requires some custom functions to make CUDA GPU compute efficient/possible) or AD support (custom adjoints), but don’t want to force their users to always pull the list of dependencies that would be required to ensure those functions can be properly defined.

You might say, “Oh well this is just a package or two, that’s not so bad!”, but then I’d ask if you’d be willing to also add as dependencies: MPI.jl, Zygote.jl, AMD GPU packages, etc., just so that in cases where users might be using one or more of those packages together with yours, you can get that slightly improved functionality? Note that having those packages as hard dependencies also means that if those other packages have anything “weird” happen with their or your dependency compat list or build script, you’ll suddenly end up with all your users getting Pkg resolver errors, or failed builds causing your package to fail to load.

Suffice it to say, we want the benefits of multiple dispatch across package boundaries without bringing along all the extra baggage that would be involved in doing things “the normal way”.

Not to derail the main discussion, but can someone give me an overview of the problem we are trying to address? I am lost in the discussion. What’s wrong with the current way (whatever that is?)

For your library, MyTypeOrFunctionality.jl, what about adding companion packages like MyTypeOrFunctionalityMPI.jl, MyTypeOrFunctionalityAD.jl, MyTypeOrFunctionalityGPU.jl, etc., as separate packages that extend functionality?
This could easily end up being a lot of packages, but I don’t think that itself is unreasonable.

Documenting that someone has to load these extra packages to get the functionality is probably the more burdensome component.
