Proposal for SharedFunctions.jl package for optional dependency management

OK so I put things together in a package. I wrote some tests and it seems to be working as I expected:

@Per Good point! I actually ended up using an instance of IndirectFunction as the callable, rather than the IndirectFunction type itself. I think it’s an important property that each “function” has a unique type, and I don’t want to break that assumption.
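
The idea of a callable instance with its own unique type can be sketched as follows. This is only an illustration, not the package's actual implementation: `IndirectFn`, `plot`, and `scatter` are hypothetical names chosen for the example.

```julia
# Hypothetical stand-in for IndirectFunction: the function name is a type parameter.
struct IndirectFn{name} end

# Each "function" is a distinct singleton instance, just like ordinary Julia functions.
const plot = IndirectFn{:plot}()
const scatter = IndirectFn{:scatter}()

# Methods are attached to the type of the instance, so the instance is the callable.
(::typeof(plot))(x::Vector) = "plot of $(length(x)) points"
(::typeof(scatter))(x::Vector) = "scatter of $(length(x)) points"
```

Because `typeof(plot)` and `typeof(scatter)` differ, code that dispatches or specializes on a function's type keeps working, which would not be the case if a single shared type were made callable.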

@Tamas_Papp Yes, at the moment I only support the top-level module of a package. It would not be difficult to support sub-modules, but since this is for cross-package communication, I’m not sure supporting them is important.

A related problem to shared protocols is peer interfaces. Suppose that I’m trying to provide conversion of my type A to someone else’s type B, today, right now. Sometimes it’s just not worth making a distinct protocol that everyone implements; I just want to build in interoperability in an “optional” block that is only activated if that other package is loaded. Think of it as conditional compilation/inclusion. Perhaps it’s even three-way: imagine you could provide a list of project UUIDs so that the code is only activated if all of the listed packages are loaded. This way you could make independent “glue” projects that are in neither A nor B. Julia could track all of those “glue” blocks and, as their dependencies are explicitly loaded by the user, activate the relevant ones.

I have done just such a thing for the TensorAlgebra{V} abstract type

AbstractTensors.jl provides the abstract interoperability between tensor algebras having differing VectorSpace parameters. The great thing about it is that the VectorSpace unions and intersections are handled separately in a different package and the actual tensor implementations are handled separately also. This enables anyone who wishes to be interoperable with TensorAlgebra to build their own subtypes in their own separate package with interoperability automatically possible between it all, provided the guidelines are followed.

The key to making the whole interoperability work is that each TensorAlgebra subtype shares a VectorSpace parameter (with all isbitstype parameters), which contains all the info needed at compile time to make decisions about conversions. So other packages need only use the vector space information to decide on how to convert based on the implementation of a type. If external methods are needed, they can be loaded by Requires when making a separate package with TensorAlgebra interoperability.

It sounds like Requires.jl would still be the best approach in such a case. My approach could be beneficial if the glue project itself is large and you want to re-route the dependencies within it.

There is a flaw in the language: the global method table for multiple dispatch.
I pointed out in the past the solution to this flaw, namely “context dispatch” or, if you like, a per-module method table.
I called for an open discussion about this direction, its merits, and its drawbacks. At the time I encountered mostly resistance; maybe it is time to reopen the issue.

This is a solution not only to the shared naming problem but also to the latency and the first time responsiveness problem.

Isn’t it equivalent to my proposal? A global method table guaranteed to be “salted” by unique data is (isomorphic to) a local method table, right? Do you foresee some additional benefits to making it built into the language?
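
A minimal sketch of the “salting” argument, under stated assumptions: `Salted`, `UUID_A`, and `UUID_B` are hypothetical names, and the two UUIDs are made up for the example. Each callable carries a package UUID in its type, so two packages can both own a function called `plot` without sharing any methods.

```julia
using UUIDs

# Hypothetical callable whose type carries a package UUID (the "salt") and a name.
struct Salted{uuid, name} end

# Made-up UUIDs standing in for two unrelated packages.
const UUID_A = UUID("11111111-1111-1111-1111-111111111111")
const UUID_B = UUID("22222222-2222-2222-2222-222222222222")

# Same function name, different salt: two distinct types.
const plot_a = Salted{UUID_A, :plot}()
const plot_b = Salted{UUID_B, :plot}()

# All methods live in Julia's global method table, but each set is keyed by its own type.
(::typeof(plot_a))(x) = "A plots $x"
(::typeof(plot_b))(x) = "B plots $x"
```

Since a `UUID` is an isbits value, it can be used directly as a type parameter; the methods of `plot_a` and `plot_b` can never collide even though both are named `plot`.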

Your proposal is aimed at solving a different problem … conditional loading of modules.
Let me see if I understand correctly.

Let’s say I have a module Plots, which handles plotting of Vectors and Matrices, and modules DataTables_1 and DataTables_2, which handle tabular data, each in its own internal format. Two more modules, DT1_PlotInterface and DT2_PlotInterface, define the transformation from each internal representation to one that is plottable.

You would like to automatically load DT1_PlotInterface if both Plots and DataTables_1 are loaded.

I think that eventually this kind of automation just causes problems. It is best for a module to be immutable and not change its state (names and types; values can change), even between different invocations of the REPL.

IMHO

using DataFrames
using Plots
using DataFrame_Plots

is preferable to having a @require in Plots or in DataFrames. At the very least, this automation should load whole modules instead of altering the state of a previously loaded module.

Can’t this be done already without any magic? (Unless you just can’t live with type piracy.)

I am trying to understand what the OP does actually want to solve and wondered if the cleaner thing would be something like

using DataFrames
using Plots
using DataFrames.Plots

I.e., there is some Plots specific stuff in DataFrames but it is not loaded by default. Only when explicitly loading Plots and DataFrames.Plots one gets access to that. Since DataFrames.Plots is a submodule of DataFrames it can remain in the same package, which seems to be important to not split the plotting code from the core package code.

I’m not sure what exactly the OP has in mind, but I know that there are plenty of packages that would like to provide GPU support (which often requires some custom functions to make CUDA GPU compute efficient/possible) or AD support (custom adjoints), but don’t want to force their users to always pull the list of dependencies that would be required to ensure those functions can be properly defined.

You might say, “Oh well this is just a package or two, that’s not so bad!”, but then I’d ask if you’d be willing to also add as dependencies: MPI.jl, Zygote.jl, AMD GPU packages, etc., just so that in cases where users might be using one or more of those packages together with yours, you can get that slightly improved functionality? Note that having those packages as hard dependencies also means that if those other packages have anything “weird” happen with their or your dependency compat list or build script, you’ll suddenly end up with all your users getting Pkg resolver errors, or failed builds causing your package to fail to load.

Suffice it to say, we want the benefits of multiple dispatch across package boundaries without bringing along all the extra baggage that doing things “the normal way” would involve.

Not to derail the main discussion, but can someone give me an overview of the problem we are trying to address? I am lost in the discussion. What’s wrong with the current way (whatever that is?)

What about for your library, MyTypeOrFunctionality.jl, adding companion packages like MyTypeOrFunctionalityMPI.jl, MyTypeOrFunctionalityAD.jl, MyTypeOrFunctionalityGPU.jl, etc as separate packages that extend functionality?
This could easily end up being a lot of packages, but I don’t think that itself is unreasonable.

Documenting that someone has to load these extra packages to get the functionality is probably the more burdensome component.

Sure, this has always been an option, I just personally think it feels kind of ugly and kludgy, especially compared to the simplicity and elegance of Requires.jl. But when you have two or more packages that would like to have a lot of shared functionality in one place, then your approach would probably scale best.

Yes, but the question is whether the DataFrames.Plots module is part of the DataFrames package or the Plots.DataFrames module is part of the Plots package.

In the “Context Dispatch” scheme of things the method table is fused upwards; that is, every exported function is fused with the existing name in the importing module, so each module has its own version of the method table for a given function.
Dispatch is handled according to the calling module’s method tables (the most general ones) unless it can be proven at compile time that a more specific method table would suffice.

In Context Dispatch we could have the following:

module DataFrames
    # ... module code ...
    plot(dt::DataFrame, col::Symbol) = begin
        V::Vector = dt[col]
        plot(V)
    end
end

module Plots
    plot(V::Vector) = begin
        # ... function code ...
    end
end

and then in Main

using DataFrames
using Plots

Trying to plot a DataFrame from Main would call the plot function from DataFrames, which handles the conversion and calls plot again with a vector type; that call dispatches to the Plots module (according to the method table for “plot” in Main, the calling module).

This way there is no type piracy, just loosely coupled conventions about the word “plot”.
The user is free to use alternative plotting.

For the purpose of coordination between packages, we can define a PlotsInterface module that will be imported by both DataFrames and let us say GRPlots, PyPlot, ReplPlots etc

Yesterday’s pirates are today’s businesspeople.

I understand now what you mean by “Context Dispatch”. So I can comment on this:

No, my proposal is not about conditional loading. It actually solves what you want to solve. The refined version I packaged up lets you do what you wrote in your example code:

module DataFrames
    using IndirectImports
    @indirect import Plots="91a5bcdd-55d7-5caf-9e0b-520d859cae80"

    @indirect Plots.plot(dt::DataFrame, col::Symbol) = begin
        V::Vector = dt[col]
        Plots.plot(V)
    end
end

module Plots
    using IndirectImports

    @indirect function plot end
    @indirect plot(V::Vector) = begin
        # ... function code ...
    end
end

but without any __init__ like Requires.jl needs (so no conditional “loading”). See the actual sample code I posted above for how it works.

I realized that showing a code snippet with my own macro is rather pointless (it could be doing anything). The code example above is lowered roughly to

module DataFrames
    using IndirectImports
    using UUIDs: UUID
    const Plots = IndirectImports.IndirectPackage(
        UUID("91a5bcdd-55d7-5caf-9e0b-520d859cae80"),
        :Plots)

    (::typeof(Plots.plot))(dt::DataFrame, col::Symbol) = begin
        V::Vector = dt[col]
        Plots.plot(V)
    end
end

module Plots
    using IndirectImports
    using UUIDs: UUID

    const plot = IndirectImports.IndirectPackage(
        UUID("91a5bcdd-55d7-5caf-9e0b-520d859cae80"),
        :Plots).plot

    (::typeof(plot))(V::Vector) = begin
        # ... function code ...
    end
end

I understand your solution: IndirectImports holds the type of any function in any module,

thus enabling a function in a module to be extended before that module is loaded.
I like that, and it could be incorporated into the language as the mechanism for extending functions.
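
A rough sketch of why a function can be extended before its module is loaded, under stated assumptions: `Registry`, `Fn`, `Table`, `Downstream`, and `Upstream` are hypothetical names, and the key is a plain `Symbol` here for brevity where the real package uses a UUID. The downstream module only needs the shared type, never the upstream module itself.

```julia
module Registry            # hypothetical stand-in for IndirectImports
    struct Fn{pkg, name} end
end

module Downstream          # defined first; never references module Upstream
    using ..Registry
    struct Table; col::Vector{Float64}; end
    # Reconstruct the upstream function purely from its key.
    const plot = Registry.Fn{:Upstream, :plot}()
    (::typeof(plot))(t::Table) = plot(t.col)
end

module Upstream            # defined later, unaware of Downstream
    using ..Registry
    const plot = Registry.Fn{:Upstream, :plot}()
    (::typeof(plot))(v::Vector) = "plotting $(length(v)) values"
end
```

Both modules end up holding the same singleton, so `Downstream.plot === Upstream.plot`, and the method added by `Downstream` becomes usable as soon as `Upstream` supplies the method for vectors.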

Your IndirectImports is the global method table.

I would make the UUID optional, reading it from the Project.toml if it is not specified explicitly.

I’d say yes and no. Yes, because it uses Julia’s method table, which is global. No, because it is prefixed by the unique ID. This lets us avoid name clashes, unlike SharedFunctions.jl and Context Dispatch.

That’s a good point. I actually do this automatically for the “upstream” module (Plots in the example above) but not for “downstream” modules (DataFrames in the example). But it is possible to do this for downstream modules too, now that everyone has to use Project.toml. We can use [extras] to avoid installing optional dependencies even though they are listed in Project.toml. It would be nice if Pkg.jl could handle [compat] for [extras] packages (IIUC it’s not possible ATM). That way, we could specify version compatibility without installing the optional packages.

It’s amazing that the solution to this could be so simple.

How would we roll this out? Would it mean pull requests to upstream packages so that they use IndirectImports.jl to define their interface, and use the macro for every method definition?

The difference between this and Requires.jl is that Requires.jl places all the boilerplate burden on downstream packages, whereas this shares it across upstream and downstream packages.

This is orthogonal to the dispatch issues.

Currently, the method table is global and it operates exactly as your solution does. The only difference is that to extend a function in another module, that module needs to be defined/loaded (hence its UUID, if it exists, is known), but that can be relaxed without heavy changes to the language.

However, binary caching of compiled code between sessions is still a problem with a global method table, since any change to the global state requires recompilation unless it can be proven not to affect the final result, which is not always simple.

Also, type piracy and safety are still a major problem, since any module can change the code of any other module even if the target module and its users (importers) do not import the malicious module.
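
The piracy concern can be made concrete with a small sketch. `Victim` and `Pirate` are hypothetical modules invented for this example; the pirate adds a method to a function it does not own (`+`) for a type it does not own (`String`), and thereby changes the behavior of a module that never imported it.

```julia
module Victim
    double(x) = x + x            # relies on whatever `+` means globally
end

# Before the pirate exists, `+` is not defined for strings, so this call throws.
before = try
    Victim.double("ab")
catch err
    err
end

module Pirate                    # owns neither `+` nor `String`: type piracy
    Base.:+(a::String, b::String) = a * b
end

# Victim never imported Pirate, yet its behavior has changed.
after = Victim.double("ab")
```

Here `before` holds a `MethodError`, while the second call succeeds because the pirated method lives in the same global method table that `Victim` dispatches through.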
