Proposal for SharedFunctions.jl package for optional dependency management

I realized that showing a code snippet with my own macro is rather pointless (it could be doing anything). The code example above is lowered roughly to

module DataFrames
    using IndirectImports
    using UUIDs: UUID
    const Plots = IndirectImports.IndirectPackage(
        UUID("91a5bcdd-55d7-5caf-9e0b-520d859cae80"),
        :Plots)

    # Add a method to Plots.plot without loading Plots itself.
    (::typeof(Plots.plot))(dt::DataFrame, col::Symbol) = begin
        v::Vector = dt[col]
        Plots.plot(v)
    end
end

module Plots
    using IndirectImports
    using UUIDs: UUID

    const plot = IndirectImports.IndirectPackage(
        UUID("91a5bcdd-55d7-5caf-9e0b-520d859cae80"),
        :Plots).plot

    (::typeof(plot))(v::Vector) = begin
        # function code
    end
end

I understand your solution: IndirectImports holds the type of any function in any module,

thus enabling a function in a module to be extended before that module is loaded.
I like that, and it could be incorporated into the language as the mechanism for extending functions.

Your IndirectImports is the global method table.

I would make the UUID optional, reading it from Project.toml if it is not specified explicitly.

I’d say yes and no. Yes, because it uses Julia’s method table, which is global. No, because each function is prefixed by the package’s unique ID. This lets us avoid name clashes, unlike SharedFunctions.jl and Context Dispatch.
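
The “global method table, but prefixed by a unique ID” idea can be sketched in plain Julia without the package. This is only an illustration under stated assumptions: SharedFunction, PLOTS_ID, Upstream, Downstream, and Table are hypothetical names made up for the sketch, and it is not the actual IndirectImports.jl implementation.

```julia
# Hypothetical stand-in: a singleton type that carries a package ID and a name.
struct SharedFunction{PkgID, Name} end

# The package UUID from the example, used here as a Symbol type parameter.
const PLOTS_ID = Symbol("91a5bcdd-55d7-5caf-9e0b-520d859cae80")

module Downstream
    import ..SharedFunction, ..PLOTS_ID

    struct Table
        columns::Dict{Symbol, Vector{Float64}}
    end

    # Same ID and name as upstream, so this is the same singleton object.
    const plot = SharedFunction{PLOTS_ID, :plot}()

    # Extend `plot` without ever loading the upstream module.
    (::typeof(plot))(t::Table, col::Symbol) = plot(t.columns[col])
end

module Upstream
    import ..SharedFunction, ..PLOTS_ID

    const plot = SharedFunction{PLOTS_ID, :plot}()

    # The method that upstream itself provides.
    (::typeof(plot))(v::Vector) = "plotting $(length(v)) points"
end

t = Downstream.Table(Dict(:x => [1.0, 2.0, 3.0]))
Downstream.plot(t, :x)
```

Because the singleton type carries both the ID and the name, a plot function from a package with a different ID would be a different type with its own separate set of methods, which is where the name-clash protection comes from.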

That’s a good point. I actually do this automatically for the “upstream” module (Plots in the example above) but not for “downstream” modules (DataFrames in the example). But it is actually possible to do this for downstream modules now that everyone has to use Project.toml. We can use [extras] to avoid installing optional dependencies even if they are specified in Project.toml. It would be nice if Pkg.jl could handle [compat] for [extras] packages (IIUC it’s not possible ATM). This way, we could specify version compatibility without installing the optional packages.

It’s amazing that the solution to this could be so simple.

How would we roll this out? Will it mean pull requests to upstream packages using IndirectImports.jl to define their interface? And also using the macro for every method definition?

The difference between this and Requires.jl is that Requires.jl places all the boilerplate burden on downstream packages, whereas this shares it across upstream and downstream packages.

This is orthogonal to the dispatch issues.

Currently, the method table is global and it operates exactly as your solution does. The only difference is that to extend a function in another module, that module needs to be defined/loaded (hence its UUID, if it exists, is known), but that can be relaxed without heavy changes to the language.

However, binary caching of compiled code between sessions is still a problem with a global method table, since any change to the global state requires recompilation unless it can be proven not to affect the final result, which is not always simple.

Also, type piracy and safety are still a major problem, since any module can change the code of any other module even if the target module and its users (importers) do not import the malicious module.
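
To make the piracy concern concrete, here is a small sketch in today’s Julia. The module names Owner, User, and Pirate and the function describe are hypothetical, chosen only for illustration.

```julia
module Owner
    # Generic fallback owned by this module.
    describe(x) = "generic"
end

module User
    import ..Owner
    # User only knows about Owner; it never imports Pirate.
    report() = Owner.describe(1)
end

result_before = User.report()

module Pirate
    import ..Owner
    # Pirate owns neither `describe` nor `Int`, yet it adds this method.
    Owner.describe(x::Int) = "pirated"
end

result_after = User.report()
```

Merely loading Pirate changes what User.report() returns, even though neither Owner nor User refers to it.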

Honestly, I found Context Dispatch an interesting solution at first. But I think there are some difficulties (or maybe I’m missing some details):

  1. For example, Plots and PyPlot both export a function named plot, and I believe
julia> using Plots

julia> using PyPlot

shouldn’t silently merge them.

  2. More generally, Context Dispatch makes it impossible to define a generic interface like f(::Any), since you don’t own f anymore if you want to export f.

  3. You need to export a function to merge the methods. However, some functions are primarily for overloading and it does not make sense to export them (e.g., Broadcast.broadcasted).

  4. In some situations it could be very confusing. Consider that modules A and E are imported in Main and they have the dependency graph

A -> B .-> C
       `-> D

E -> F .-> D
       `-> G

where -> means using; i.e., module A has using B. Suppose the modules C, D, and G (i.e., the “third level” modules) each define and export the function f specialized for their own internal types. Also suppose that B and F do not export f. If I understand the definition of Context Dispatch, B.f and F.f do not share the method table. However, does module D “smuggle” the method f(::TypeInG) into B.f even if G is not in the context of B?

Well, I myself don’t have a large set of packages that need this solution. So it was just an exercise for me and I’m not planning to do anything with it. But if somebody wants to use it, I’ll register it.

Yes, for all interface functions… At least that’s the plan. But, actually, there is a “bug” https://github.com/JuliaLang/julia/issues/25744 in Julia that makes it “work” without the macro at the moment.

You are right, it should issue a warning if you try to use them unqualified.
Please also note that I make a distinction between functions and methods.

A function is a name in a module with several methods.

Fusing is supposed to happen either semantically (implicitly), where two modules export the same name and a third module is using them, or explicitly, similar to the mechanism that exists today or the mechanism you suggested.

You could look up previous posts regarding Context Dispatch; maybe they will clear things up.

I promise you it is less confusing than it looks. Please post some code; it will be easier for me to follow what you are asking.

And another thing… I don’t own this idea, I merely researched it, and the more I did so, the clearer it became to me that it is a generalization of the current scheme of things, and that there is room for further understanding.

This seems to have become another thread on context dispatch, which was clearly not what Jacob intended, but as I still don’t have a clear picture of what you mean by the term, I guess I’ll use it as an opportunity to ask more about this:

If there are other references, in PL literature or elsewhere, on the “context dispatch” concept, it might be helpful if you could share them. The closest thing I can figure out to what you’re proposing is Ruby refinements. Note, however, that Ruby refinements were introduced to mitigate rampant monkey patching issues in the Ruby ecosystem, which would suggest that a similar feature in Julia would be primarily to avoid type piracy, which would be more limited than what it sounds like you’re proposing (although, again, I’m still unclear on precisely what that is).

There are very basic apparent problems with “context dispatch” that were pointed out in previous threads but never addressed as far as I could tell. The most fundamental one is something like this scenario:

  • Base defines sort! which calls isless by default
  • Package A defines type T and isless methods for it
  • Package B defines function f which calls sort! on arguments of unknown type
  • Package C uses A and B: it constructs objects of type A.T and passes them to B.f

Since A and B know nothing about each other and Base knows about neither of them, and they all—from the way you’ve described it—have different versions of isless and sort! that don’t know about types defined elsewhere, how does this work? Can C pass objects of type A.T to B.f and have them be sorted according to the isless ordering?
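
For reference, this scenario can be written out in today’s Julia, where it works because Base.isless has a single global method table. The field x on T and the helper g are assumptions added only to make the sketch runnable.

```julia
module A
    struct T
        x::Int
    end
    # A extends Base.isless for its own type.
    Base.isless(a::T, b::T) = isless(a.x, b.x)
end

module B
    # B knows nothing about A; it just sorts whatever it is given.
    f(v) = sort!(v)
end

module C
    import ..A, ..B
    # C constructs A.T objects and hands them to B.f.
    g() = B.f([A.T(3), A.T(1), A.T(2)])
end

sorted = C.g()
```

Here sort! inside B finds A’s isless method even though B never imported A, which is exactly the behavior the question asks Context Dispatch to preserve.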

Another topic which we discussed previously is the notion of a local import statement

This is what resulted in making the @force import macro in ForceImport.jl … however, the real source of the problem is as I had mentioned, the context-based dispatch problem:

One of my major other ideas was to have a context-sensitive dispatch with local import ability.

The @force import macro only simulates this feature by completely separating the method tables for the conflicting methods, and then using forwarded dispatching to handle Base dispatch.

What is being talked about here is the full solution to this problem.

There are several topics here as I see it, and I’ll do my best to be brief and concise, with as few errors as possible.

  1. The discussion started with the need to define some abstract base module with names, so that different modules can coordinate on (extend) a common interface.

  2. It evolved to:

  3. @tkf made the observation that we can extend a function in a module before it is loaded, in a fashion that is roughly equivalent to the following:
    let
    Function{UID,NAME} where UID where NAME
    be the abstract base type of all functions in Julia, where UID stands for the module or package UUID.
    Then any module can extend a function in another module without having to specifically load it.

Here is an example of extending function plot in module Plots using the module name for UID:

(plot::Function{:Plots,:plot})(x::MyVecType) = plot(toBaseVec(x))

  4. Jeff suggested that there will be a way to tell the system to merge functions in different modules (I hope I understood correctly).

  5. @chakravala wants to do the same but make the merge visible only locally, so as not to affect the integrity of other modules.

  6. Context Dispatch is a generalization of the way multiple dispatch is handled today, as it enables you to do the same things using the same code. However, implicit merging of exported function names as in 5 feels very natural in this way and can often replace explicit coordination as in 3.

So for your toy example, calling function B.f from the context of module C uses modules Base, A and B for its method table, so there is no problem.

Now let’s say module D uses C and calls C.g(), which in turn calls B.f with the constructed types.
Since it can be proved that the context of C will suffice, the context can be narrowed.

And as long as you don’t change the state of module C or its dependent modules, you can reuse the binary resulting from compiling the code for the call C.g() from the context of C.

Another aspect is that the automatic merging of exported names suddenly looks very appealing.

Your example with Context Dispatch and automatic merging of exported names would look like:

module Base
    sort!(x,y) = isless(x,y)
    isless(x,y) = throw(ErrorException("unimplemented"))
    export isless,sort!
end

module A
    struct T end
    isless(x::T,y::T) = true
    export isless
end

module B
    using Base
    f(x,y) = sort!(x,y)
end

module C
    using Base, A, B
    
    g() = B.f(A.T(),A.T())
end

In the context of module C, the functions Base.isless and A.isless got merged.

I don’t think so. Like many others here, I think, I learned about compilers, LLVM, and multiple dispatch through the feat of engineering we call “Julia”.

It came up initially to solve what @chakravala was writing about. It was a little bit obscure in the beginning, but it is starting to feel more and more like the right way to go.

OK, it took me more than 3 hours to write this and I am losing focus. What I meant by saying “I don’t own the idea” is that it feels like a larger subject than just a small eureka moment.

Thanks a lot for summarising the whole thread! I wish every long thread had something like this!

Here is the code for what I was asking. I realized I don’t need modules A and E, so they are not here anymore:

module C
    struct TC end
    f(::TC) = TC
    export f
end

module D
    struct TD end
    f(::TD) = TD
    export f
end

module G
    struct TG end
    f(::TG) = TG
    export f
end

module B
    using ..C, ..D
end

module F
    using ..D, ..G
end

using .B, .F

Note that I’m not evaluating using .C etc. in the top-level namespace and B and F do not have export. So, f is not defined in the top-level namespace. I suppose this means f in B and f in F have different Contexts.

In the post above, I was asking what would happen if I do B.f(F.G.TG()). But I realized that the more important question is what happens if I evaluate the following expressions:

F.f(F.G.TG())      # TG?
D.f(F.G.TG())      # throws MethodError?
B.D.f(F.G.TG())    # throws MethodError?
F.D.f(F.G.TG())    # throws MethodError?

If my guess were right, it would mean

D.f === B.D.f      # true?
D.f === F.D.f      # true?
F.f === B.f        # false?
F.f === D.f        # false?

So, export and using for context dispatch do not just construct a local method table but also create a new function object? Otherwise, it’d be impossible to understand

F.G.TG() |> F.f
F.G.TG() |> D.f
F.G.TG() |> B.D.f
F.G.TG() |> F.D.f

(i.e., x |> f does what f(x) does).

Replying to @tkf:

First I want to make a distinction between context dispatch (considering only modules in context for the purpose of dispatch) and automatic merging of exported names. Although they fit great together, they are two different things.

Your question is about the order of things when both are enabled, so I will introduce a notation similar to @eval:

@context M f(args...)

which means: execute f(args...) in the context of module M.

For simplicity of writing, let’s assume that all names are exported.

@context F f(TG())
# Since f is comprised of method tables from modules G and D,
# which are imported by module F, and it can be proven
# at compile time that using the context of G
# will resolve the call just as if the greater context were used,
# the context is narrowed and the call becomes
@context G f(TG()) #=> TG
# Note that context narrowing is just an optimization,
# but without it context dispatch does not offer a solution to binary caching.

# D.f(F.G.TG())
@context D f(TG()) # yes, MethodError

# B.D.f(F.G.TG())    # throws MethodError?
@context B @context D f(TG())    # yes, MethodError

You are right.

You are right again. In my first POC I used the following type to represent the method table:

struct Func{Context,Mod,Name} end

and some sets to keep track of instantiated types because

using InteractiveUtils  # provides subtypes outside the REPL

struct Func{Context,Mod,Name} end

a = Func{:B,:A,:f}()

typeof(a) <: Func # true

subtypes(Func) # empty vector :-( why?
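
On the “why?”: as far as I understand, subtypes only reports types declared with <: under an abstract type, whereas Func{:B,:A,:f} is a parameterization of the concrete struct Func rather than a declared subtype. A minimal sketch of the “keep a set of instantiated types” workaround follows; INSTANTIATED and make_func are hypothetical names invented for the illustration.

```julia
using InteractiveUtils  # `subtypes` lives here when not in the REPL

struct Func{Context, Mod, Name} end

# Hypothetical registry of every concrete Func type created so far.
const INSTANTIATED = Set{DataType}()

# Hypothetical constructor that records the type before instantiating it.
function make_func(context::Symbol, mod_name::Symbol, name::Symbol)
    T = Func{context, mod_name, name}
    push!(INSTANTIATED, T)
    return T()
end

a = make_func(:B, :A, :f)

typeof(a) <: Func   # true: Func{:B,:A,:f} is a parameterization of Func
subtypes(Func)      # empty: nothing was declared with `<: Func`
INSTANTIATED        # the registry now holds Func{:B,:A,:f}
```

The registry has to be maintained by hand because the type system does not enumerate the parameterizations that happen to have been created.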

Edit:
Another point for thought:
the most general way to define a type for a function might be

struct Func{Context,Mod,Name,Args} end

I’m a big fan of simplicity and I think the easiest way to do this is something like StatsSharedFunctions. It has the same basic premise as the OP, where the functions wouldn’t have anything other than function foo end, but it would allow a community standard to be defined for the intended use. If we aim for a shared definition across all of Julia, I just see a lot of nitpicky arguments arising across fields.

This is what R is starting to do with some things like fit. A package has some functions that act as S3 generics (kind of like multiple dispatch, if you don’t know R) and only have documentation.
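
The “empty function plus documentation” idea can be sketched like this. SharedStubs, PkgA, PkgB, and the model types are hypothetical names for illustration; they are not taken from StatsSharedFunctions or any registered package.

```julia
# Hypothetical shared package: only names and documentation, no methods.
module SharedStubs
    """
        fit(model, data)

    Fit `model` to `data`. Packages add methods for their own model types.
    """
    function fit end
end

module PkgA
    import ..SharedStubs
    struct MeanModel end
    # PkgA attaches its method to the shared generic function.
    SharedStubs.fit(::MeanModel, data) = sum(data) / length(data)
end

module PkgB
    import ..SharedStubs
    struct MaxModel end
    # PkgB does the same without knowing that PkgA exists.
    SharedStubs.fit(::MaxModel, data) = maximum(data)
end

data = [1.0, 2.0, 6.0]
SharedStubs.fit(PkgA.MeanModel(), data)
SharedStubs.fit(PkgB.MaxModel(), data)
```

Both packages depend only on the tiny stub package, so neither has to load the other to share the name fit.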

I still think it’s more reasonable to understand Context Dispatch as “import-site function object auto-creation” rather than thinking of it as some call-site modulation. For example, how do you interpret (B.f ∘ F.f)(F.G.TG())? The object returned from ∘(B.f, F.f) needs to know that the context of the first (second) argument is B (F).

I wonder if it is enough. Consider:

module X
    module Y
        abstract type TY end
    end

    using .Y: TY
    f(::TY) = TY
end

module Z
    using ..X.Y: TY
    struct TZ <: TY end
end

z = Z.TZ()
@context X X.f(z)

Can caching be useful for the “methods” like @context X X.f(::Z.TZ) which is unknown at the time the module X is precompiled? Here, I’m pretending X and Z are in different packages.

From the context of X, the type of z (its concrete type) is unknown, so it is a MethodError.

But for a context that has both X and Z, once this method is compiled it can be cached for later use as long as the context does not change (no new names or types).

I still don’t have a clear solution for where to store compiled code. I was thinking of introducing “virtual modules”, something like:

module Virtual_X_Z
using X 
using Z
end

and storing compiled code there, so that any module that uses both X and Z automatically loads the so/dylib/dll associated with the Virtual_X_Z module.

@tkf let’s continue in another thread

Sounds good :+1:

Thanks to everyone for the lively discussions and thoughtful responses. In line with the general recommendations and consensus in this thread, I’ve created the DataAPI.jl package as a domain-specific shared namespace for data-related packages to help decouple some key packages. Thanks again for all the responses.

@quinnj Did you see my suggestion which is implemented here: https://github.com/tkf/IndirectImports.jl ? I thought you might have missed it because there are many messages in this thread (which is kind of my fault). Compared to DataAPI.jl’s approach, I think it is better because the owner (upstream) packages don’t have to update DataAPI.jl and hence can be more flexible in defining functions that can be extended and “used” without importing the owner packages (i.e., the functions that would be registered in DataAPI.jl).

Maybe the best option is a mixture of the two approaches: DataAPI.jl will be used for anything where common agreement on a (reasonably formal) API between multiple packages is desired (in this case, the Julia data ecosystem), and IndirectImports can be used where more informal or ad-hoc interfaces are needed (maybe on the GPU compute side of the house). Especially when you only want to share one or a few functions across packages, IndirectImports is probably a good initial approach, with the option to “upgrade” to a shared XBase or XAPI package once there’s consensus on what the shared API looks like.
