Proposal for SharedFunctions.jl package for optional dependency management

Note that I think that open-ended shared namespaces are dual to XBase packages and should not be the same thing. Currently StatsBase is playing both roles, which is why it’s getting a bit heavy. I think the role of defining shared common functionality belongs in StatsBase (the intersection over all of stats), while a Statistics umbrella namespace would be populated by extend (effectively the union over all of stats).

How about transferring ownership? It’s a reasonable scenario that you want to re-organize a family of packages and change the ownership of a function, right? But I don’t think it works well. One possible attempt would be to bump the major version of SharedFunctions.jl when transferring a function from PkgA to PkgB. Suppose the updated SharedFunctions.jl version is 3.0. Then both PkgA and PkgB can put SharedFunctions = "3.0" in [compat] so that they don’t type-pirate each other. But a huge downside of this attempt is that all other packages using SharedFunctions.jl, even those completely unrelated to PkgA and PkgB, need to tweak their [compat]. Or maybe I’m missing something? Are there better ways to do it?

Sounds like you are onto something with the union and intersection terminology.

Very light-weight packages are needed for union namespaces, while intersections may have heavier dependencies/content.

I think the best solution at this point is defining very lightweight interface packages, that can be used by all packages that share functionality. This is effectively invisible to the users, but provides more granularity than a single shared namespace.

Granularity should be especially important for versioning, since adding/removing functions or simply changing the interface of one in SharedFunctions.jl should require a major version bump, so it would need to be incremented quite frequently (in comparison to smaller packages). Given the tooling we already have to create new packages, I don’t think making small ones is very costly. Perhaps there should be a naming convention, but I think <…>Base is emerging as one.

Also, some uses of Requires.jl just define functions which are actually semi-internal and not really meaningful on their own. Consider PGFPlotsX.TableData, which has methods defined when loading DataFrames, Contour and StatsBase. But the user-exposed API is PGFPlotsX.Table, which just calls it.

In my opinion, Abstract<…> would be a good naming convention for unions, since the names are abstract and do not carry functionality.

A <…>Base package would carry a lot of features for specific uses, which fits intersections.

What happens when two different packages want to use the same name for a different generic function (which is the reason for namespaces in the first place)? Will it just be banned from being in SharedFunctions.jl (so that the first one that takes the name owns it forever), or will it be impossible to write generic code using any function from SharedFunctions?

What is your definition of a generic function? If I look into the manual, I find in Julia Functions · The Julia Language a passage starting with the sentence “Every function in Julia is a generic function.”

I think the whole discussion (again) is about consistent merging into a method table, which is possible as long as the types (and therefore the methods) are orthogonal at the time of merging.

This is true although I am not sure how to be more clear. A different generic function with the same name is a function with the same name as another function… For example:

module A
f(x) = x
end

module B
f(x, y) = x + y
end

Here A.f and B.f are different (generic) functions.
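To make that concrete, here is a minimal sketch that repeats the two modules and checks the claim with standard reflection (`methods`, `applicable`). Nothing in it is specific to the SharedFunctions proposal.

```julia
module A
f(x) = x
end

module B
f(x, y) = x + y
end

# Two separate function objects that merely share a name.
@show A.f === B.f              # false

# Each function has its own method table with a single method.
@show length(methods(A.f))     # 1
@show length(methods(B.f))     # 1

# A.f knows nothing about the two-argument method defined in B.
@show applicable(A.f, 1, 2)    # false
@show B.f(1, 2)                # 3
```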

The reason why we cannot just merge the two f is because of how you write generic code. An example of a generic function is push!. We can see the docs for it:

  push!(collection, items...) -> collection


  Insert one or more items at the end of collection.

We now know what the generic function Base.push! means. We can extend that function to our own types by adding a method to Base.push!. Everyone who extends this function agrees that they are using the documented meaning of push!. We can now make a generic function that takes a collection and uses the function Base.push! somewhere in the function body, on that collection.
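As a rough sketch of what that looks like in practice: the `Stack` type and the `fill_with_squares!` helper below are made up for illustration and are not from any package. The point is only that the generic helper relies on the documented meaning of Base.push! and therefore works for any type that honours it.

```julia
# Hypothetical collection type, used only for illustration.
struct Stack{T}
    items::Vector{T}
end
Stack{T}() where {T} = Stack{T}(T[])

# Extend Base.push! with its documented meaning: insert an item at the end
# and return the collection.
function Base.push!(s::Stack, item)
    push!(s.items, item)
    return s
end

# Generic code: it relies only on the documented contract of Base.push!.
function fill_with_squares!(collection, n)
    for i in 1:n
        push!(collection, i^2)
    end
    return collection
end

@show fill_with_squares!(Int[], 3)               # [1, 4, 9]
@show fill_with_squares!(Stack{Int}(), 3).items  # [1, 4, 9]
```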

Now, let’s consider another function with the same name:

module MyGame

"""
    push!(p)
Push the person `p` to an adjacent square
"""
function push!(p) end

end

This has a completely different meaning than Base.push!. This is fine though because in generic code we either use Base.push! or MyGame.push!. These are different functions so we need the concept of namespaces to decide which one to use.
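A small sketch of how the namespaces keep the two apart. The `Person` type and the string returned by `MyGame.push!` are placeholders I added so the example runs; they are not part of the original snippet.

```julia
module MyGame

struct Person
    name::String
end

"Push the person `p` to an adjacent square (placeholder behaviour for this sketch)."
push!(p::Person) = "$(p.name) is pushed to an adjacent square"

end

# Unqualified push! in Main still refers to Base.push!.
@show push! === Base.push!          # true
@show MyGame.push! === Base.push!   # false

# The qualifier decides which meaning is used.
@show push!([1, 2], 3)                      # [1, 2, 3]
@show MyGame.push!(MyGame.Person("Ann"))
```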

Now, if Base.push! and MyGame.push! were merged, it would be impossible to look at a piece of generic code that uses push! and figure out what it is doing. Is it adding things to a collection or pushing people around? As has already been said, all functions are generic, so without a way to know what the generic functions do, it is impossible to reason about generic code. Any function could mean anything.

Therefore, with regards to the proposed SharedFunctions, it can either be that

  1. The first one to claim a function name gets to document it. We then know what that function does and can write generic code with it.
  2. Everyone extends the function with the same name with no regard for what the function means. It is then not possible to write generic code with that function (because the concept of the meaning of a function doesn’t exist anymore).

Note that many of the themes recurring in this topic were discussed in

I think it is worth re-reading, it is full of excellent points.

What changed since is that we got Pkg(3) and the new registry functionality, so making small interface packages is simpler than ever.

The whole Base or a hacked, opt-in global namespace package for method merging smells wrong.

There seem to be two cases: (1) packages are sharing types and methods that act on those types. That should be in a Base package for sure and requires coordination; and (2) packages just want to share a function name, which may or may not be punned.

The solutions for (1) are already there, and it is convenient enough for package developers to work around. They should be jointly designing types and likely using import instead of using in their packages.

The problem tends to be (2), which is what this SharedFunctions.jl is intended to solve.

I vehemently disagree with this approach, not because I don’t think this is an issue, but because it is such a big issue that we should step back before hacking. Using packages just to share function namespaces puts the onus of method merging on the wrong people at the wrong place in the code. That is, it is at the point of using that the decision to merge should be made.

So what is the alternative? This has been brought up many times before, but I have come around to thinking it is the only solution: have a way to do a “using” which merges methods. Then it is decided at the point of usage whether to merge or not. Users who want a convenient using for multiple packages with non-conflicting solve! methods can do so, and they are taught to avoid merging if possible, that clashes are possible for general types, but that it is usually safe for types built into the package itself.

A merge using MyPackage vs. a using MyPackage are different in that the first one merges methods. It is up to users whether they want to do that or not. As a package developer, you write for the namespaces and generic interfaces that make sense and don’t worry about it anymore.

Also, merge function myfunc() end would merge a function defined by the user into the current myfunc if it exists. This gets around the fragility of depending on the order of using statements.
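For reference, this is roughly what happens today without any merging. The sketch does not implement the proposed merge using (no such syntax exists in Julia); PkgA, PkgB and their problem types are hypothetical stand-ins for two packages that both export solve!.

```julia
module PkgA
export solve!
struct ProblemA end
solve!(::ProblemA) = "solved by PkgA"
end

module PkgB
export solve!
struct ProblemB end
solve!(::ProblemB) = "solved by PkgB"
end

using .PkgA, .PkgB

# Both modules export `solve!`, so the unqualified name is ambiguous in Main
# and cannot be used; the two functions are never merged automatically.
@show PkgA.solve! === PkgB.solve!     # false

# Today the caller has to qualify each call explicitly, even though the
# methods dispatch on unrelated types and would not clash.
@show PkgA.solve!(PkgA.ProblemA())
@show PkgB.solve!(PkgB.ProblemB())
```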

This is the sort of thing that can kind of be done in macros (see chakravala/ForceImport.jl on GitHub, a macro that force imports conflicting methods in modules), but this is a hack with all sorts of eval-trickery. The only way it would be a successful solution to the issue is if it was integrated into the language, the documentation, testing, packagecompiler, etc. Assuming that it is possible, the whole thing is very teachable and is not a breaking change.

Later, there are ways that a warning could be made to detect conflicting generic concepts for the methods when merging (effectively it involves looking for overlap in the tree of dispatching for the methods) but that can wait.

True, but this is a large administrative burden for anyone creating packages, tagging them, setting versions, dealing with user complaints about using clashes for new packages, etc. What will happen when not every package creator is on a first-name basis on Slack? When shared types are involved, it is necessary, but just for method merging it is a sledgehammer solution that becomes a pain for everyone involved.

Plus: it doesn’t solve the usability issue for users who just want to use two non-clashing methods with the same name concurrently.

And trying not to rehash all of the discussions in “Function name conflict: ADL / function merging?” and “"Meaning", type-piracy, and method merging” but…

Change that to


struct Player end

function push!(p::Player)

end

and that is typically innocuous and there is no overlap in the dispatching. push! is a bad function to pun, but frequently it is perfectly possible to have overlapping generic interfaces. Just like in single-dispatch languages where XXX.push!(p) allows punning for any sort of XXX type without any issues.
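A minimal sketch of that variant, assuming the method is added to Base.push! itself (the `name` field and the returned string are placeholders of mine). Because the method dispatches on a type this code owns, it does not overlap with the collection methods; whether the pun is a good idea is the open question in this thread.

```julia
struct Player
    name::String
end

# Adds a method to Base.push! for a type this code owns, so no existing
# Base method is overwritten.
Base.push!(p::Player) = "$(p.name) was pushed"

# The collection meaning and the punned meaning coexist via dispatch.
@show push!(Int[], 1)         # [1]
@show push!(Player("Ann"))    # "Ann was pushed"
```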

Until Julia has actual definitions of generic interfaces (i.e. not just agreement on names) and ways to help them coexist, making things purposely inconvenient just makes people look for crazy workarounds… Leave it up to the users whether they want to merge methods and make multiple generic concepts convenient.

How about this rule to solve this:

Only import a name from a SharedFunctions package if you don’t intend to write a generic method that dispatches only on the Any type; otherwise have the generic definitions in SharedFunctions

Shared function names with completely generic method definitions should either

  1. have the generic definitions in the SharedFunctions package
  2. not be imported from SharedFunctions

This way, if there is a need for generic methods, the fully generic method is either shared by all who import SharedFunctions or the entire method name needs a new namespace, in case of a “generic method conflict.”

This also makes it easy, if you later decide to add a generic method, you can either drop the import statement (and define locally) or contribute the generic definition to SharedFunctions.
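Here is a rough sketch of the rule, with local modules standing in for packages. The names `describe`, PkgA, PkgB and their types are invented for the example; SharedFunctions only declares the name, and each package adds methods for types it owns.

```julia
module SharedFunctions
export describe
# Name-only declaration: a function with zero methods.
function describe end
end

module PkgA
import ..SharedFunctions: describe
struct TypeA end
describe(::TypeA) = "TypeA from PkgA"   # dispatches on a type PkgA owns
end

module PkgB
import ..SharedFunctions: describe
struct TypeB end
describe(::TypeB) = "TypeB from PkgB"   # dispatches on a type PkgB owns
end

using .SharedFunctions

@show describe(PkgA.TypeA())
@show describe(PkgB.TypeB())
@show length(methods(describe))         # 2

# Neither package defined a fully generic `describe(x)`; under the rule such
# a method would have to live in SharedFunctions itself.
@show hasmethod(describe, Tuple{Any})   # false
```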

I think you are exaggerating the cost here. I am not sure what you are implying about the registration process or the Julia community, but I don’t use Slack, and being on a first-name basis with anyone has never been a requirement for registering or updating packages. Package registration and updates are a completely transparent and open process.

I completely agree with you here. The whole discussion just reflects a flaw in the language, and now would be a good time to fix it. Your solution with merge_using seems like a possible one. Perhaps there are others. Having a SharedFunctions package is basically equivalent to putting all these shared names in Base, which was one of the solutions talked about in the long thread everybody mentions.

Please, language designers, step in and solve this problem once and for all!

In theory we could almost make it automatic, right? That is, have a GitHub bot that generates a PR that extracts all exported function names, abstract types and abstract docstrings into an abstract header package, and rewires the old package to require, import, extend and reexport functions from the abstract header package.

In theory, the creation of lightweight AbstractFoo / HeaderFoo packages from some Foo package should not require a lot of human thought or intervention (corner case: functions and types declared and exported by a macro that takes the environment into account). If this could be done with close to zero work, and if these lightweight interface packages could be enforced to stay lightweight dependencies, then I think this would get a lot more traction.

This doesn’t work at all. It is when the method is defined that the author needs to decide what function it belongs to. This can currently only be done by extending a function, but that is not an inherent limitation. You could for example envision writing something (loosely) like

@extend StatsBase function describe(...)
   function body
end

Which would also “extend” the StatsBase function without having to load it.

Again, there needs to be some way to tell the system what function you are extending, and the current way of having to load the package to do so might be limiting.

The fact that there is no automatic method merging based on name is fundamental to the ability to write generic code.

I think there could be a use case for use-side merging, but there should be no need for it in cases where package authors already know the functions should be merged, and in fact are doing it today. So we should first make it really easy for package authors to specify the merging when they already know it’s the right thing. Then there might be cases that fall through the cracks, and at some point we might need to add use-side merging, but I see that as being farther down the road.

This would be a great step forward. For package writers, it would help curb the immediate proliferation of Base packages which don’t actually share any types. Of course, if they share concrete or abstract types, that is a different story and they need a shared base package.

If you mean “automatic” in that it is done without any choice or control, then I agree with you. But if you mean that it is fundamental to generic programming to only have one “active” concept for any particular function at any particular point in time without namespace disambiguation, then I disagree completely. Other languages have handled that, including both single dispatch languages and generic ones. Leave it to the users to determine if they want to have concurrent generic concepts that involve the same function names, but ensure the package writers don’t have to worry about that stuff. With an @extend and a @merge using we could have it all.

I think I found a super simple solution.

The key observations are:

  1. Each package has a UUID
  2. isbitstype(UUID) hence a UUID can be used as a type parameter

So, the idea is to define a “universal entry point function”:

module IndirectImports
    struct IndirectFunction{uuid, name} end
end

which can be used to refer to a function in a package without importing it. An example usage is:

module Upstream
    using UUIDs
    using ..IndirectImports: IndirectFunction
    const upstream_uuid = UUID("332e404b-d707-4859-b48f-328b8b3632c0")
    const fun = IndirectFunction{upstream_uuid, :fun}
end # module

module Downstream
    using UUIDs
    using ..IndirectImports: IndirectFunction
    const upstream_uuid = UUID("332e404b-d707-4859-b48f-328b8b3632c0")
    const fun = IndirectFunction{upstream_uuid, :fun}

    struct DownstreamType end
    fun(::DownstreamType) = "hello from Downstream"
end # module

@show Upstream.fun(Downstream.DownstreamType())

where the Downstream package defines a “function” in the Upstream package without importing Upstream.

(The fact that IndirectFunction{uuid, name}(...) does not return an IndirectFunction is kind of bad, but it’s not like this is forbidden…)

Does it work? I feel like I’m missing something as this is so simple. Maybe it is too much of a burden on the Julia compiler to manage a possibly huge list of methods for IndirectFunction? Or maybe not?

Cool idea! A few macros would make usage fairly painless.

One issue is that typeof(Upstream.fun) === DataType. This might be a problem with code that expects Function.
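A short sketch of where that bites, flattening the example above into a single scope. The `call_strict` helper is hypothetical and stands in for any code that annotates its argument as `Function`.

```julia
using UUIDs

struct IndirectFunction{uuid, name} end

const upstream_uuid = UUID("332e404b-d707-4859-b48f-328b8b3632c0")
const fun = IndirectFunction{upstream_uuid, :fun}

struct DownstreamType end
fun(::DownstreamType) = "hello from Downstream"

@show typeof(fun) === DataType    # true
@show fun isa Function            # false

# Callers that do not restrict the argument type accept any callable.
@show map(fun, [DownstreamType()])

# Hypothetical helper that restricts its argument to Function.
call_strict(f::Function, x) = f(x)

# `fun` is a type, not a Function, so no method of call_strict matches.
@show applicable(call_strict, fun, DownstreamType())   # false
```

Code written against `f::Function` would need a wrapper such as `x -> fun(x)` to accept it.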

Note that there is another layer: functions are in modules, which are then available in packages. It is, of course, a convention in Julia to have the package name and its main module coincide.

Requires.jl is also used to define functionality conditional on having another package loaded, i.e. if the package is not loaded then some code is ignored entirely. Since this effectively requires introspection of a state (the loader), it would be best to have a syntax and implementation that is part of the language.