Function name conflict: ADL / function merging?

Of course it isn’t the intent :) (and please don’t misinterpret my tone as uncharitable… I am a true believer in Julia, and very grateful for all you guys are doing. I am just not able to express things clearly because I am not a programming language expert)

The reason we are bringing this up is that I don’t think it actually was the intent, but it is a consequence. My worry is that the consequence will blow up as Julia, the package ecosystem, and even the standard library expand. Right now, adding something to Base to get it to work is a reasonable workaround, and something that can be taught to confused users, and package maintainers can manually work together to get packages to coexist. But I don’t think it is at all scalable.

See the following

julia> struct MyType end

julia> f(mt::MyType) = 1
f (generic function with 1 method)

julia> size(mt::MyType) = 1
size (generic function with 1 method)

julia> length(mt::MyType) = 1
ERROR: error in method definition: function Base.length must be explicitly imported to be extended

Pretend you are a new user (or even a seasoned developer, as I didn’t understand the issue until this thread). Now, we teach you that in single-dispatch languages you go myval.f() and in julia you go f(myval). So why won’t it let me create the length function? I just want to define a function that has nothing to do with Base and it won’t let me. And what if I define size and later you decide to create size in Base? It breaks my code.

Maybe I misunderstand you, but this is factually false. Nothing about Base is special-cased in method lookup to have precedence.

If that’s true, it implies that fixing the underlying problem would obviate the default using Base. In that situation, how would your code see anything defined in Base?

I don’t know about that. In C you have to include various headers, and in python you have to import numpy, etc. We just decided to import more stuff for you by default, and if you don’t like it, use a baremodule. Other languages also “solve” this problem by making things like + magical builtins, whereas we treat them as first-class functions just like any other.

If you try that code in a module, or in a fresh session, it works. You get the error if you’ve already used the length function, since we don’t want to either (1) change what length refers to (it’s a constant, which is important for performance), or (2) let you extend Base.length without indicating that that’s your intent.
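A minimal sketch of the two cases described above, using hypothetical module names (`Fresh`, `Extending`) that are not from the thread. It assumes a module in which `length` has not been called yet: an unqualified definition then creates a new, module-local function, while a qualified definition adds a method to `Base.length`.

```julia
module Fresh
struct MyType end
# `length` has not been used in this module, so this defines a new
# function Fresh.length instead of extending Base.length.
length(mt::MyType) = 1
end

module Extending
struct MyType end
# Qualifying the name states the intent to extend Base.length.
Base.length(mt::MyType) = 1
end

Fresh.length === Base.length       # false: two unrelated functions
Extending.length === Base.length   # true: one function, one extra method
```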

Maybe there’s some wiggle room here. For example, maybe we could skip the error in the REPL? The error is far more likely in the REPL, since you’ve probably used length previously in your session, but inside a module you’ve probably just written function definitions and not called length yet.

On another level, is defining types and their methods at the prompt really so important? I suspect people don’t do that very often in python, say, due to the syntax.

C is different because it doesn’t have any single-dispatch or operator overloading.

C++ is as I described it. I think Java is similar to C++. Without doing a survey, I can tell you that every single-dispatch language behaves as I have said. It is just a little easier in a dynamic single-dispatch language to know where to look for functions because if I call myval.f() it only needs to look in the namespace/type associated with the myval and not in other places. That is why C++ and other languages need more complicated ADL rules so that f(myval) and f(myval1, myval2) can still work.

For python, of course you have to go import numpy, and you need to qualify with numpy.array to create the object, but after that the language knows where to look up the functions given the single-dispatch and operator overloading. I won’t say many nice things about python, but that works and you certainly don’t need to mess with a global namespace.

There is nothing at all magic about most of these languages! When it encounters a + function it looks at the namespaces of the types of the arguments to look for a + function which matches the arguments. Same when it sees a f(myval) or myval.f(). Nothing built-in or special, just argument-dependent lookups.

Again, this is a symptom of the issue and not the problem itself. Base is both the standard library, and the workaround to make functions easy to lookup without namespace qualifications.

I don’t want to extend Base.length in this case, because my function length has nothing to do with the standard library and I simply want to define a function on my type.

Why not? I think you can, but I do see two barriers:

  1. You could get the “both Concept1 and Base export size; uses must be qualified” error. That can be worked around by using import, or by my proposal to give explicit using precedence over Base.
  2. You get the same error if you want to use both Concept1.size and Concept2.size. But that, I gather, was the whole point of separating them. If you want the functions to be automatically fused when you use both, maybe they shouldn’t have been separated in the first place? Manual merging could be an option, e.g. @useboth Concept1.size Concept2.size.
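The first barrier and its `import` workaround can be sketched as follows. `Concept1`, `Box`, and `Client` are hypothetical names chosen for illustration, and the sketch assumes `Concept1` defines its own `size` rather than extending `Base.size`.

```julia
module Concept1
struct Box end
# A new function owned by Concept1; it is not a method of Base.size.
size(::Box) = "size in the Concept1 sense"
export Box, size
end

module Client
# An explicit import makes the unqualified name `size` mean Concept1.size
# here, so there is no "uses must be qualified" conflict with Base.
import ..Concept1: Box, size

describe(b::Box) = size(b)
# Base's function is still reachable, but only with qualification.
dims(a::AbstractArray) = Base.size(a)
end

Client.describe(Concept1.Box())   # "size in the Concept1 sense"
Client.dims(zeros(2, 3))          # (2, 3)
```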

Am I missing anything?

Sorry, the “you” was unclear there, and I meant it more literally than “one”. What I meant in that exact circumstance was “you guys writing the Base”. If you decide that in reality it didn’t make sense that the 185 methods for * (before even including libraries!) were 100% consistent, or that when you add in the 186th that these really should be two sets with 100 compatible and another 86 compatible definitions… What happens? There is no reason that the standard library should expect to have only a single “meaning” for size any more than another huge set of libraries.

So then you go about splitting the namespaces up in Julia v1.X or even Julia v2.0, but everyone has written code expecting the Base to not require any sort of explicit qualification. Plus you have a whole bunch of libraries cramming their * into the Base namespace. Which one should they modify now? With the current practice, it isn’t that easy for you guys to fragment the concepts in Base, which requires even more discipline than normal to ensure generic consistency of every use of every function with the same name in all of the standard library.

Your solution

Some variation of @useboth Concept1.length Concept2.length could be a nice workaround to automatically fuse them for the Julia 1.x timeframe. It seems very teachable for people who get confused about why they can’t create a function called length for their own set of types.

Basically, this would make users say that they know what they are doing and don’t care that these may be fundamentally different concepts of length? Much better than the alternatives (e.g. ad-hoc forwarding of functions into Base, putting the functions in Base to begin with for convenience when they don’t really belong there, requiring every package writer to coordinate on shared namespaces for function names if users find clashes, etc.)

I also like it because it explicitly says what you are trying to do. Given its primary use case, perhaps it is better to have something like @forceusing MyNM.size, which would make sure that MyNM.size is merged with whatever namespace already has size, instead of @useboth MyNM.size Base.size? Caveat emptor, of course!

The benefit (or flaw?) of @forceusing is that it wouldn’t require identifying exactly which packages you want to merge.
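`@useboth` and `@forceusing` do not exist; they are proposals from this thread. As a rough picture of what such a macro might generate, here is a hand-written merge using forwarding methods. All module and type names are hypothetical.

```julia
module Concept1
struct Rope end
length(::Rope) = 10        # Concept1's own notion of length
end

module Concept2
struct Lecture end
length(::Lecture) = 45     # an unrelated notion of length
end

module Merged
import ..Concept1, ..Concept2
# Hand-written stand-in for the proposed
# `@useboth Concept1.length Concept2.length`: one local function that
# forwards to whichever concept owns the argument type.
length(x::Concept1.Rope) = Concept1.length(x)
length(x::Concept2.Lecture) = Concept2.length(x)
end

Merged.length(Concept1.Rope())      # 10
Merged.length(Concept2.Lecture())   # 45
```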

The Long Run

If you have argument-dependent lookup in the Julia 2.x timeframe (following the single-dispatch or operator/function overloading lead of other languages), then all of these issues go away… though in fairness it introduces the problem that @ChrisRackauckas brought up with the possibility of someone writing a library to silently hijack an existing type. But considering the alternative is the potential pollution of the Base global namespace it may be worth that risk - as it was in other languages with ADL.

If ADL was added in that timeframe, then the @forceusing would just become a noop.

Enough with the method-count shaming! If 185 is too many, what’s the right number, and why? How many function bodies does numpy have related to *? I don’t know, and it’s not easy to find out, but I bet it’s a lot. We have lots of kinds of numbers and matrices, ok?

From 0.6 to 0.7 we have moved over 300 functions out of Base.

Maybe this is not the best example. Does splitting functions happen very often? I don’t recall that being much of a thing. What seems to happen more is we start with separate functions (spread over the package ecosystem), then decide which ones are actually the same and merge them. But if we really needed to split a function, yes people who extend it would have to decide which to extend now. It would be pretty inconvenient, but I think that just reflects the unusual nature of splitting functions.

This might come down to a method-count intuition thing again. You talk about it as if the average Base function with many methods has maybe 80% of its methods for the right reason, and the rest because somebody just “crammed it in there” for convenience. In reality I think the number of methods there for the right reason is much closer to 100%.

That would be nice, along with being able to import and rename, import Foo.very_long_name as vln

A bit lengthy but:

import Foo.very_long_name
const vln = Foo.very_long_name

A macro might be possible but I’m not good at those.
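For readers on newer versions: Julia 1.6 and later accept renaming directly in an `import` statement, which covers this request without a macro. A small sketch, with `Foo` defined inline as a stand-in module:

```julia
module Foo
very_long_name(x) = 2x
end

# Requires Julia 1.6 or newer.
import .Foo: very_long_name as vln

vln(21)   # 42
```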

https://github.com/fredrikekre/ImportMacros.jl

No way! On Julia v0.7 master binaries after a fresh REPL restart

julia> methods(*)
# 426 methods for generic function "*":

:) And that is before you use any packages that cram * into the Base namespace for convenience!
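One way to see what that count includes is to tally the methods by the module that defines them. This is only a sketch; the numbers depend on the Julia version and on which packages are loaded, so no output is shown.

```julia
ms = methods(*)
println("total methods of *: ", length(ms))

# Count how many methods each module contributes.
counts = Dict{Module,Int}()
for m in ms
    counts[m.module] = get(counts, m.module, 0) + 1
end

# Print the modules, largest contributor first.
for (mod, n) in sort(collect(counts); by = last, rev = true)
    println(rpad(string(mod), 24), n)
end
```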

But you are right, enough shaming - and maybe that is counting something different than what I think it is. The point is not that there is a correct number or distribution, but rather that there is probably some point in the future where there are at least 2 different reasonable “meanings” in your language (or concepts in C++, or an - albeit informally defined - generic interface in something like Java) for * in the standard library. And with a generic verb like size, it is very easy to imagine that there are multiple reasonable “meanings” that would make sense to coexist in a large enough standard library. If so… then that breaks the idea of using Base as the global namespace, since the assumption there was that a single namespace has a single meaning for every single function name, and in practice Base takes precedence over every other “meaning” in external libraries because of the lack of ADL.


But let’s go back to the macro you are proposing. The benefit of this macro is: (1) it would explicitly let people state that they want some sort of “poor man’s ADL” for a particular function rather than manipulating Base directly themselves; and (2) it would get me off of your back until discussions for v2.x - at which point I strongly think you should evaluate an ADL-based approach. If the latter part isn’t incentive enough, I can ratchet up my obnoxiousness!

Keep in mind that this isn’t especially unsafe, because you still have errors with ambiguity detection, and it just formalizes what people are already doing by adding methods to Base and/or putting forwarding functions into Base when they need to manually merge namespaces. Plus you can then more easily grep for a macro name.

If you have reached the point where you understand what some of us are getting at, and are willing to discuss the details of that kind of macro, then should I open a separate issue on GitHub? What I will say, though, is that I think this has to be formalized in the standard library itself so it is part of the namespace/module/etc. documentation when explaining how lookup works, and how to get around it when necessary. Whether it means people should start putting the * operator in their own namespace and then merge it into Base with the macro is another issue, but I don’t think it is necessary.

I just don’t get your argument. You seem to be playing both sides of the same coin.

You are worried that we may need to split the meaning of some names because they diverge to the point that they no longer are doing the same thing.

At the same time, you are worried that you won’t be able to easily access both meanings at the same time, because you cannot automatically merge those two distinct functions back together.

But if the two functions are now distinct enough to warrant such a split, how in the world could you meaningfully write generic code and not care which one is getting called?

No, then I haven’t made myself clear. Have you looked into how C++, Java, etc. do their namespace lookups (so that they don’t have to privilege the standard library)? This would help make it clear that this is not playing both sides of the coin. The lack of “democracy” is the symptom, not the underlying problem.

There is no magic. Right now you have people putting everything into a single namespace in order to have usable operator *, etc. What I am saying is that this is not scalable. With ADL there is a separation between the lookup table of functions and the namespaces functions reside in. If you try to bring the same function which operates on the same types into the global space, there is an ambiguity and it won’t work (just as today). Take a look at C++ and/or single-dispatch libraries.

Still not quite getting this. Whose idea is this? I would be fine with something breaking use of Base as THE global namespace, since I don’t advocate that. I believe the claim is that our design unintentionally promotes it, but then breaking it would still be a good thing.

How is it that Base takes precedence? Would that be fixed by letting an explicit using X shadow Base without requiring qualification?

Agreed.

Maybe this is a joke but in case not, do you have examples of this? Standard operators like * and + might be bad examples, since they are so important and widely used everybody has a strong sense of what they mean, unlike a word like fit or match or solve.

As a footnote, I suspect the number of methods of * increased a lot due to merging all the A_mul_B functions into it, which handled various combinations of transposed matrices and vectors.

There is another very interesting wrinkle to this that I don’t think has been discussed much in this thread. ADL or function merging looks pretty good from the “downstream” or consumer perspective: If I say using LinearAlgebra, now I can see * for various structured matrix types. If to that I add using FixedPointNumbers, now I can also see * for fixed point scalars. The whole time, LinearAlgebra.:* and FixedPointNumbers.:* can be totally separate in their separate namespaces. A call to FixedPointNumbers.:* can only see fixed point number definitions, etc. IIUC, this is the desired scenario.

But, LinearAlgebra contains a matrix multiply routine that references *. LinearAlgebra does not, and should not, depend on FixedPointNumbers. So how can we multiply matrices of fixed point numbers? How do we get LinearAlgebra to see the fixed point definitions of *?
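For comparison, under the current design this works because there is a single shared function: the numeric package adds methods to `Base.:*`, and generic code elsewhere picks them up through dispatch. A sketch with hypothetical stand-ins (`MiniLinAlg`, `MiniFixed`, `Fixed`) rather than the real packages:

```julia
module MiniLinAlg
# Generic code: knows nothing about fixed-point numbers, only calls * and +.
function dot(xs, ys)
    acc = xs[1] * ys[1]
    for i in 2:length(xs)
        acc = acc + xs[i] * ys[i]
    end
    return acc
end
end

module MiniFixed
struct Fixed
    raw::Int   # value scaled by 100, so 150 represents 1.50
end
# Methods are added to the shared Base functions, not to new local ones.
Base.:*(a::Fixed, b::Fixed) = Fixed(a.raw * b.raw ÷ 100)
Base.:+(a::Fixed, b::Fixed) = Fixed(a.raw + b.raw)
end

using .MiniFixed: Fixed
xs = [Fixed(150), Fixed(50)]
ys = [Fixed(200), Fixed(400)]
MiniLinAlg.dot(xs, ys).raw   # 500, i.e. 1.50*2.00 + 0.50*4.00 = 5.00
```

The open question in the post is how separate `*` functions living in separate namespaces would reach something like `MiniLinAlg.dot` without such a shared function.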

If you want an example of packages defining their own basic functions, look at Nemo.jl. It defines its own versions of things, such as its own sqrt, so that it can fully control the behavior. There’s really nothing wrong here.

OK, if we are agreed on this, then that is a first step. You don’t have to agree with me (though I hope you reconsider for v2.0) that this is symptomatic of a more general problem that can be solved with ADL.

I think operators are the perfect examples, because they are essentially unusable with namespace qualifications. I believe the key design of ADL in C++ (and others?) came about to ensure that operator overloading worked while still having operators in the namespace of the types themselves.

As for an example, look at https://github.com/wbhart/Nemo.jl/blob/master/src/Nemo.jl which @ChrisRackauckas told me to look at. Am I completely misunderstanding it, or doesn’t this insert a bunch of operations into the Base namespace (where the type itself is otherwise nicely contained in the Nemo namespace) so that *, sin, etc. still work? The fact that there are a whole bunch of if VERSION >= v"0.6.0-dev.2024" for the imports seems to prove my point that there is a fragility in library code when Base adds/removes functions. That wouldn’t happen with ADL (unless the changes in the functions introduced an ambiguity).

Agree enough to think about formalizing a macro? I would love to push off suggestions for an ADL redesign for a year or two.

I believe that this is exactly the problem ADL is intended to solve. It gives rules for which namespaces to look in for functions. Conditioning on that set, I think the rules for checking function ambiguity are the same as what you currently use. There is the issue that a 3rd party library could insert itself into the middle of the lookup table (as Chris pointed out), but that is rare enough to be controlled with regression tests, and a reasonable price to pay for decentralization of the namespace lookup process.

As I said, though, I don’t think it is worthwhile for me to try to discuss the design of an ADL for Julia. It is premature, and I am not the right person to discuss the details - even if I feel confident in my smell-test to say it is worth exploring. One point I will make, though, is that C++'s ADL is a mess because C++ is a Frankenstein with generic programming patched on after the fact, friend functions, tight control of encapsulation, etc. With a well-designed language built from scratch, I think it would be much easier.