How to detect/avoid type piracy?

Yep, I’m getting more and more inclined to this alternative as well.

I think defining size like that is natural, but not necessarily by overloading Base.size. It is just a different size function. Of course, that comes with the inconvenience of having to qualify the name wherever it is used.


If I recall correctly (I may not), the Base.size of an FFTWPlan is the output and input dimensions of the transformation. In the sense that an FFTWPlan is a linear operator that can be applied via Base.:*, it is operationally like an AbstractMatrix and so extends Base.size analogously.

What I’m arguing is that you can use the name size, but not Base.size. YourModule.size is still available and can totally be used without risk of piracy. Even Main.size (as one gets from naked definition in the REPL) doesn’t conflict with Base.size, although you might accidentally conflict with other Main.size methods or shadow MODULE.size functions imported into the REPL so that their use must be written like MODULE.size.

The point is that Base.size has a specific purpose and it becomes less clear and more error-prone if additional purposes are added. Make your size in a new namespace so that it doesn’t conflict. If laundry day goes so badly that your TeeShirt ends up being multiplied by a Matrix, those size definitions aren’t meaningfully compatible anyway.
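
As a minimal sketch of that suggestion (the module MyClothes, the TeeShirt struct and its label field are hypothetical names used only for illustration), a size function defined in its own namespace leaves Base.size untouched:

```julia
module MyClothes

struct TeeShirt
    label::Symbol   # hypothetical field, e.g. :S, :M, :L
end

# This module never refers to Base.size before this point, so the definition
# below creates a new function MyClothes.size and adds no method to Base.size.
size(shirt::TeeShirt) = shirt.label

end # module

shirt = MyClothes.TeeShirt(:M)

MyClothes.size(shirt)                              # the clothing sense of "size"
MyClothes.size !== Base.size                       # two unrelated functions sharing a name
hasmethod(Base.size, Tuple{MyClothes.TeeShirt})    # false: Base.size has no TeeShirt method
```

The cost is the one already mentioned: outside the module, the name has to be written as MyClothes.size.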


I believe (without substantial evidence beyond looking kinda similar) the code in Aqua.jl is just an improved version of my Pirate Hunter code, with some of the TODOs done.


Just a question to satisfy my curiosity. The definitions you gave are not enough to explain why you got a problem. You said that Lazy.jl defines

split(::AbstractVector,::Integer)

and you define

split(::AbstractVector, ::Vector{<:Integer})

These don’t seem to conflict.

On another topic, I do not consider size(::TeeShirt) as pirating if you own the type TeeShirt. There is no such thing as “semantic pirating”, there is only syntactic pirating.

That’s right - I took your Pirate hunter and PR’d a modification of it to Aqua (in a PR you reviewed, no less! :wink:). My implementation was bad and full of bugs. Other people have since made it work for real.


Point taken. Thanks!

Lazy.jl’s signature is generic (::Vector{T}, ::Any):

function Base.split(xs::Vector{T}, x; keep = true) where T

On another topic, I do not consider size(::TeeShirt) as pirating if you own the type TeeShirt. There is no such thing as “semantic pirating”, there is only syntactic pirating.

So, would you overload Base.size()?
People in this thread seem to consider it bad coding practice, though not type piracy.

This is a good correction, but the question remains how this method took priority over your method:

I’d expect a method ambiguity error like this:

julia> bar(::Vector{T}, ::Any) where T = 1
bar (generic function with 1 method)

julia> bar(::AbstractVector, ::Integer) = 2
bar (generic function with 2 methods)

julia> bar([1], 2)
ERROR: MethodError: bar(::Vector{Int64}, ::Int64) is ambiguous.
...
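
Such an ambiguity can also be found without making the offending call. A minimal sketch, assuming the two bar methods from the session above are placed in a hypothetical module AmbiguityDemo, uses detect_ambiguities from the Test standard library:

```julia
using Test   # standard library; provides detect_ambiguities

module AmbiguityDemo   # hypothetical module, only for this illustration
bar(::Vector{T}, ::Any) where {T} = 1
bar(::AbstractVector, ::Integer) = 2
end

# Each entry is a pair of methods that are ambiguous with each other.
ambiguities = detect_ambiguities(AmbiguityDemo)

for (m1, m2) in ambiguities
    println(m1.sig, "  <->  ", m2.sig)
end
```

Calls that only one of the methods can accept, such as AmbiguityDemo.bar([1], 2.0) or AmbiguityDemo.bar(1:3, 2), still dispatch normally; only the overlap is reported.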

EDIT: Now I notice your comment describing your split’s call, which contradicts the method signature you wrote in the original post. Please amend your post and comments accordingly.


Lazy.jl’s signature is generic (::Vector{T}, ::Any):

There is a lesson from this: if you are going to do type piracy, do the minimal amount needed: it is really a big mistake to have defined

split(::Vector{T}, ::Any)

when

split(::AbstractVector{T}, ::Integer)

would have been more general and caused fewer conflicts.

On second thought, may I linger on this discussion a bit longer?
Not for the sake of argument, but to fully understand your point.

What I’m arguing is that you can use the name size, but not Base.size.

And why not Base.size, precisely? You mentioned that this would be error-prone. What kind of errors are lurking here?

If laundry day goes so badly that your TeeShirt ends up being multiplied by a Matrix, those size definitions aren’t meaningfully compatible anyway.

OK, so what? I’ll just get the usual “no method matching” MethodError. Why should the sizes be compatible?

Again, I’m being genuinely curious here, not stubborn. And yeah, I truly appreciate the time people are taking to elaborate on others’ questions.

Of course, I can use the MyClothes.size syntax. It feels safely encapsulated indeed. And yet for me this seems to defy the beauty of multiple dispatch: namely, letting function arguments specify the logic for a common behaviour and not worrying whether this specialisation is defined in MyClothes, LinearAlgebra or HerEyes. Yes, TeeShirts and Arrays are very different objects, and yet they have a common property, size.
So, what’s wrong with adding it to the common Base?

I agree with you. People who consider something akin to “semantic pirating” put something in the language which is not there. There is just syntactic pirating.

The argument here is that it is not common behavior. It is just something completely different that happens to have the same natural name.

For instance, if it made sense in some way to multiply a matrix by a TeeShirt type*, you would probably need to define Base.size(::TeeShirt) differently from MyClothes.size(::TeeShirt), such that multiple-dispatch works.

*for instance if that was a trace of the shirt and you want to rotate the points.


Base has only 2 methods for split, both taking an AbstractString as the first argument, so they don’t overlap with either of the conflicting methods. Neither act of type piracy alone seemed to break anything, which suggests that code rarely, if ever, depends on whether particular split methods exist. Neither OP nor the contributor to Lazy.jl knew the other planned to commit type piracy, so they couldn’t have known how to avoid all conflicts with each other. In this case avoiding them is impossible, because the inputs for splitting by last indices could just as easily be meant to split by a delimiting element instead, e.g. split([1, [2], 3], [2]).

For splitting by delimiting element, the only sensible change would be to match the types split(::Vector{T}, ::T), but that doesn’t stop T == Any and forces you to convert inputs to the same type instead of allowing comparisons like 3.0 == 3. There’s not really much to improve on the method signature.
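
To illustrate both drawbacks, here is a small sketch with a hypothetical stand-in function, same_type_split, that uses the matched-type signature (it is not Lazy.jl’s actual method):

```julia
# Hypothetical stand-in for a "split by delimiting element" method whose
# delimiter must have exactly the element type of the vector.
same_type_split(xs::Vector{T}, x::T) where {T} = :matched

applicable(same_type_split, [1, 2, 3], 3)           # true:  T == Int
applicable(same_type_split, [1, 2, 3], 3.0)         # false: 3.0 is not an Int, although 3.0 == 3
applicable(same_type_split, Any[1, [2], 3], [2])    # true:  T == Any accepts any delimiter
```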

The only way to completely prevent such unpredictable conflicts among libraries is for them not to commit type piracy in the first place, and instead to contribute to the original library, where agreements can be made and tests designed.


lmiq made basically the points I would make a few posts up. While it is far from piracy to conflate the semantics of a function so that it serves multiple wildly divergent purposes, it just isn’t useful. Meanwhile, it exposes more surface area for method ambiguities, accidental piracy, and other undesirables. It also becomes unclear what the result of Base.size means if many packages each define this extremely generic word to mean something different.

Multiple dispatch isn’t about the convenient recycling of name real estate, it’s about allowing objects that represent similar concepts in different ways to be abstracted by behaviors rather than implementation.

I suppose the extreme example would be a hypothetical function called most_interesting_operation. This function takes an input and does the most interesting thing it can think of to it. For an array, it takes the multidimensional Fourier transform. For an unsigned integer, it returns how many steps are required to reach 1 using the iteration of the Collatz conjecture. It translates a string into Klingon. And so on. This function can be defined without piracy, method ambiguity, or other issues. Packages could extend it for their own types to do whatever is most interesting for them. But there just isn’t any reason to put these disparate operations under the same roof. The results of this function are utterly incompatible (by practical standards). Exposing these methods via dispatch doesn’t actually help the code be more generic.

And yet the benefit of most_interesting_operation is that it is defined to do something arbitrary, so a user has no expectation of what the result will be or how it might be useful in a generic setting. Base.size already has a clearly documented purpose, so a user has an expectation of what it should be doing and will be left scratching their head if it does something entirely different.


Hmm. Thank you for the nice write-up. I think I begin to understand your point better.

However, your example sets the perfect ground for a counterargument. Being able to put the most_interesting_operation under the same roof would allow developers to write something like:

map(most_interesting_operation, bag_of_vars)

that would work for any collection of variables whose types might not even be known until run time.

Even better, I can write something like

twice_as_interesting(x) = x |> most_interesting_operation |> most_interesting_operation

And this will just work for all types for which the corresponding specialisation is exported. This is generic!

With the most_interesting_operations confined to their respective modules, the above becomes utterly impossible, doesn’t it?


My bad. I wrote the post too quickly.
I’d love to correct the initial post, but it seems no longer possible: clicking the edit button just brings up its edit history…
Does Discourse lock posts after a while, or for some other reason?

Yes, if that most_interesting_operation is truly useful then it should be one function and utilize dispatch. But Base.size is already defined for one specific purpose and so none of the code that uses it would benefit from such wild variations in meaning.

It’s possible to stretch the boundaries of definitions, but one should mostly only do so when it expands compatibility. My attempt with the TeeShirt example was to discuss a situation where the English language overloading of the word “size” did not warrant the overloading of Base.size. While one can talk about the size of a shirt or the size of an array, and concepts such as bigger or smaller for each individually, there isn’t really a clear way that one should compare or interchange between the two. This incompatibility means it’s unlikely that one would need to know the size in the same context where one couldn’t make the appropriate call to either Base.size or Clothes.size deliberately.

And if one truly did need to glue those uses together, it’s very easy to do so within a particular codebase via a wrapper. But if combined together initially, there’s no taking them apart again.

MyModule.size(x::Any) = Base.size(x)
MyModule.size(x::AbstractClothing) = Clothes.size(x)
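
For reference, a self-contained version of that glue might look like the sketch below. The contents of Clothes (an AbstractClothing type, a TeeShirt struct with a label field) are assumptions made only so the example runs:

```julia
module Clothes   # hypothetical package with its own notion of size
abstract type AbstractClothing end
struct TeeShirt <: AbstractClothing
    label::Symbol   # hypothetical field, e.g. :S, :M, :L
end
size(c::TeeShirt) = c.label   # Clothes.size, unrelated to Base.size
end

module MyModule   # the one codebase that needs both meanings at once
import ..Clothes
size(x) = Base.size(x)                                # fall back to the array meaning
size(x::Clothes.AbstractClothing) = Clothes.size(x)   # clothing meaning
end

items = Any[zeros(2, 3), Clothes.TeeShirt(:L)]
glued = map(MyModule.size, items)   # (2, 3) for the array, :L for the shirt
```

The merge lives entirely in MyModule, so neither Base.size nor Clothes.size changes meaning for anyone else.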

I’m concerned that we’re digressing, but I can’t help pondering this point a bit longer.

Yes, if that most_interesting_operation is truly useful then it should be one function

I agree. But who decides whether something is truly useful? You’ve contrived an example of totally disparate functions, and I could still find some use for it. I’m not convinced usefulness is that obvious.

My attempt with the TeeShirt example was to discuss a situation where the English language overloading of the word “size” did not warrant the overloading of Base.size.

OK I understand that. But again I’m not convinced.
Consider 3 teams independently developing modules: Furniture, Books and Toys. The concept of size makes sense for all of them, also jointly: e.g., to know how many items, books and/or toys, fit into a drawer box. So, it seems useful and sensible to be able to write something like map(size, drawer.items) and expect it to just work, doesn’t it? (I always derive pleasure when similar things happen in Julia).

Don’t get me wrong, it’s not too difficult to realise the above pattern without overloading Base.size. But still, why should achieving something like that be made more difficult just because Base.size() was initially conceived for Array-like types? This is still not clear to me.

While one can talk about the size of a shirt or the size of an array, and concepts such as bigger or smaller for each individually, there isn’t really a clear way that one should compare or interchange between the two.

Yes, but why should one be bothered by comparison or interchanging issues?
I don’t think that a common interface like size() should imply that the objects are similar, let alone interchangeable. Unless, of course, the common interface opens possibilities for subtle bugs downstream…

I’m aware that we have all made our points and are at risk of sliding into demagoguery. Sorry for that and thanks a lot for the discussion!
(And I thought I understood MD :slight_smile: )


The teams should create a HouseholdItemsBase package which defines the abstract type HouseholdItem and outlines (either formally or informally) an interface for them, including shared methods for them all to implement.

module HouseholdItemsBase
    abstract type HouseholdItem end

    "The size, in meters, of the item"
    function size end
end

This helps people know that HouseholdItemsBase.size is distinct from Base.size.

You don’t even need to create the abstract type they all share. You just need to provide a distinct namespace for everything.
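
A rough sketch of how independent packages could plug into such a shared namespace, without any common abstract type, is shown below. The Books and Toys modules and their fields are hypothetical, and the size of an item is simplified to a single number in meters:

```julia
module HouseholdItemsBase
"The size, in meters, of the item"
function size end   # declares HouseholdItemsBase.size with no methods yet
end

module Books   # hypothetical package maintained by one team
import ..HouseholdItemsBase
struct Book
    thickness::Float64   # meters
end
HouseholdItemsBase.size(b::Book) = b.thickness
end

module Toys   # hypothetical package maintained by another team
import ..HouseholdItemsBase
struct Toy
    width::Float64   # meters
end
HouseholdItemsBase.size(t::Toy) = t.width
end

drawer_items = Any[Books.Book(0.03), Toys.Toy(0.2)]
sizes = map(HouseholdItemsBase.size, drawer_items)
```

Each team extends a function owned by HouseholdItemsBase for a type it owns itself, so nothing here is type piracy, and Base.size keeps its array meaning.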
