Yep, I’m getting more and more inclined to this alternative as well.
I think defining size like that is natural, but not necessarily by overloading Base.size. It is just a different size function. Of course, that comes with the inconvenience of having to qualify the names when using them.
If I recall correctly (I may not), the Base.size of an FFTWPlan is the output and input dimensions of the transformation. In the sense that an FFTWPlan is a linear operator that can be applied via Base.:*, it is operationally like an AbstractMatrix, and so it extends Base.size analogously.
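A minimal sketch of that idea, using a hypothetical ScaleOperator type (this is not FFTW's actual plan type): the operator is applied with *, and Base.size is extended with the same (rows, columns) meaning it has for an AbstractMatrix, so generic code that queries size sees a consistent answer.

```julia
# Hypothetical linear operator: multiplies a length-n vector by a constant factor.
struct ScaleOperator{T<:Number}
    factor::T
    n::Int
end

# Applied with `*`, just as an AbstractMatrix would be.
function Base.:*(A::ScaleOperator, x::AbstractVector)
    length(x) == A.n || throw(DimensionMismatch("expected a vector of length $(A.n)"))
    return A.factor .* x
end

# Extending Base.size with its matrix meaning: (output dimension, input dimension).
Base.size(A::ScaleOperator) = (A.n, A.n)

A = ScaleOperator(2.0, 3)
y = A * [1.0, 2.0, 3.0]        # [2.0, 4.0, 6.0]
size(A) == size(zeros(3, 3))   # true: same convention as a 3×3 matrix
```

The extension is legitimate here because the operator really does play the role of a matrix; the meaning of Base.size is reused, not changed.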
What I’m arguing is that you can use the name size, but not Base.size. YourModule.size is still available and can be used without any risk of piracy. Even Main.size (as one gets from a naked definition in the REPL) doesn’t conflict with Base.size, although you might accidentally conflict with other Main.size methods, or shadow MODULE.size functions imported into the REPL so that they must be called as MODULE.size.
The point is that Base.size has a specific purpose, and it becomes less clear and more error-prone if additional purposes are added. Make your size in a new namespace so that it doesn’t conflict. If laundry day goes so badly that your TeeShirt ends up being multiplied by a Matrix, those size definitions aren’t meaningfully compatible anyway.
I believe (without substantial evidence beyond looking kinda similar) the code in Aqua.jl is just an improved version of my Pirate Hunter code, with some of the TODOs done.
Just a question to satisfy my curiosity: the definitions you gave are not enough to explain why you got a problem. You said that Lazy.jl defines split(::AbstractVector, ::Integer) and that you define split(::AbstractVector, ::Vector{<:Integer}). They don’t seem to conflict.
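A quick way to check that claim, sketched with a hypothetical function mysplit so that nothing in Base is touched: the two signatures as stated dispatch cleanly, and Julia reports no ambiguity between them. The method bodies are stubs that only report which method was selected.

```julia
# Hypothetical stand-ins for the two split methods described above.
mysplit(xs::AbstractVector, n::Integer) = :by_integer
mysplit(xs::AbstractVector, idxs::Vector{<:Integer}) = :by_indices

mysplit([1, 2, 3], 2)       # :by_integer
mysplit([1, 2, 3], [1, 2])  # :by_indices

# Base.isambiguous reports whether two methods could be ambiguous for some call.
m_int  = which(mysplit, Tuple{Vector{Int}, Int})
m_idxs = which(mysplit, Tuple{Vector{Int}, Vector{Int}})
Base.isambiguous(m_int, m_idxs)  # false: an Integer is never a Vector
```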
On another topic, I do not consider size(::TeeShirt) as pirating if you own the type TeeShirt. There is no such thing as “semantic pirating”; there is only syntactic pirating.
That’s right: I took your Pirate Hunter and PR’d a modification of it to Aqua (in a PR you reviewed, no less!). My implementation was bad and full of bugs. Other people have since made it work for real.
Point taken. Thanks!
Lazy.jl’s signature is generic, (::Vector{T}, ::Any):
function Base.split(xs::Vector{T}, x; keep = true) where T
On another topic, I do not consider size(::TeeShirt) as pirating if you own the type TeeShirt. There is no such thing as “semantic pirating”; there is only syntactic pirating.
So, would you overload Base.size()? People in this thread seem to consider it bad coding practice, though not type piracy.
This is a good correction, but the question remains how this method took priority over your method. I’d expect a method ambiguity error like this:
julia> bar(::Vector{T}, ::Any) where T = 1
bar (generic function with 1 method)
julia> bar(::AbstractVector, ::Integer) = 2
bar (generic function with 2 methods)
julia> bar([1], 2)
ERROR: MethodError: bar(::Vector{Int64}, ::Int64) is ambiguous.
...
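One standard Julia way to resolve that kind of ambiguity, sketched here with the same toy function bar, is to add a method for the intersection of the two signatures. This is a general technique, not a claim about what Lazy.jl or the OP’s code actually does.

```julia
bar(::Vector{T}, ::Any) where T = 1
bar(::AbstractVector, ::Integer) = 2

# At this point bar([1], 2) matches both methods equally well.
is_ambiguous = try
    bar([1], 2)
    false
catch err
    err isa MethodError  # the ambiguity is reported as a MethodError
end

# A method on the intersection (Vector, Integer) is more specific than both.
bar(::Vector{T}, ::Integer) where T = 3

bar([1], 2)    # 3
bar([1], "x")  # 1
bar(1:3, 2)    # 2 (a UnitRange is an AbstractVector but not a Vector)
```

The catch is that whoever adds the intersection method has to know both of the other methods exist, which is exactly what two independent pirates cannot know.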
EDIT: Now I notice your comment describing your split’s call, which contradicts the method signature you wrote in the original post. Please amend your post and comments accordingly.
Lazy.jl’s signature is generic, (::Vector{T}, ::Any):
There is a lesson from this: if you are going to commit type piracy, do the minimal amount needed. It was really a big mistake to define split(::Vector{T}, ::Any) when split(::AbstractVector{T}, ::Integer) would have been more general in its first argument and, by restricting the second argument to Integer, caused fewer conflicts.
On second thought, may I linger on this discussion a bit longer?
Not for the sake of argument, but to fully understand your point.
What I’m arguing is that you can use the name size, but not Base.size.
And why not Base.size, precisely? You mentioned that this would be error-prone. What kind of errors are lurking here?
If laundry day goes so badly that your TeeShirt ends up being multiplied by a Matrix, those size definitions aren’t meaningfully compatible anyway.
OK, so what? I’ll get the usual “no method matching” MethodError. Why should the sizes be compatible?
Again, I’m being genuinely curious here, not stubborn. And yeah, I truly appreciate the time people take to elaborate on others’ questions.
Of course, I can use the MyClothes.size syntax. It feels safely encapsulated indeed. And yet to me this seems to defy the beauty of multiple dispatch: letting the function arguments select the logic for a common behaviour, without worrying whether the specialisation is defined in MyClothes, LinearAlgebra, or HerEyes. Yes, TeeShirts and Arrays are very different objects, and yet they share a common property: size.
So, what’s wrong with adding it to the common Base?
I agree with you. People who consider something akin to “semantic pirating” read into the language something that is not there. There is just syntactic pirating.
The argument here is that it is not common behavior. It is just something completely different that happens to have the same natural name.
For instance, if it made sense in some way to multiply a matrix by a TeeShirt*, you would probably need to define Base.size(::TeeShirt) differently from MyClothes.size(::TeeShirt), so that multiple dispatch works.
*For instance, if it were a trace of the shirt and you wanted to rotate the points.
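To make the footnote concrete, here is a sketch under assumed names: a hypothetical MyClothes.TeeShirt carries both a clothing-size label and a 2×n matrix of outline points. Base.size then describes the shape of the points, so rotating by a matrix behaves like linear algebra, while MyClothes.size keeps the everyday meaning, and the two never collide.

```julia
using LinearAlgebra  # matrix-matrix multiplication

module MyClothes
export TeeShirt

struct TeeShirt
    label::String          # clothing size, e.g. "M"
    trace::Matrix{Float64} # hypothetical 2×n outline points
end

# A new function, MyClothes.size, unrelated to Base.size.
function size end
size(t::TeeShirt) = t.label

# Base.size keeps its array meaning: the shape of the point trace.
Base.size(t::TeeShirt) = Base.size(t.trace)

# Rotating (or otherwise transforming) the shirt's outline by a matrix.
Base.:*(R::AbstractMatrix, t::TeeShirt) = TeeShirt(t.label, R * t.trace)
end

using .MyClothes
t = TeeShirt("M", [0.0 1.0; 0.0 0.0])  # points (0, 0) and (1, 0)
R = [0.0 -1.0; 1.0 0.0]                # 90° rotation
r = R * t
MyClothes.size(r)  # "M"
size(r)            # (2, 2), via Base.size
```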
Base has only 2 methods for split, and their first argument is an AbstractString, so neither overlaps with the conflicting methods. Neither act of type piracy alone seemed to break anything, which suggests that assumptions about whether certain split methods exist rarely, if ever, come up. Neither OP nor the contributor to Lazy.jl knew that the other planned to commit type piracy, so they couldn’t have known how to avoid all conflicts with each other. In this case it’s impossible anyway, because inputs meant for splitting by last indices could just as easily be meant for splitting by a delimiting element, e.g. split([1, [2], 3], [2]).
For splitting by a delimiting element, the only sensible change would be to match the types, split(::Vector{T}, ::T), but that doesn’t stop T == Any, and it forces you to convert inputs to the same type instead of allowing comparisons like 3.0 == 3. There’s not really much to improve on the method signature.
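A sketch of the type-matched signature, using a hypothetical splitby function rather than Lazy.jl’s split, shows both drawbacks: a Vector{Any} still accepts any delimiter, and a Vector{Int} no longer accepts 2.0 even though 2.0 == 2.

```julia
# Hypothetical splitter: breaks xs at every element equal to delim.
# Requiring delim::T (the vector's element type) is the type-matched signature.
function splitby(xs::Vector{T}, delim::T) where T
    parts = Vector{Vector{T}}()
    current = T[]
    for x in xs
        if x == delim
            push!(parts, current)  # close the current chunk at the delimiter
            current = T[]
        else
            push!(current, x)
        end
    end
    push!(parts, current)  # the chunk after the last delimiter
    return parts
end

splitby([1, 2, 3, 2, 4], 2)   # [[1], [3], [4]]
splitby(Any[1, [2], 3], [2])  # T == Any, so any delimiter is accepted: [[1], [3]]
# splitby([1, 2, 3], 2.0)     # MethodError: Float64 is not Int, even though 2.0 == 2
```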
The only way to completely prevent such unpredictable conflicts among libraries is for them not to commit type piracy in the first place, and instead to contribute to the original library, where agreements are made and tests can be designed.
lmiq made basically the points I would make a few posts up. While it is far from piracy to conflate the semantics of a function so that it serves multiple wildly divergent purposes, it just isn’t useful. Meanwhile, it exposes more surface area for method ambiguities, accidental piracy, and other undesirables. It also becomes unclear what the result of Base.size means if many packages each define this extremely generic word to mean something different.
Multiple dispatch isn’t about the convenient recycling of name real estate, it’s about allowing objects that represent similar concepts in different ways to be abstracted by behaviors rather than implementation.
I suppose the extreme example would be a hypothetical function called most_interesting_operation. This function takes an input and does the most interesting thing it can think of to it. For an array, it takes the multidimensional Fourier transform. For an unsigned integer, it returns how many steps are required to reach 1 by iterating the Collatz map. It translates a string into Klingon. And so on. This function can be defined without piracy, method ambiguity, or other issues. Packages could extend it for their own types to do whatever is most interesting for them. But there just isn’t any reason to put these disparate operations under the same roof. The results of this function are utterly incompatible (by practical standards). Exposing these methods via dispatch doesn’t actually help the code be more generic.
And yet the benefit of most_interesting_operation is that it is defined to do something arbitrary, so a user has no expectation of what the result will be or how it might be useful in a generic setting. Base.size already has a clearly documented purpose, so a user has an expectation of what it should do and will be left scratching their head if it does something entirely different.
Hmm. Thank you for the nice write-up. I think I begin to understand your point better.
However, your example sets the perfect ground for a counter-argument. Being able to put the most_interesting_operation methods under the same roof would allow developers to write something like:
map(most_interesting_operation, bag_of_vars)
which would work for any collection of variables whose types might not even be known until run time.
Even better, I can write something like
twice_as_interesting(x) = x |> most_interesting_operation |> most_interesting_operation
And this will just work for all types for which the corresponding specialisation is exported. This is generic!
With the most_interesting_operation methods confined to their respective modules, the above becomes utterly impossible, doesn’t it?
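To make the counter-argument concrete, here is a sketch in which the methods are hypothetical stand-ins: the Collatz step count is real, string reversal replaces the Klingon translation, and the Fourier transform is left out because it needs FFTW. It shows that map works over a heterogeneous bag, and also that composing the function only works when the output of one method happens to have a method of its own.

```julia
# Hypothetical: one shared generic function with deliberately unrelated methods.
function most_interesting_operation end

# Unsigned integers: number of Collatz steps needed to reach 1.
function most_interesting_operation(n::Unsigned)
    n == 0 && throw(DomainError(n, "Collatz steps are defined for n ≥ 1"))
    steps = 0
    while n != 1
        n = iseven(n) ? n ÷ 2 : 3n + 1
        steps += 1
    end
    return steps  # an Int, not an Unsigned
end

# Strings: a stand-in for "translate into Klingon" (here just reversal).
most_interesting_operation(s::AbstractString) = reverse(s)

twice_as_interesting(x) = x |> most_interesting_operation |> most_interesting_operation

bag_of_vars = Any[UInt(6), "qapla"]
results = map(most_interesting_operation, bag_of_vars)  # Any[8, "alpaq"]

twice_as_interesting("qapla")   # "qapla"
# twice_as_interesting(UInt(6)) # MethodError: the first call returns an Int,
#                               # and no method exists for Int
```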
My bad. I wrote the post too quickly.
I’d love to correct the initial post, but it seems no longer possible: clicking the edit button just brings up its edit history…
Does Discourse lock posts after a while, or for some other reason?
Yes, if that most_interesting_operation is truly useful then it should be one function and utilize dispatch. But Base.size is already defined for one specific purpose, so none of the code that uses it would benefit from such wild variations in meaning.
It’s possible to stretch the boundaries of definitions, but one should mostly do so only when it expands compatibility. My attempt with the TeeShirt example was to discuss a situation where the English language overloading of the word “size” did not warrant the overloading of Base.size. While one can talk about the size of a shirt or the size of an array, and concepts such as bigger or smaller for each individually, there isn’t really a clear way that one should compare or interchange between the two. This incompatibility means it’s unlikely that one would need to know the size in a context where one couldn’t deliberately make the appropriate call to either Base.size or Clothes.size.
And if one truly did need to glue those uses together, it’s very easy to do so within a particular codebase via a wrapper. But if they are combined from the start, there’s no taking them apart again.
MyModule.size(x::Any) = Base.size(x)
MyModule.size(x::AbstractClothing) = Clothes.size(x)
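A self-contained version of that wrapper, as a sketch: Clothes, its AbstractClothing type and TeeShirt are hypothetical, and MyModule.size is a new function local to the codebase that forwards to whichever meaning applies, leaving both Base.size and Clothes.size untouched.

```julia
module Clothes
abstract type AbstractClothing end

struct TeeShirt <: AbstractClothing
    label::String  # e.g. "M"
end

"Clothing size label; unrelated to Base.size."
function size end
size(t::TeeShirt) = t.label
end

module MyModule
import ..Clothes

# A size function local to this codebase that glues the two meanings together.
function size end
size(x::Any) = Base.size(x)                          # arrays and anything else Base.size handles
size(x::Clothes.AbstractClothing) = Clothes.size(x)  # clothing
end

MyModule.size([1 2 3])                # (1, 3)
MyModule.size(Clothes.TeeShirt("L"))  # "L"
```

Because the glue lives in MyModule, deleting or changing it later affects only that codebase.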
I’m concerned that we’re digressing, but I can’t help pondering this point a bit longer.
Yes, if that most_interesting_operation is truly useful then it should be one function
I agree. But who decides whether something is truly useful? You’ve contrived an example of totally disparate functions, and I could still find some use for it. I’m not convinced usefulness is that obvious.
My attempt with the TeeShirt example was to discuss a situation where the English language overloading of the word “size” did not warrant the overloading of Base.size.
OK, I understand that. But again, I’m not convinced.
Consider 3 teams independently developing modules: Furniture, Books, and Toys. The concept of size makes sense for all of them, also jointly: e.g., to know how many items, books and/or toys, fit into a drawer box. So it seems useful and sensible to be able to write something like map(size, drawer.items) and expect it to just work, doesn’t it? (I always derive pleasure when things like this happen in Julia.)
Don’t get me wrong, it’s not too difficult to realise the above pattern without overloading Base.size. But still, why should achieving something like that be made more difficult just because Base.size() was initially conceived for array-like types? This is still not clear to me.
While one can talk about the size of a shirt or the size of an array, and concepts such as bigger or smaller for each individually, there isn’t really a clear way that one should compare or interchange between the two.
Yes, but why should one be bothered by comparison or interchange issues?
I don’t think that a common interface like size() should imply that the objects are similar, let alone interchangeable. Unless, of course, the common interface opens up possibilities for subtle bugs downstream…
I’m aware that we have all made our points and risk sliding into demagoguery. Sorry for that, and thanks a lot for the discussion!
(And I thought I understood multiple dispatch.)
The teams should create a HouseholdItemsBase package which defines the abstract type HouseholdItem and outlines (either formally or informally) an interface for it, including shared methods for all of them to implement.
module HouseholdItemsBase
abstract type HouseholdItem end
"The size, in meters, of the item"
function size end
end
This helps people know that HouseholdItemsBase.size is distinct from Base.size.
You don’t even need to create the abstract type they all share. You just need to provide a distinct namespace for everything.
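A sketch of that suggestion, with hypothetical Furniture, Books and Toys modules standing in for the three teams’ packages and no shared abstract type: each one extends HouseholdItemsBase.size for its own types, so mapping over a drawer’s items works without involving Base.size at all.

```julia
module HouseholdItemsBase
"The size, in meters, of the item"
function size end
end

module Furniture
import ..HouseholdItemsBase
struct Drawer
    width::Float64      # meters
    items::Vector{Any}  # whatever has been put in the drawer
end
HouseholdItemsBase.size(d::Drawer) = d.width
end

module Books
import ..HouseholdItemsBase
struct Book
    height::Float64  # meters
end
HouseholdItemsBase.size(b::Book) = b.height
end

module Toys
import ..HouseholdItemsBase
struct Ball
    diameter::Float64  # meters
end
HouseholdItemsBase.size(b::Ball) = b.diameter
end

drawer = Furniture.Drawer(0.5, Any[Books.Book(0.25), Toys.Ball(0.1)])
map(HouseholdItemsBase.size, drawer.items)  # [0.25, 0.1]
HouseholdItemsBase.size(drawer)             # 0.5
```

Base.size keeps its array meaning throughout, so arrays stored alongside these items are unaffected.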