Is dispatch on another type's parameter (::NotMyType{MyType}) piracy?

This seems to fall under the general rubric that it’s bad to write buggy code? “Avoid type piracy” is general, and “Don’t overload methods of base container types” is specific to Base, and that seems to be deliberate (and has nothing to do with the discussion as to whether that should apply to all Base parametric types).

Type piracy causes “spooky action at a distance”, the other rule is so that users can expect base methods on base types to work correctly across parameters.

I don’t see a workable general rule for what you’re describing, but the example is probably a bad idea nonetheless.

Not sure about 1 word, but if it was documented that obscure_function(::NotMyType) -> Bool, then I would say obscure_function(::NotMyType{MyType}) = nothing deviates from the public API, and changing the public API to accommodate it would be a breaking change.

I agree that the linked style guide principle can apply to types other than Base containers because those aren’t inherently special types; a container isn’t even a formal concept in the type system. However, it would be an overgeneralization to extrapolate to all parametric types. For many parametric types, especially containers, the primary characteristic is the type itself, not its type parameters, and we want a function’s methods to consistently work on that primary characteristic. For example, Vector{Int} isn’t an integer, it’s a vector, and we’d expect it to print like all the other Vectors. You could implement Vector{S} printing differently, but there’s no demand or reason for it. Printing a sequence in other ways reasonably requires a different parametric type, e.g. StepRange enforcing a sequence is entirely determined by start, step, and end points to be worth printing instead of sequential elements.

You could definitely make a parametric type like Type where the primary characteristic is the type parameter, at least where a function is concerned, e.g. convert dispatching on T in Type{T}, never Type. A wrapper type struct NamedNumber{T<:Number} could reasonably be treated as the wrapped T in many functions, like arithmetic (though simple forwarding doesn’t need this). It’s possible for the same parametric type to be treated as the primary characteristic by a different function, like to print the Symbol name with the numerical value.

This also reminds me of another section about warning against dispatching on a container’s element type and other arguments together because it easily runs into ambiguities with methods that dispatch on the container’s parametric type and other arguments, e.g. foo(A::AbstractArray{Int}, b::Int) and foo(A::Array{T}, b::T). The docs encourages something less coupled like foo(A::AbstractArray, b), presumably you’d call another method that works on each element of A along with b, and that method is free to enforce eltype(A) == typeof(b). Again, this respects a container as its primary characteristic.

Ok. But now here’s the prequel:

In an earlier version of their package (I’m using the word package loosly here, will explain in a bit), author A had defined:

is_NotMyType(::NotMyType) = true

and obscure_function did not exist. So in that scenario, we wouldn’t even say Author B’s code is bad. But Author A changed something in their implementation in the next version, without breaking any promises, and is_NotMyType is now broken.

I understand that if there are actual packages there are namespaces as barriers that prevent this sort of thing happening. But I think this counts as type piracy still, no?

The point of all these guidelines is to minimize surprises. Piracy minimizes the “spooky action from a distance” wherein loading what seems to be an unrelated package changes how some code works. And the container type guidance is specifically pointed at changing behaviors that would surprise users:

The trouble is that users will expect a well-known type like Vector() to behave in a certain way, and overly customizing its behavior can make it harder to work with.

It’s not necessarily a hard-and-fast rule, but good guidance.

Similarly, implementing obscure undocumented functions in ways you’re not sure the original author intended — even if not technically piracy — seems prone to surprises all around.

2 Likes

Broken for Author B, but not for Author A, who doesn’t have ownership of nor responsibility for MyType. Any other parameterized NotMyType{T} will do what Author A expects.

Some Author C using both packages might discover the inconsistent behavior, if so it’s more Author B’s problem to fix, should he or she want to. Regardless, this isn’t type piracy, which has a clear definition which this scenario doesn’t meet.

That’s what I mean by a general rule for your scenario being impossible, for the record. Obtaining perfect harmony for everyone’s code is a laudable goal, but there’s no rubric which can make that happen in every general case. Like it’s good if every method of a given function does “the same thing”, defining push!(t::MyType, a) to instead delete a wouldn’t be neighborly, but there’s no good way to define what’s in or out of bounds here. Base methods of push! don’t have side effects, is it ok if user methods do? Just gotta use your best judgement.

2 Likes

Namespaces don’t really help people adhere to public API. It’s also clearer now that your scenario doesn’t necessarily have a nice public API to follow. Like any working relationship, stable package interactions require reasonable promises and adherence to those promises on all sides. It’s not great to be blindsided by bad assumptions or unexpected changes in collaboration; that sort of desperate stumbling ideally happens in early development when code is small and isolated. Even v0 packages that technically allow for breaking changes in minor revisions can and often end up mostly stabilized across many minor revisions by promises.

People who know Julia better than me deleted a few lines of the SortedDict code in DataStructures.jl precisely because my original code had redefined eltype{::Type{T}) for a type T that SortedDict owns. It was explained to me that this sort of type piracy can invalidate methods and cause unneeded excess compilation. See: remove eltype piracy from SortedDict by vtjnash · Pull Request #854 · JuliaCollections/DataStructures.jl · GitHub

2 Likes

Interesting stuff. To summarize here to save people some clicks, these methods in DataStructures.jl/src/sorted_dict.jl (that link goes to master, link to the changes here) were removed for type piracy:

@inline Base.eltype(::Type{SortedDict{K,D,Ord}}) where {K,D,Ord <: Ordering} = Pair{K,D}
@inline Base.keytype(::Type{SortedDict{K,D,Ord}}) where {K,D,Ord <: Ordering} = K
@inline Base.valtype(::Type{SortedDict{K,D,Ord}}) where {K,D,Ord <: Ordering} = D

with what I presume to be these methods in julia/base/abstractdict.jl:

function eltype(::Type{<:AbstractDict{K,V}}) where {K,V} ...
keytype(::Type{<:AbstractDict{K}}) where {K} = K
valtype(::Type{<:AbstractDict{<:Any,V}}) where {V} = V

I omitted the body of eltype there for brevity, but it’s basically an if-statement checking if K and V were provided (iterated unions may not) to see what parameters to specify for Pair, and the branch would occur at compile-time. Those AbstractDict methods would already work for SortedDict in sorted_dict.jl due to subtyping:

mutable struct SortedDict{K, D, Ord <: Ordering} <: AbstractDict{K,D} ...

The comments from the pull request explaining the type piracy:

  1. The meaning of eltype is defined for AbstractDict already, which this declaring itself a subtype of, then pirating the definitions for it
  2. You could own some of Type{<:YourObject} , but you don’t own any of Type{<:AbstractDict} , since that is already owned by the interface for AbstractDict.
  3. I think one issue is that all code that has previously inferred eltype(::Type{<:AbstractDict{S,T}}) needs to recompile since this method could have returned something different.
  4. Correct. You are force it to inject an inference barrier here, so that the compiler can handle the piracy. It knows how to keep working, even though it will cost users performance after it is pushed off the fast path.
  5. A reason to consider that is that defining new eltype etc. methods may in principle invalidate other code; if you would have gotten the same answer without defining those methods, you’re far, far better off deleting them.

As for me, I don’t really understand this. I do see that the AbstractDict methods are already enough (and better because eltype also works on iterated unions) and that DataStructures.jl owned ::Type{SortedDict{K,D,Ord}}, not ::Type{<:AbstractDict{K,V}}. I don’t see how this is type piracy or how this would recompile anything:

  1. eltype(SortedDict(...)) isn’t ever called between the definitions for SortedDict type and the previous eltype method, so the first such call would compile that method.
  2. eltype(T) for other AbstractDict input types T specialize the method for each T because of the method’s shared type parameters, and as far as I know, none of those specializations would need to be invalidated because of the eltype(SortedDict(...)) call or method definition.

I do love a controversy though, let’s see if this evolves the common understanding of type piracy.

1 Like

I wouldn’t call it piracy either, but those methods would cause invalidations for compiled methods of @nospecialized or type-unstable AbstractDict as Julia would see only one matching method for all its subtypes and just hard-code it. And that assumption would become wrong (for no good reason) upon loading the package.

1 Like

Just to clarify, do you mean invalidations of caller methods like:

struct BadWrapper d::AbstractDict end
# compiler can't infer a particular input type for eltype
foo(b::BadWrapper) = eltype(typeof(b.d))[]
bar(@nospecialize (D::Type{<:AbstractDict})) = eltype(D)[]

IIRC, this optimization works when the number of applicable methods are <4 or <=4, and it’ll infer all of them to attempt to narrow down the return type of the runtime-dispatched call. Checking the number of methods:

julia> methods(eltype, Tuple{Type{<:AbstractDict{K, V}} where {K, V}})
# 2 methods for generic function "eltype" from Base:
 [1] eltype(::Type{Union{}}, slurp...)
     @ abstractarray.jl:240
 [2] eltype(::Type{<:AbstractDict{K, V}}) where {K, V}
     @ abstractdict.jl:482

That first one for Union{} only throws an error so only the 2nd method matters, well under the optimization’s method limit. I can confirm that defining another eltype method for a custom dictionary type adds to that list.

Now that we have a motivating example, always a useful thing, I’d suggest that the problem here is in fact the one addressed by Don’t overload methods of base container types, even though that guideline doesn’t discuss the invalidation issue, or extend to all parametric types in Base.

Both of those strike me as good additions, so maybe the best thing to do here is to update the guideline to be “Don’t overload methods of base parametric types”, and add some explanation of how it creates invalidations, which slows compilation, as well as violating the expectation that subtyping Base types will have predictable behavior.

Coming back to this after a while, but weird things do happen if you overload Base.getproperty for Type{MyStruct}:

struct MyStruct end
MyStruct # no error
Base.getproperty(::Type{MyStruct}, ::Symbol) = true
MyStruct # throws the following error:
# Error showing value of type DataType:
# ERROR: type Bool has no field wrapper
@assert MyStruct.anything # still works

Yeah, you’ve implemented something that completely breaks the type.