How to detect/avoid type piracy?

Ouch, it seems I’ve been bitten by the correctness issue in its following manifestation:

  1. My package overloads the Base.split() for (::AbstractVector, ::Integer).
    Unit tests pass, scripts/REPL work correctly.
  2. However, when using the package in Pluto, I started to get incorrect results, silently, no errors or warning.
  3. Debugging revealed that Pluto quietly loads the Lazy.jl package which also overloads Base.split(::AbstractVector, ::Any) which ended up being used instead of my method.

A type piracy has been committed. Unintended and undetected.

Isn’t this scary? A package developer has no control over the combination of packages his package will be used in. No amount of testing would help here.

So, what can be done?

1 Like

Aqua.jl can spot type piracy: Type piracy · Aqua.jl

15 Likes

I just downloaded Pluto and checked the manifest. There does not seem to be any dependency on Lazy.jl . This makes sense, because Lazy.jl is an unmaintained package (it’s last commit was 4 years ago).

So I think the solution is

  1. Don’t overload methods on types you don’t own. That is, don’t commit type piracy.
  2. Don’t use Lazy.jl, or other packages which commit type piracy. It’s not Pluto that’s causing a dependency on Lazy.jl, though. It’s something else.
6 Likes

Odd, if Pluto loads Lazy then I’d expect that to occur before the user can load anything. Do you import anything after your package?

Thanks for the advice!

  1. Don’t overload methods on types you don’t own. That is, don’t commit type piracy.

Yep. Just checking that an existing function like Base.split(), has no specialisation for some type ::T (e.g., ::AbstractVector in my case), isn’t enough.

  1. Don’t use Lazy.jl, or other packages which commit type piracy.

But how to ensure it? Get every package checked buy Aqua.jl?

It’s not Pluto that’s causing a dependency on Lazy.jl, though. It’s something else

Oh, my apology to Pluto.jl then for jumping to a conclusion here.

This is exactly the definition of type piracy. You don’t own the type AbstractVector (or Integer as mentioned in your first post), nor the function Base.split. So defining a method of that function with those types is type piracy.

You need to either define a new function (not a new method of a function in Base!), or a new type (ie a subtype of AbstractVector) that you can define a method for.

It shouldn’t be surprising that defining a new method for a function you don’t own, with types you don’t own, causes unexpected behaviour.

8 Likes

My bad: turns out it is not Pluto.jl that loads Lazy.jl but PlotlyJS.jl via Blink.jl:

PlotlyJS v0.18.13
  ├─ Blink v0.12.9
  │  ├─ Pkg v1.10.0 (*)
  │  ├─ Mux v1.0.2
  │  │  ├─ Pkg v1.10.0 (*)
  │  │  ├─ HTTP v1.10.8 (*)
  │  │  ├─ Hiccup v0.2.2
  │  │  │  └─ MacroTools v0.5.13 (*)
  │  │  ├─ AssetRegistry v0.1.0 (*)
  │  │  └─ MbedTLS v1.1.9 (*)
  │  ├─ HTTP v1.10.8 (*)
  │  ├─ JSExpr v0.5.4 (*)
  │  ├─ Lazy v0.15.1 
  │  │  └─ MacroTools v0.5.13 (*)
  │  ├─ JSON v0.21.4 (*)
  .
  .
  .

I agree. It may be not surprising in hindsight. But this “mistake” is very easy to make: both myself and the author of Lazy package, being carried away by Julia’s power of generic programming, committed the piracy with the intention to extend the Base function in a generic way, i.e., using an informal abstract interface like AbstractVector.

My concern is that this mistake occurs in silence. Shouldn’t be there a safe guarding mechanism that would warn about such ambiguities like methods clash?

2 Likes

There are some mitigating circumstances if the pirated definition is very natural and as such is essentially the only way to define something (think of Robin Hood, he wasn’t really a pirate :wink:).

The following is how the Lazy definition works:

julia> v = [8, 8, 4, 1, 8, 5, 9, 2, 8, 3, 10, 5, 1, 6, 2, 1, 7, 5, 1, 7];

julia> split(v,5)
4-element Vector{Vector{Int64}}:
 [8, 8, 4, 1, 8]
 [9, 2, 8, 3, 10]
 [1, 6, 2, 1, 7]
 [1, 7]

How does the definition in your package work?

The second argument in my implementation is a set of indices to perform splits at:

julia> split([1,2,3,4,5,6,7,8,9, 10], [3, 6, 8])
4-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8]
 [9, 10]

Base.split is only defined and used for strings. Another piece of advice is to never specialize a method if you’re making it serve some other purpose. For example, you should not overload Base.size(::TeeShirt) for your custom TeeShirt type to return shirt size, because Base.size is about the dimension of arrays.

Instead, only overload a method if you need code written by other people to operate on your custom type in a custom way (either because the existing version would not work or would be inefficient). For example, it’s common to extend Base.getindex to new matrix types because you need them to be usable like matrices in other code (e.g., in matrix multiplication code).

In this case, since nothing else exists that already provides this functionality (Iterators.partition is the closest I can think of), I would just rename your function to something else that is entirely new.

8 Likes

Also look at Home · ChunkSplitters.jl which might already have the functionality you want.

Or, the following might help:

julia> indexchunks(v, idxs) = 
  (i==0 ? (firstindex(v):idxs[1]) : 
   i==length(idxs) ? (idxs[i]+1:lastindex(v)) :
   (idxs[i]+1:idxs[i+1]) for i in 0:length(idxs))
indexchunks (generic function with 1 method)

julia> [[1:10...][chnk] for chnk in indexchunks(1:10, [3,6,8])]
4-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8]
 [9, 10]
1 Like

Pirate Hunter has a script to help find type piracy, though I’m not sure if it needs to be updated

Maybe having some specific Aqua tests could be requirement for auto-merging a package in the registry?

to me that sounds reasonable at a high-level, but I worry about the details. E.g. some packages are essentially allowed to pirate methods from another, since they are designed together to do so. Or packages like GenericLinearAlgebra that add linear algebra methods for bigfloats etc are piracy, but basically desired. So given there is some amount of desired piracy in the ecosystem, we wouldn’t want to necessarily require manual merges for every version of such packages. But then it gets more complicated to make an allowlist etc.

5 Likes

Thanks for the pointer! Seems to be a useful package.

The code is in essence my implementation of Base.split().
The consensus here seems to be that it would be better served by a different name.

Another alternative is to not add your method to Base.split, but instead keep it local to your pkg.

julia> module MyPkg
split(::AbstractVector, ::Any) = ...
end

julia> using MyPkg

julia> MyPkg.split(...)
2 Likes

I wish there were a more explicit way to do this and report downstream piracy (privateering?) conflicts.

This is related to the traits issue highlighted in the viral instruction post (“what’s bad about Julia.”). Julia facilitates polymorphism through the workflow: (1) subtype some abstract type from another package (e.g. “AbstractArray”); (2) implement a list of methods for your new subtype (e.g. Base.size, Base.iterate, etc…). However, it is currently not at all clear what methods must be implemented if the downstream package’s struct is to work with other functions defined on the struct’s abstract supertype. There should be a way to declare method implementation requirements for subtypes of an abstract type (this would resemble the Holy Traits pattern) and it should be quarriable (e.g. with a function Base.methodrequirements(my_type)::Vector{Function}); packages that implement a subtype of a supertype with declared method requirements would throw a warning upon loading if the package does not define a required method. Similarly, there should be a way to declare functions that may be pirated while reporting duplicated piracy in downstream packages.

I believe Interfaces.jl is working toward this goal.

1 Like

Hm, is it so? In my environment, Base.size counts 277 methods. I don’t think all of them are array related. E.g.,

looks close to the TeeShirt counter example.

One of my favourite things about multiple dispatch was the freedom to choose those name for my function which other people are used to and are more likely to understand without help instructions. On this ground, I’d precisely overload Base.size(::TeeShirt) for my `TeeShirt’s, since the name is natural.

I was under impression that by allowing for that Julia community was creating a programming ecosystem governed rather by natural than a programming language.

Is my thinking completely wrong?