Even more clarification on Type piracy

There was already a post about clarifying type piracy. There is also something in the Julia docs, but I would say it is quite brief.

So it is best to ask for more.

I have two use cases, in both of which I use DifferentialEquations to build my library.

I can see that the first case can be “bad”. step!(integ) is a single argument function used by DiffEq to step an integrator. Inside my module I import step! and define

function step!(integ, Δt::Real)
    t = integ.t
    while integ.t < t + Δt
        step!(integ)
    end
end

Now, this is “type piracy”. I cannot understand why it is a bad idea though, since DiffEq does not have a 2 argument form of step!.

Obviously if such a method is ever defined, I will get a warning message, therefore this cannot “silently” lead to any unintended results. So why is it bad? Doing something else, such as defining a function with a different name (e.g. step_for_dt), goes against Julia’s multiple dispatch mentality.
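To make the alternative concrete, here is a minimal sketch of the separately named function. It uses a toy stand-in module instead of DiffEq, so `Upstream`, `ToyIntegrator`, and `MyLib` are hypothetical names for illustration and not real DiffEq API.

```julia
# Hypothetical stand-in for the upstream package (not the real DiffEq API).
module Upstream
export step!, ToyIntegrator

mutable struct ToyIntegrator
    t::Float64
    dt::Float64
end

# Upstream only provides the one-argument form.
function step!(integ::ToyIntegrator)
    integ.t += integ.dt
    return integ
end
end # module Upstream

module MyLib
using ..Upstream

# A function owned by MyLib: adding methods to it cannot be type piracy,
# because no other package can already depend on its behavior.
function step_for_dt(integ, Δt::Real)
    t_end = integ.t + Δt
    while integ.t < t_end
        step!(integ)
    end
    return integ
end
end # module MyLib

integ = Upstream.ToyIntegrator(0.0, 0.25)
MyLib.step_for_dt(integ, 1.0)
```

The trade-off is the one raised above: the name is less elegant than a second method of step!, but nothing MyLib defines can collide with a later upstream method.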

The second case is the following.
I define a function dimension that does not exist in DiffEq
and then do

dimension(integ::DEIntegrator) = length(integ.u)

Now, I cannot see why this should/could/would be a bad idea.

Anyone care to enlighten me?


My understanding is that the most heavily discouraged scenario is defining a method where you “own” neither the generic function nor the type. So if dimension is defined in your package, this should not be problematic.

Regarding step! (ie extending a function with methods that have a different, previously non-existent signature), I would still be cautious. If you are the only one doing it, it should be fine and not interfere, but if multiple packages decide to do it, you can easily run into the original problem.


There are already discussions about adding this.

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/issues/253

Let’s just finish it with a PR and you’ll be happy.


This seems the most sensible, presuming the originating package is OK with it. In general, might be nice to have a description as to why extending a function is discouraged. My impression is that it reduces maintainability (and might lead to a proliferation of conflicting definitions if eg the original package implements a function that does the same thing but in a different way).

If the package won’t accept a PR, that’s another thing entirely, but I get the impression that that sort of behavior is rare in this community.


We merged the PR about 1 hour after this thread response.

At least the Julia circles I work in are quite flexible.


Love that!


As @Tamas_Papp mentioned and also in the second comment of the previous thread (and maybe elsewhere), dimension(integ::DEIntegrator) = length(integ.u) is not type piracy because you are not extending the function. In the document:

“Type piracy” refers to the practice of extending or redefining methods in Base or other packages on types that you have not defined.

https://docs.julialang.org/en/stable/manual/style-guide.html#Avoid-type-piracy-1

So, if you define a function from scratch, it is not type piracy.

There are two ways to extend functions defined in other packages. (1) At least one of the arguments is dispatched on a type you defined. (2) None of the types are defined within your package (as in the case of step!).

By the definition in the document, category (1) is not type piracy. In fact, it is often part of the interface: http://docs.juliadiffeq.org/latest/basics/integrator.html For example, defining an iterator in Julia is done by extending Base.iteratoreltype etc.
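As a small sketch of category (1), here is an iterator defined on a type the code owns. The type `Countdown` is hypothetical, and the function names are those of current Julia 1.x (`Base.iterate`, `Base.length`, `Base.eltype`), which differ from the older names mentioned above.

```julia
# A hypothetical type owned by this code.
struct Countdown
    from::Int
end

# Extending Base functions on a type we own: category (1), not piracy.
# Only code that constructs a Countdown can ever reach these methods.
Base.iterate(c::Countdown, state = c.from) =
    state < 1 ? nothing : (state, state - 1)
Base.length(c::Countdown) = max(c.from, 0)
Base.eltype(::Type{Countdown}) = Int

collected = collect(Countdown(3))
```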

The example *(x::Symbol, y::Symbol) = Symbol(x,y) in the document falls in category (2). I believe extending functions from other packages (not just Base) this way should be avoided unless there are very good reasons, as mentioned in the document. For example, consider these four packages:

  • A
  • B: uses A, extends functions in A via type piracy = category (2)
  • C: uses A, extends functions in A via type piracy = category (2)
  • D: uses B and C

Packages B and C each rely on their own pirated methods internally and don’t know about the other’s implementation. Now the author of package D comes along, finds B and C useful, and imports both into D. At this point, I imagine that B or C may break, depending on the order in which they are loaded. This is exactly the reason why monkey patching has to be avoided.
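The four-package scenario can be sketched in a single file, with modules standing in for packages and definition order standing in for load order. All names (`Thing`, `describe`, `b_report`) are hypothetical.

```julia
module A
struct Thing end
describe(::Thing) = "a thing"
end

module B
import ..A
# Piracy: B owns neither `describe` nor `Thing`.
A.describe(::A.Thing, n::Int) = "B's version, n = $n"
# B's internal code relies on B's own pirated method.
b_report() = A.describe(A.Thing(), 1)
end

module C
import ..A
# Same signature, different behavior: this replaces B's method.
A.describe(::A.Thing, n::Int) = "C's version, n = $n"
end

# "Package D" has loaded both B and C; B now runs C's method.
result = B.b_report()
```

In a plain script like this the second definition simply replaces the first. With real packages the outcome may be a warning or an error instead, depending on the Julia version, but in either case B and C cannot both get the behavior they were written against.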


Actually, the definition of category (1) is weak: a method that fits it as written can still be type piracy when the types in the method signature are not specified tightly enough.

Assuming you now have a good sense of type 1 and type 2 type piracy, I will try to explain the good and the bad in practice. Type piracy can be bad, or should be avoided, because it makes code unstable. One may think the code is behaving as it was first conceived, but due to piracy it is doing something different. Because of that, code becomes dangerous to use or depend on. This is a nightmare in R!

Usually type 2 piracy can be beneficial if it corrects wrong default behavior. For example, MyStruct should use a different behavior for a standard function because of its implied fields or nature. I can then overload the common API to fix that. That is a beneficial aspect of type 2 piracy. Moreover, there are ecosystems that rely on and encourage type piracy. For example,

# Module A
abstract type These end
name(obj::These) = error("Method has not been implemented for this struct.")
# Module B
mutable struct MyStruct <: A.These end
A.name(obj::MyStruct) = "My Struct Name."

Now people may define <:A.These in any package and the API is consistent. This model is used in JuliaStats (base package StatsBase)

This example isn’t type piracy. It defines a dispatch on a type that you own and will not affect the usage of other packages when your type is not in use. This is the encouraged way to avoid type piracy.


It is still type 2 piracy as name is defined in module A and module B is extending its functionality. Module B owns the struct, but not the method.

That doesn’t matter. That’s pretty standard in Julia: adding methods to * for your own number or array type, etc. It’s safe because it still only applies to your code.

Type piracy is bad because it affects other code that doesn’t use your own types, which makes it difficult to understand why using PackageB breaks PackageA. However, if you extend only on your own types, this issue can’t occur.

# Module A
abstract type Structs end
general(obj::Structs) = "This is the " * name(obj)
name(obj::Structs) = getfield(obj, :general)
# Module B
mutable struct MyStruct <: Structs
    general::String
    nickname::String
end
A.name(obj::MyStruct) = getfield(obj, :nickname)

I might assume knowing module A that

a = MyStruct("A", "B")
if general(a) == "This is the A"
    println("Expected")
end

but I got "This is the B". Imagine rather than the silly example, I am defining a <: StatsBase.RegressionModel and made it such that the r2 for my struct gives a different definition. If anyone uses the general API and calls r2 on my package models it will give the wrong answer. That can happen a lot in pipelines such as DataFrames, StatsBase, StatsModels, and a package. It might not break a package if I only use PackageA and PackageB, but as soon as there is a PackageC that interacts with those it could break it.

Type piracy could also occur when a package defines a method as x + y and you implement that method for your struct as 2 * x + y. You are basically hijacking the method for something else and breaking the API. Once there is a heretic, the API is no longer credible, and developers can’t rely on the API to guarantee that their code will work with any <: ModuleA.AbstractTypeA. At least that is the definition I rely on for piracy (it might break interaction between several packages, rather than being limited to two packages).

@stevengj, I remember you saying that piracy occurs only when you own neither the method nor the struct (if I am not mistaken):

“It is type piracy (this is a technical term in Julia-verse), because this hypothetical package would be extending a function step defined in another package, with the new method acting only on types defined in another package (AbstractVector and Date).”

So type piracy might be less broadly defined than what I described above.

That’s as expected and still not type piracy. You just defined an r2 method that doesn’t work correctly, so anyone who uses the r2 from that package will get incorrect results. This doesn’t have to do with package interactions; it’s just the definition and overrides of r2 that are the issue. If other developers avoid your package because they know your r2 is incorrect, then it will not affect their code and they are fine.

I think you’re not understanding what is nefarious about the actual type piracy case. The cases you’re talking about concern whether someone should be allowed to use your type in their functions. Should I be allowed to use PackageA’s definition of a Number in the differential equation solvers? Yes, there’s no issue with the extension, and as long as you don’t choose to use PackageA’s number type you aren’t affected. As long as the differential equation solver doesn’t choose to use PackageANumber, having PackageA in scope won’t even do anything.

The actual type piracy case is different. If PackageA defines

Base.:*(a::Float64,b::Float64) = PackageANumber(a*b)

then any time PackageA is in scope, it takes over any code that uses multiplication. You can’t choose to opt in or out of it by choosing what types and functions you’re using: the existence of PackageA will modify the standard way that * works. It also infects everything. If PackageB uses PackageA, then because Base will have changed, using PackageB will change how your code in the REPL works, even if nothing is exported from either PackageA or PackageB.

Another way of saying it is this. The useful definition of type piracy, the one that says you can only extend functions on your own types, means that if PackageA is not a type-pirate, adding using PackageA to the module file of PackageB (and doing nothing else) cannot break PackageB’s code. If PackageC is a type-pirate, just adding using PackageC and doing nothing else can break PackageB’s code (that’s without calling any functions or building any types from PackageC). PackageA is safe because it only hurts if you use its extended functionality, whereas PackageC causes unpredictable changes to any code with which it is in scope (actually, redefining functions in Base from PackageC will break scripts in Main even if PackageC is in scope of PackageB without being exported… gah!). This is why the definition is the way it is.


All you need is to have a single type that you “own” in the argument list of a function you are extending, for it not to be considered type piracy (but it can’t just be a type in the keyword arguments, because those aren’t dispatched on [yet])
I have a lot of functions that I extend, where the first argument might be String, and the second one be one that I “own” (*Str or *Chr, for example), but that isn’t type piracy, since only things using my types can possibly be affected.
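A minimal sketch of that pattern, with a hypothetical type `Tag` standing in for an owned string-like type:

```julia
# A hypothetical type owned by this code.
struct Tag
    name::String
end

# Not piracy: `*` and `String` belong to Base, but `Tag` is ours,
# so only calls that involve a `Tag` can reach this method.
Base.:*(prefix::String, t::Tag) = Tag(prefix * t.name)

tagged = "v" * Tag("1.0")

# Code that never touches `Tag` behaves exactly as before.
still_string = "a" * "b"
```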

Minimal example of how to ruin someone’s day with type-piracy.

julia> module A
       Base.:*(a::Float64,b::Float64) = 4
       end
WARNING: Method definition *(Float64, Float64) in module Base at float.jl:379 overwritten in module A at REPL[1]:2.
A

julia> module B
       using A
       end
B

julia> using B

julia> a = 1.0
1.0

julia> b = 2.0
2.0

julia> a*b
4

Now go bury that in somewhere and find out how long it takes for someone to take that WARNING seriously. Note that if you make a definition which doesn’t actually exist with Base types on a Base function, it will do the same thing without a warning. That’s why it’s not allowed.
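The warning-free case can be sketched with the Symbol example from the style guide quoted earlier in the thread. The module names `Pirate` and `Bystander` are hypothetical.

```julia
module Pirate
# Neither `*` nor `Symbol` belongs to this module: this is type piracy.
# No method existed for this signature, so nothing is overwritten.
Base.:*(x::Symbol, y::Symbol) = Symbol(x, y)
end

module Bystander
# Never mentions Pirate. Without Pirate loaded, this call is a MethodError;
# with Pirate loaded anywhere in the session, it quietly succeeds.
glue() = :a * :b
end

glued = Bystander.glue()
```

Nothing in Bystander hints at where the behavior comes from, which is what makes this kind of definition hard to track down.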

Necessary: “with great power comes great responsibility.”


One has not experienced the endless pain and nightmare of type piracy until, after 12+ hours figuring out why the code broke in R, one realizes that a package had been sourced earlier in the session -_- before running the code.

dplyr::intersect
lubridate::intersect

I care about type 2 piracy still more due to the base/API aspects than most normal users or developers in other roles. The idea of piracy is that whatever the method is, it is consistent when applied to any struct as it was defined at conception. If you want to express something different, define a different method. The “benefit” of what I consider good piracy is when the same concept is expanded to new structs conserving the same meaning (usually a type 2 piracy with the no-override except no method defined flavor).

Sure, I respect that. If you’re going to extend someone’s functions or types, it should keep to the same idea or structure. This isn’t type piracy, it’s punning off of some existing construct in a way that it was never intended. Using + for some non-commutative operation instead of * would be confusing.

The reason it’s not called type piracy is that it cannot cause the issue I showed above. Unless you commit type piracy somewhere, there’s no way to force me to use your code even if I did using PackageA. I still have to consent to use your functions or types. Type piracy takes away that consent. Without type piracy, if I don’t use types defined in your package then + is still commutative. With type piracy, you can make + for every other type non-commutative.

One of the things we specifically look for at METADATA reviews is type-piracy for this reason. We allow people to write bad code, but we do not allow their bad code to infect other people’s code. The meaning of type piracy is kept strict to “need to either own the functions or types” because it’s that combination which is required for this kind of infection to occur. Any less strict definition of type piracy is just mixing “writing bad code” with “writing infectious code” which dulls the meaning and dulls the severity of the issue.

Just call it a “bad pun” or something like that so it’s not confused with actual type piracy. Otherwise you’re extending the original idea of type piracy to include a case it was never intended to. The resulting confusion is irony at its best.


I wouldn’t want to pirate the type piracy concept. I guess I blurred the lines when trying to argue the potential benefits of piracy in a very few selected cases (sometimes one does it just to avoid having to rewrite everything, or because moduleA and moduleB aren’t playing well together, such as dtplyr, which fixes dplyr/data.table compatibility). If you mess with a method and a struct you don’t own: type 1. If you own one of the two, type 2? I think having both type 1 and type 2 made it more confusing than in the good old days of just type 1, which was really well-defined.

@Nosferican By “type 1” and “type 2”, are you referring to the two categories I mentioned? If so, as I said, “type 1” (= at least one of the arguments is dispatched on a type you defined) was meant to be an example of non-type-piracy, but I then realized that the definition was not good enough. But “type 2” (= none of the types are defined within your package) is still type piracy and I don’t think there is ambiguity here. I should have mentioned a clear case of method extension that is not type piracy: all of the arguments are dispatched on types you defined.

I didn’t try to pirate the type piracy concept. I explicitly introduced those two categories as “the ways you extend functions”, not two types of type piracy. There are good, expected and encouraged ways for extending functions.
