How to detect/avoid type piracy?

This is precisely why I talk about the “meaning” of a function so often. You can only understand code if you know what it means. And what Base.size is defined to mean is the shape of an array. You’re welcome to introduce another size function in another namespace precisely so it can mean something entirely different. And if you’re explicit about it, everyone will be able to understand you.

map(size, drawer.items) is a great example — what if the drawer contains a box of socks? Do you return the number of items in the box or the dimensions of the box?


I had a long response, but mbauman said what I did with a short one so I scrapped it.

Attaching additional meanings to an already-defined function is called “punning”. If your version doesn’t do “the same thing” (which requires some amount of judgement, but usually is clear enough in practice) then extending an existing function is a pun and should be reconsidered. And if multiple people pun the same function, should you really expect them to pun it the same way? Punning creates chaos, rather than diminishing it.

As a matter of practice, if your map(size, drawer.items) returns [(30, 20, 15), 206, "jumbo", 2.54], how do you intend to use it? It looks difficult to use, to me.

But if you find yourself in a situation where this is useful, then a simple wrapper like

bigness(x::Number) = Base.abs(x)
bigness(x::AbstractArray) = Base.size(x)
bigness(x::AbstractClothing) = Clothes.size(x)
bigness(x::AbstractFurniture) = Furniture.dimensions(x)
bigness(x::AbstractAnimal) = Animals.weight(x)

is a good way to collect that functionality for your specific use.
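For anyone who wants to run the wrapper above, here is a self-contained sketch. The modules Clothes, Furniture and Animals, their types, and the sample values are made-up stand-ins for three independent packages; none of them is a real library.

```julia
# Hypothetical stand-ins for three independent packages.
module Clothes
    struct Shirt
        label::String
    end
    size(s::Shirt) = s.label          # Clothes.size, a different function from Base.size
end

module Furniture
    struct Table
        dims::NTuple{3,Int}
    end
    dimensions(t::Table) = t.dims
end

module Animals
    struct Cat
        kg::Float64
    end
    weight(c::Cat) = c.kg
end

# The glue lives in user code and owns its own function name.
bigness(x::Number) = abs(x)
bigness(x::AbstractArray) = size(x)                    # Base.size
bigness(x::Clothes.Shirt) = Clothes.size(x)
bigness(x::Furniture.Table) = Furniture.dimensions(x)
bigness(x::Animals.Cat) = Animals.weight(x)

items = [Furniture.Table((30, 20, 15)), Clothes.Shirt("jumbo"), Animals.Cat(2.54), -206]
sizes = map(bigness, items)
```

Nothing in Base is extended here, so the heterogeneous result stays a local decision of whoever wrote bigness.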


The example is about 3 different independent teams which do not collaborate.

Thus it falls on the user to create the HouseHoldItemsBase module to glue everything together, and to maintain it. Possible, but not that convenient.

map(size, drawer.items) is a great example — what if the drawer contains a box of socks? Do you return the number of items in the box or the dimensions of the box?

In this case, dimensions of the box seem rather natural. size(box) shouldn’t really look inside the box. Documentation should be able to resolve doubts, if any.

Introducing a separate namespace to be able to dispatch on different sizeable items from totally different packages does require some work. I was hoping that by adhering to Base.size we’d get this polymorphism for free, out of the box.
Am I lazy?

Yep, that’s a good point and a good way out of my dilemma. Thank you @mikmoore and @mbauman and everyone else for the discussion!

I’d like to bring another perspective, by highlighting how well the names were chosen:

  • Function, as in “what’s the purpose?” or “what is it supposed to do?” (what’s its function?)
  • and method, as in “how to do it?” (what’s the method?)

Because Julia tends to hide namespaces through the using keyword (at least for those who’re not used to working with namespaces from other languages), this blurs the picture (but this is a big topic in itself, and good progress is being made with the public keyword). But the whole spirit of this feature lies here.

In that light, how we should think of all of that falls into place quite naturally: we have to consider any function in conjunction with its module AKA namespace AKA context. Omitting the module’s name is just a shortcut.

Just to illustrate with an example: in the context of a fashion package, size may relate to how big things are, but in the context of Base, we want the size of “computer stuff”, i.e. arrays or similar.

In the end, I personally found it quite logical and natural that, in a given context (module), a function must have only one purpose; and that conversely, in general, the same function name in different contexts means different purposes.
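A small sketch of that idea, assuming a made-up Fashion module and TeeShirt type (neither is a real package): the name size exists twice, once per context, and the two functions never interfere.

```julia
module Fashion
    struct TeeShirt
        label::Symbol
    end
    # A brand-new function that merely shares its name with Base.size.
    size(t::TeeShirt) = t.label
end

shirt = Fashion.TeeShirt(:XL)

garment_size = Fashion.size(shirt)     # meaning defined by Fashion
array_shape  = size(zeros(2, 3))       # meaning defined by Base

# Two distinct function objects: extending one never touches the other.
same_function = Fashion.size === Base.size

# Base.size gained no method for TeeShirt, so generic array code written
# against Base.size rejects a shirt instead of silently misbehaving.
base_accepts_shirt = hasmethod(Base.size, Tuple{Fashion.TeeShirt})
```

The price is the qualified call Fashion.size(shirt); the benefit is that each name keeps a single documented purpose within its module.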
… but this is not to say that it’s obvious at first, far from it!

PS: I hope this perspective is different enough to be worthwhile.



Yes, but I don’t recall it being this fast; I’ve been able to edit my own posts for as long as a week later.

Back to the topic at hand, I want to differentiate what we’re calling “punning” and type piracy. You can do one without the other; for example, I could make a WeirdCow type and extend Base.:+, but instead of addition it plays a video of a cow eating grass. Punning also varies in how deviant it is; both conflicting split methods deviate from string processing, but Lazy still splits by a delimiting element, so it could polymorphically interchange with Base’s methods in another generic method and would be the more reasonable candidate for a contribution to Base.

I would like to point out that what you call punning is often done in mathematics. So if you want to represent mathematics faithfully, you must accept punning. For instance, if g and h are elements of a group, it is common in mathematics to write g^h for the conjugate inv(h)*g*h. So it is natural for elements of type T of a group to have Base.:^(g::T, h::Integer) be exponentiation as usual but Base.:^(g::T, h::T) be conjugation. And there are many, many such examples in mathematics.
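A minimal sketch of that convention, assuming a toy permutation type Perm invented for this illustration (it is not from any package) and the composition order stated in the comments:

```julia
# Hypothetical minimal permutation type; images[i] is where i is sent.
struct Perm
    images::Vector{Int}
end
Base.:(==)(p::Perm, q::Perm) = p.images == q.images

# Convention assumed here: (g * h)(i) = h(g(i)), i.e. apply g first, then h.
Base.:*(g::Perm, h::Perm) = Perm(h.images[g.images])
Base.inv(g::Perm) = Perm(invperm(g.images))
Base.one(g::Perm) = Perm(collect(1:length(g.images)))

# Integer exponent: ordinary repeated multiplication.
function Base.:^(g::Perm, n::Integer)
    base = n >= 0 ? g : inv(g)
    result = one(g)
    for _ in 1:abs(n)
        result *= base
    end
    return result
end

# Group-element exponent: conjugation, g^h = inv(h) * g * h.
Base.:^(g::Perm, h::Perm) = inv(h) * g * h

g = Perm([2, 3, 1])      # the 3-cycle 1 -> 2 -> 3 -> 1
h = Perm([2, 1, 3])      # the transposition swapping 1 and 2

power     = g^3          # three applications of a 3-cycle
conjugate = g^h          # dispatches on the type of the exponent
```

Both methods are defined on a type the author owns, so this is punning (two meanings for ^) without piracy.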


Yes, I humbly agree on that.

we have to consider any function in conjunction with its module AKA namespace AKA context. Omitting the module’s name is just a shortcut.

Not quite. Often enough, the context is not known in advance.
As I tried to point out earlier, many generic patterns like

twice_as_interesting(x) = x |> most_interesting_operation |> most_interesting_operation

won’t work without exporting the function’s name and letting multiple dispatch (MD) figure out which method to call. For me, a lot of the practical use and elegance of the Julia language lies in this very feature.

Btw, I agree on the niceness of the function/method nomenclature.

As people have pointed out, you can make different functions of the same name in different modules, a practice also found in languages without multiple dispatch. You don’t need punning at all to involve multiple contexts, and it’s cleaner to avoid implying polymorphism where there isn’t.


you can make different functions of the same name in different modules

You mean, having to qualify the function with the module name when using it?
I certainly would not want to do that when using the symbol ^; it would be too cumbersome.

Well, at some point, you have the choice to either:

  • explore & exploit all the nitty-gritty details and possible combinations offered by a given language, which is a very large space full of impractical ways of coding
  • limit yourself to a subset of all the possibilities, by sticking to your own good practices (generally built from your own experience + the community’s)

And the K.I.S.S. approach (which I often remind myself to apply) is a common implementation of the latter.

Here, the example you are suggesting is problematic in that it’s very subjective in nature and goes beyond the purpose of multiple dispatch - rationale that I tried to convey through the semantic analysis - and wouldn’t be used by anyone for any practical purpose (other than to make a point, sorry to say that).

You’ll notice that everything you code is in a given context (in the common meaning, not the “programming” context): a script, a game, a physics problem, an IT infrastructure architecture, a mathematical operation, a web app, etc. In each of these, it is the context that gives meaning to a function (btw, it is not for nothing that context is a hot topic in LLMs).

More generally - and more philosophically - I’d even argue that there is no such thing as a context-free word sense (and thus function).

As I understand it, your logic is that, considering that a function can be dispatched to any method, why not do it? (this type of questioning is generally a good approach when one wants to explore new possibilities)

But instead, I proposed that, given the premises under which the multiple dispatch capability has been thought of, pondered, & implemented (module, then function, then method), it’s better to use this tool that way, namely, considering that a function has an objective meaning only in a given context, and may be implemented by one or multiple methods.

Dropping context (in the sense of what’s defining the purpose of a function - not a method) is totally allowed by Julia syntax and you’re totally free to choose this approach.
But this is at your own risk: it makes big codebases much harder to reason about, invites confusion, and can lead to unexpected behavior, such as the one shown in your initial example.


Still, to go in the direction of the example you’re proposing, I’d say that this would somewhat work if you implement the function most_interesting_operation with a custom method using your own custom type. In other words, by avoiding type piracy.

In that case, the context would be something like SubjectiveView (which would be the package name first defining the function and possibly some methods), and each new package creator could implement his own most_interesting_operation method of SubjectiveView using his VeryPrecious custom type. In a word: by avoiding type piracy.
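A rough sketch of that arrangement, with everything hypothetical: SubjectiveView stands for the owner package, Treasures for one independent package, and the doubling rule is an arbitrary placeholder for whatever the real operation would be.

```julia
# Hypothetical owner package: it defines the function and documents its intent.
module SubjectiveView
    export most_interesting_operation, twice_as_interesting

    """Return the most interesting transformation of `x` (contract owned here)."""
    function most_interesting_operation end

    twice_as_interesting(x) = x |> most_interesting_operation |> most_interesting_operation
end

# Hypothetical independent package: adds a method only for a type it owns.
module Treasures
    import ..SubjectiveView

    struct VeryPrecious
        carats::Int
    end

    # Not piracy: the function is foreign, but VeryPrecious is ours.
    SubjectiveView.most_interesting_operation(v::VeryPrecious) = VeryPrecious(2 * v.carats)
end

using .SubjectiveView
gem = twice_as_interesting(Treasures.VeryPrecious(1))
```

The generic twice_as_interesting never needs to know about Treasures; dispatch on VeryPrecious selects the method.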


So the TL;DR is:
“type piracy” is literally what it means: it’s the hacking of a function’s purpose / intent, and can lead to trouble.


Thanks @Barget.
I don’t really disagree but I also don’t fully agree with the view you presented.
A few remarks first.

Your comment addresses both type piracy and type punning. I believe there is a consensus that the former, i.e., type piracy in its technical meaning, is a vice leading to subtle bugs as in the OP and thus should, must even, be avoided. This thread’s focus has since shifted to the latter question: whether type punning is good or bad, an acceptable practice in Julia or not. So let’s stick to it.
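To make the technical meaning concrete, here is a rough ownership check. The Units module and the helpers owns and is_suspect are hypothetical names invented for this sketch; a real check has to handle type parameters, unions and varargs, which this one ignores. I believe Aqua.jl ships a more thorough piracy test.

```julia
module Units
    struct Meters
        value::Float64
    end
    # Extending a Base function for a type we own: not piracy.
    Base.:+(a::Meters, b::Meters) = Meters(a.value + b.value)
    # By contrast, a definition like the following would be piracy, because
    # this module owns neither `+` nor `String`:
    #     Base.:+(a::String, b::String) = a * b
end

# Hypothetical helpers: a method is suspect when the defining module owns
# neither the function nor any of the argument types.
owns(mod::Module, x) = parentmodule(x) === mod
is_suspect(mod, f, argtypes...) = !owns(mod, f) && !any(T -> owns(mod, T), argtypes)

meters_method_suspect = is_suspect(Units, +, Units.Meters, Units.Meters)
string_method_suspect = is_suspect(Units, +, String, String)
```

Note that the check says nothing about punning: a method can pass it and still give + a meaning that has nothing to do with addition.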

Here, the example you are suggesting is problematic in that it’s very subjective in nature and goes beyond the purpose of multiple dispatch - rationale that I tried to convey through the semantic analysis - and wouldn’t be used by anyone for any practical purpose (other than to make a point, sorry to say that).

I cannot agree with this statement. The example is neither subjective nor impractical. To the contrary: let’s change the name of most_interesting_operation to something less contrived, say, process. Then,

result = x |> process |> process

is a generic pattern of a two-stage processing chain with dynamic context. The chain is well defined by the input/output types via multiple dispatch. Need to accommodate another type of x? No problem: just extend the process function defined in some other module. (Possibly a pun, because process can mean anything, can’t it? But no pirating!) No need to touch the main code at all! Very practical. For this to work, however, the community should accept type punning in favour of simple, natural function names like process or size.
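A sketch of such a chain in which the type changes between the two stages. Pipeline and Sensors are hypothetical modules, and the parsing and unit conversion are arbitrary example stages.

```julia
module Pipeline
    export process
    function process end      # the module that owns the generic name
end

module Sensors                # hypothetical third-party module
    import ..Pipeline: process

    struct RawReading
        text::String
    end
    struct Reading
        celsius::Float64
    end

    process(r::RawReading) = Reading(parse(Float64, r.text))   # stage 1: parse
    process(r::Reading) = r.celsius * 9 / 5 + 32               # stage 2: convert to Fahrenheit
end

using .Pipeline
result = Sensors.RawReading("100") |> process |> process
```

The main code only mentions process; which method runs at each stage is decided by the type that the previous stage returned.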

This however:

In that case, the context would be something like SubjectiveView (which would be the package name first defining the function and possibly some methods), and each new package creator could implement his own most_interesting_operation method of SubjectiveView using his VeryPrecious custom type. In a word: by avoiding type piracy.

is very impractical because it requires maintenance and coordination.

Now to the main point.
I do not disagree that everything acquires meaning through its context. IMHO, it’s a rather trivial statement and as such of little practical use.
My example regards the case where context cannot be fully known at the time people write their modules. I was under the impression that Julia solves this problem of generic programming by allowing function arguments, not modules, to define its context fully. Indeed, I’m still not convinced that size(::AbstractArray) and size(::TeeShirt) require their respective Base. and Clothing. qualifiers. To me, they don’t carry much additional meaning.

But, as you pointed out, it is my subjective point of view. It met a lot of criticism here which I happily take on board.

You also meet approbation, at least from me.


I related a lot to what @vvbond had to say and I found both sides of the discussion helpful.

I think the consequence of adding a method to a function you don’t own to work on a type you do own (punning without piracy) is that other functions you don’t own will get “activated” for your type. For example:

module MyModule
struct S end
Base.:(+)(::S, ::S) = true
s = S()
@assert s + s
s += s # turns out this now works
@assert s
@assert sum([S(), S()]) # turns out this now works too
end

The question is, in the absence of documentation, what is the expected default?

  1. If you, the author of the code, intend for += and sum to do what they are doing, great.

  2. If you expect that defining + for S makes no promises about what += and sum do with S, great. This seems reasonable at first look. After all, you don’t own += and sum. Base does. Just be sure that the users of your code expect the same.

  3. But if you expect sum or += and who knows what other Base function to throw a MethodError or something similar, which would be the safe thing to do, you are out of luck. It is not Base’s responsibility.

I’ve myself been working with expectation 2. But I think I might need to reconsider.


Yeah, that was also my main question: are there some unforeseen consequences of type punning that might lead to subtle bugs?

Your example @sadish-d is the first one trying to address it. Thanks!
I understand the idea

that other functions you don’t own will get “activated” for your type.

The example however doesn’t really support it:

  • += is not a function but syntactic sugar, i.e., part of the Julia parser: s += s is rewritten to s = s + s.
  • Base.sum is a function, but it’s just a wrapper around mapreduce with add_sum being the reduction operator, which in turn just calls Base.:+(x, y).

So, no surprise that those operations become valid upon extending Base.:+(::S, ::S).
Indeed, it is expected and useful.
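Both points can be checked with a short sketch. The Tally type and the plus_calls counter are made up for this illustration; the counter just records how often our own + method is reached.

```julia
# `s += s` parses to its own expression kind; it is not a call to a function named `+=`.
ex = Meta.parse("s += s")
head = ex.head

# Hypothetical type whose `+` counts its own invocations.
struct Tally
    n::Int
end
const plus_calls = Ref(0)

function Base.:+(a::Tally, b::Tally)
    plus_calls[] += 1
    return Tally(a.n + b.n)
end

# No method of sum was defined for Tally, yet this works, because the
# generic reduction ends up calling our `+`.
total = sum([Tally(1), Tally(2), Tally(3)])
```

After the call, plus_calls[] is greater than zero, which shows that sum reached the new method without any sum-specific code for Tally.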

Having said that, I do share your concern, and am curious what other pitfalls in a similar vein there are.


P.S. Is overloading Base.:+ considered type punning as well? Do people really implement their own operators like MyModule.:+ without extending Base? It would be very cumbersome to use.


I was deliberately using an operator as an example to show that the idea extends to operators too, and is not limited to functions.

I was also deliberately using + as an example to show that there can be very different expectations around it. Someone in their module may choose to use + as add and sum as add everything together. But I might choose to give different meaning to those symbols. Would it be surprising to you if you encountered the following?

module MyModule
import LLM # an imaginary package with large language models.
           # It has a `summarize` function that generates the summary of a text.

struct Text
	content::String
end

Base.:(+)(t1::Text, t2::Text) = Text(t1.content * t2.content)
Base.sum(t::Text) = LLM.summarize(t)
end

Here I am using + in the sense of append text, but sum in the sense of summarize. That’s just what I think is the best use of the symbols for me and the users of my code. My module, my domain, my choice of semantics. Julia makes it possible.

Maybe you would expect sum to work as append everything. Maybe someone else also expects commutativity? Whether you do 1 + 2 or 2 + 1, you get the same result in arithmetic. But that’s not true in the module I defined. My module makes no such promises at all.

Yep, that would be surprising: I’d expect a + operator to be commutative, indeed. Sure, this can lead to logical errors in the user code. Therefore, documentation is paramount in Julia.

But would the situation be any better without punning? Your point is about side effects, right? Namely, that an unexpected property, like non-commutativity, propagates from Base.:+ to Base.sum. My response: this propagation is great! Isn’t it the basis of Julia’s excellent composability? If, however, a side effect like this one is undesirable, simply specialise the corresponding methods, e.g., Base.sum, for your type to disentangle the two behaviours. Actually, you did just that in your example!
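One way to picture that disentangling, as a sketch only: Note and join_notes are hypothetical names, and throwing an ArgumentError is just one possible choice for the specialised method.

```julia
struct Note
    content::String
end

# `+` means "append" here, so it is not commutative.
Base.:+(a::Note, b::Note) = Note(a.content * b.content)

notes = [Note("a"), Note("b"), Note("c")]

# Before any specialisation, the generic sum folds the vector with our `+`.
inherited = sum(notes)

# Opt out for this element type with a more specific method, and offer an
# explicitly named alternative instead.
Base.sum(v::AbstractVector{Note}) =
    throw(ArgumentError("sum is not defined for Note; use join_notes instead"))
join_notes(v) = Note(join(n.content for n in v))

joined = join_notes(notes)
```

From this point on, sum(notes) raises the ArgumentError while + keeps working, so the two behaviours no longer travel together.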

There is still the problem of discovering all possible side effects. This proves to be rather hard. See, for example, this PR on function currying that has been stuck for 6+ years now: “RFC: curry underscore arguments to create anonymous functions” by stevengj (Pull Request #24990, JuliaLang/julia, GitHub).


I meant commutative not associative in my earlier response. Thanks for the correction. I’ll edit my response.

Julia’s current behavior might be desirable if I were using + as commutative add. But it’s not always desirable. + is just a symbol that happens to mean add in certain contexts. sum means add everything in some contexts, summary in others.

The situation would be different if I didn’t redefine Base.:(+) and defined a function like MyModule.append instead. Then Base.:(+) and Base.sum would throw errors. And I could define my own MyModule.sum which means summarize instead of add everything.
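A sketch of that alternative, using a hypothetical Docs module and Passage type so that nothing in Base is extended; throws_method_error is a small helper written only for this example.

```julia
module Docs   # hypothetical module that leaves Base.:+ and Base.sum alone
    struct Passage
        content::String
    end
    # A module-local function with an explicit name instead of a pun on `+`.
    append(a::Passage, b::Passage) = Passage(a.content * b.content)
end

a = Docs.Passage("foo")
b = Docs.Passage("bar")
combined = Docs.append(a, b)

# Helper for this example: run f and report whether it raised a MethodError.
throws_method_error(f) = try
    f()
    false
catch err
    err isa MethodError
end

plus_fails = throws_method_error(() -> a + b)
sum_fails  = throws_method_error(() -> sum([a, b]))
```

Here both Base operations fail loudly for Passage, which matches the safe default described as expectation 3 earlier in the thread.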

My point is that the default expectation should not be that everything just works, precisely because we can’t go around finding and correcting other functions that use +.

Thanks for the link to the GitHub issue. It might be too advanced for me.