Multiple dispatch in mathematical writing

Hey,

I have been coding almost exclusively in Julia for a while, and I noticed a strange habit in my mathematical writing that was not there before: I have started to use overloading and multiple dispatch everywhere. My reviewers disagreed with this, which got me wondering.

Suppose I define a function f : \mathbb R \to \mathbb R, say f(s) = s^2. Then, for any measure \nu \in \mathcal M_+(\mathbb R), I define f(\nu) = \int f(s) \nu(ds).

Furthermore, for a dataset \mathbf x \in \mathbb R^{N}, I use f(\mathbf x) to denote the estimate of the parameter f(\nu) computed from this dataset.

Then, quite naturally, I define a quadratic loss \lVert f(\mathbf x) - f(\nu) \rVert_2^2, and the optimization goal is to find a measure \nu that would minimize this loss…

What do you think about the notation f? Is it ambiguous? According to Julia’s dispatch, it is not. Do you do such things when writing math?


Edit: my example is only overloading, but the same thing happens with dispatch: f(x,y) means a completely different function from the ones above…
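For concreteness, here is a minimal Julia sketch of the three meanings of f above. The `DiscreteMeasure` type, the empirical-mean estimator, and the `loss` function are assumptions made for illustration only; the post does not say which estimator is actually used.

```julia
# Hypothetical stand-in for a measure: finitely many atoms with nonnegative weights.
struct DiscreteMeasure
    atoms::Vector{Float64}
    weights::Vector{Float64}
end

# f : R -> R
f(s::Real) = s^2

# f(ν) = ∫ f(s) ν(ds), which is a weighted sum for a discrete measure
f(ν::DiscreteMeasure) = sum(ν.weights .* f.(ν.atoms))

# f(x): an estimate of f(ν) from a dataset (assumed here to be the empirical mean)
f(x::AbstractVector{<:Real}) = sum(f.(x)) / length(x)

# Quadratic loss between the estimate and the parameter
loss(x, ν) = (f(x) - f(ν))^2

ν = DiscreteMeasure([1.0, 2.0], [0.5, 0.5])
x = [1.0, 2.0]
```

Julia picks the method from the argument type alone, so `f(3)`, `f(ν)` and `f(x)` are unambiguous to the compiler, whatever a human reader makes of them.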

5 Likes

It’s quite common to overload math notation like this but I think that mathematicians may not realize how overloaded their notation actually is. In your example, it might be more explicit to have a functor that lifts f from \mathbb R to \mathcal M_+(\mathbb R) but it’s also quite common to just use the same name. I think it depends on how much it’s important to think about the functor application as an explicit operation.

9 Likes

There’s a pretty common function called “+” that is overloaded like crazy in mathematical writing…

18 Likes

Indeed, very good point. I need to build up my case :wink:

1 Like

Right, + is always a great example. Sometimes people will distinguish between + in different objects or categories—often with a subscript or superscript—but usually they just write + everywhere and let which + they want to use be implied by the context of what kinds of objects are being added. Multiple dispatch isn’t the only way to implement that kind of overloading in a programming language, but it certainly does seem to be a good fit, so I think your approach is well justified.
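As a small illustration of how that looks in Julia, the sketch below adds one more method to `+` for a made-up type. `Mod5` is a hypothetical name introduced only for this example.

```julia
# Hypothetical type: integers modulo 5.
struct Mod5
    n::Int
end

# Add a new method to the existing generic function +.
Base.:+(a::Mod5, b::Mod5) = Mod5(mod(a.n + b.n, 5))

1 + 2               # integer addition
[1, 2] + [3, 4]     # elementwise vector addition
Mod5(3) + Mod5(4)   # addition in Z/5Z, gives Mod5(2)
```

Which `+` runs is decided by the types of the operands, which mirrors how a reader infers the meaning of + from context.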

3 Likes

This convention is common in Physics. Consider a field variable \phi which could be written as a function of Cartesian coordinates, i.e. \phi(x,y,z), or spherical coordinates, i.e. \phi(r,\varphi,\theta).

2 Likes

In math, overloading is often called abuse of notation. As in programming, it can drastically simplify things or obscure them.

6 Likes

I’m a professional math guy and have considerable experience in editorial work. Overloading like this can confuse readers and referees (bad) and get your paper needlessly rejected. I think your particular example is reasonable, but I can understand a referee becoming annoyed at your quadratic loss definition and the different meanings for the symbol f. Telling the referee that it’s multiple dispatch will not make things go better, nor will saying that it’s just like +.

I like it when my papers and proposals are accepted and avoid doing things like this. @jw3126 has it right.

6 Likes

Thanks for being so direct, that is exactly the core of my question.

I think you have to take it on a case-by-case basis. If the application of f to different kinds of arguments is clearly just a generalization of the same conceptual thing, then used judiciously it’s reasonable and even commonplace.

For example, no one has a problem with the same name exp used for the exponential of both scalars and linear operators, from real numbers to matrices to differential operators on functions.
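Julia follows the same convention: with the `LinearAlgebra` standard library loaded, `exp` accepts both numbers and square matrices. A small sketch, where the nilpotent matrix `N` is just a convenient test case chosen for illustration:

```julia
using LinearAlgebra

exp(0.0)                  # scalar exponential

N = [0.0 1.0; 0.0 0.0]    # nilpotent: N^2 = 0
exp(N)                    # matrix exponential, equal to I + N because the series stops

exp.(N)                   # broadcasting applies the scalar exp entrywise, a different operation
```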

On the other hand, in the original example f(s) = s^2 (nonlinear) when s is real but f(ν) is a linear operator when ν is a measure, then I would think of these as two totally different functions (even if the latter involves f(s)) and it seems confusing to give them the same name.

8 Likes

I think this is pragmatic advice, but why is + different from f? I genuinely want to know if there’s a reason for these to be treated differently or if it’s just traditional that + is allowed to be so heavily overloaded.

1 Like

History and common usage. + is overloaded in non-mathematical English, unary operations are clear to most people, and @stevengj's example of exp is standard and all math people would get it. f means what you say it does in your paper, and defining it to be two different things is poor form. There’s a difference between things like exp, +, \Sigma, which are well understood, and notation you invent yourself.

I think the bottom line is that it’s fine to use commonly overloaded things and not fine to make up overloaded symbols yourself. Most referees will get the message when you use exp(A) no matter what A is and still have every right to be unhappy with several definitions of f.

4 Likes

This would seem to rule out matrix functions, which are a standard idea in which any analytic function f: \mathbb{C} \to \mathbb{C} can be generalized to act on square matrices (or many other linear operators), and people just write f(z) and f(A) for an arbitrary “user-defined” f.
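A rough Julia sketch of that idea, restricted to real symmetric matrices so that an orthogonal eigendecomposition is available. The name `matfun` is hypothetical, and this is only one of several ways to define f(A):

```julia
using LinearAlgebra

# Apply a scalar function f to a real symmetric matrix through its eigendecomposition:
# A = V * Diagonal(λ) * V'  gives  f(A) = V * Diagonal(f.(λ)) * V'
function matfun(f, A::Symmetric{<:Real})
    E = eigen(A)
    return E.vectors * Diagonal(f.(E.values)) * E.vectors'
end

A = [2.0 1.0; 1.0 2.0]
square(z) = z^2               # an arbitrary "user-defined" f

matfun(square, Symmetric(A))  # agrees with A * A
matfun(exp, Symmetric(A))     # agrees with the built-in matrix exponential exp(A)
```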

1 Like

I think a crucial difference is that it’s completely fine to use the same notation for generalizing one notion in a way that is backward compatible. For instance, matrix functions are defined in a (Banach) algebra, of which numbers and linear operators are instances; similarly, + is usually an instance of the group operation of an abelian group. That is different from, e.g., using psi(x) and psi(k) for a function and its Fourier transform (most physics people will think it’s an eminently reasonable shortcut for simplifying complicated expressions; most math people will never talk to you again). The OP’s example of f(ν) IMO falls into the second category; it’s better to use a different notation than f() for both (and I would argue the same for Julia code). An interesting case is notation that could plausibly be interpreted with two meanings, e.g. M >= 0 for a matrix: is it in the sense of SPD matrices or componentwise?
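To make the M >= 0 ambiguity concrete, here is a small sketch in which the two readings disagree. The helper names `componentwise_nonneg` and `psd` are hypothetical, and the second reading is taken here to mean symmetric positive semidefinite:

```julia
using LinearAlgebra

# Reading 1: every entry is nonnegative.
componentwise_nonneg(M) = all(M .>= 0)

# Reading 2: M is symmetric with nonnegative eigenvalues.
psd(M) = issymmetric(M) && eigmin(Symmetric(M)) >= 0

A = [1.0 2.0; 2.0 1.0]     # entries are nonnegative, eigenvalues are 3 and -1
B = [2.0 -1.0; -1.0 2.0]   # has negative entries, eigenvalues are 1 and 3
```

`A` satisfies only the first reading and `B` only the second, so neither interpretation implies the other.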

8 Likes

That’s pretty standard too and very traditional. Nobody would have problems with that unless you decided to change the definition of a standard function. eg…

Let exp(x) = x^2

1 Like

Ooff. This is one of my least favorite examples. It’s one thing when the various behaviors of a function can be determined by looking at the input values (like + for reals vs vectors). But here it’s just the variable names that distinguish them, once you provide input values, you have no idea what’s going on.

This is probably my very least favorite one. Why not just put a hat on the psi? What does psi(0) mean here?!
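In code the same problem shows up immediately: dispatch sees values and types, not variable names. One way to keep a single name would be to tag the argument with the space it lives in. The sketch below assumes psi(x) = exp(-x^2) and the unitary Fourier convention; `Position` and `Momentum` are hypothetical wrapper types.

```julia
# Hypothetical wrappers recording which space the argument lives in.
struct Position
    x::Float64
end

struct Momentum
    k::Float64
end

# psi in real space (an assumed example function)
psi(p::Position) = exp(-p.x^2)

# Its Fourier transform under the unitary convention (1/sqrt(2π)) ∫ psi(x) exp(-i k x) dx
psi(q::Momentum) = exp(-q.k^2 / 4) / sqrt(2)

psi(Position(0.0))   # value of the function at x = 0
psi(Momentum(0.0))   # value of the transform at k = 0
```

A bare `psi(0.0)` has no method here, which is exactly the information the overloaded notation leaves out.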

4 Likes

Then you disambiguate by writing psi(k=0). As a card-carrying mathematician I was as appalled as you at first, but I have to say it creates far fewer issues than you’d think, and now I find myself using this kind of shortcut when writing on the board or in notes (not in papers). In math there’s a lot of emphasis on the idea that mathematical writing should be compilable, but that sometimes leads to register spill issues where you’re forced to use symbols that don’t really match their use, or even out-of-memory errors (the “oh no, I’m out of Latin and Greek, I’ll now use Hebrew” syndrome). In physics, since you’re using the same name for everything, you can just use a limited number of registers that have a clear purpose, and never run out of them.

Another quirk of the “names don’t matter as long as it’s correct” motto is that you sometimes find definitions and formulas that are physically strange. For instance, the definition of the Fourier transform of distributions usually goes by extending the first formula in the Wikipedia article on the Fourier transform, which is just weird (is x real space or reciprocal space?). (This particular one is just a bad choice of definition IMO; it’s much better to introduce conjugates and define it through good old Parseval.)

2 Likes

It’s been around for a long time… certainly confusing though, I agree.

Well, it’s not very nice when you start using scaled and shifted coordinates inside the functions.

But most of all, I think it reduces clarity, and makes it really hard to keep track of what is going on, so I think this is even worse for educational purposes than for publishing.

2 Likes

The issue is that having f, \tilde{f}, \hat{f}, \tilde{\hat{f}} and \hat{\tilde{f}} all mean different things makes things hard to read too :wink:

There were a lot of interesting arguments here on both sides. I am trying to find a cleverer notation than the one I had, one that disambiguates these things.

I still think that multiple dispatch should apply in math papers, but this is a fight I am not willing to take on myself…

1 Like