Extending type functionality in Python and Julia

I have been using Julia for the last 9 years. As one of the few (if not the only) users of Julia at the institute where I work, I was asked to give a talk about it. One of the things I want to talk about is the ease with which one can extend type functionality. While researching this in some depth, I watched again the video by Stefan Karpinski (here), especially the bit on “Sharing common types” where he presents the RGB example.

While I am not planning to contrast Julia with other programming languages in my talk, I feel that people will inevitably and naturally want to compare the sharing of common types to how Python does it.

While I understand the arguments presented in the talk against extending the original class or using inheritance, I think there may be a third option in Python: one could write new functions that extend functionality without putting them in a class definition, and package them in a module. What would speak against this practice? Unfortunately, I am not sufficiently familiar with Python to answer this question myself.

This would work perfectly fine in Python. However,

  1. Then you don’t get dispatch (polymorphism), since in Python this is done through methods. Dispatch is fundamental to getting generic algorithms to work, so the more methods you move to freestanding functions, the less generic your code is going to be.
  2. You might be able to write freestanding functions yourself, but everybody else won’t. They’ll keep using methods. So you still can’t take type T written by author A and plug it into function F written by author B, because author B likely wrote F as a method (see the sketch after this list).
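
A minimal sketch of the second point, with made-up names: author A ships a type and extends it with a freestanding function, author B writes generic code in terms of methods, and the two don’t compose.

class Interval:                      # author A's type
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

def width(iv):                       # author A's freestanding extension function
    return iv.hi - iv.lo

def describe(obj):                   # author B's generic code calls a method
    return f"width is {obj.width()}"

describe(Interval(0, 3))             # AttributeError: 'Interval' object has no attribute 'width'
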
4 Likes

Yes, I think that’s more or less the key point.

You can always mock the behaviour of anything in Python: pass objects to a function and it will work, as long as there are no runtime errors from whatever functions, properties, getters/setters etc. it touches.

The main point, however, is that with a type(d) system the dispatch happens at compile time (it can also be a runtime dispatch, but that’s another story). This means that you have a set of function arguments – with known types – and you pick the most specialised method of the function at the compilation stage. No runtime risks. The type and dispatch system basically takes care of the compatibility.

Your example with the mocking Python code is doing something different: knowing some interface, you build something which behaves like it, and you can never be sure what happens at runtime. There is no guarantee that there aren’t some details further downstream which you have missed.
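
As a sketch of that runtime risk (the names are made up): an object that mocks only part of an interface can pass the first few calls and then fail much later, far from where the assumption was made.

class FakeFile:                      # mocks just enough of a file-like interface
    def read(self):
        return "header,data"

def load(f):
    header = f.read().split(",")[0]  # works: FakeFile has read()
    f.seek(0)                        # AttributeError at runtime: seek() was never implemented
    return header

load(FakeFile())
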

2 Likes

Thanks for the clear answer. My source of confusion was me forgetting that Python is “object-oriented single dispatch” (it dispatches on the object’s type) as opposed to “functional dispatch” (dispatching on the argument types of the function) – if these made-up terms make sense. Hence, coming back to your points:

  1. There is no dispatch in what I suggested, as Python does not dispatch on the type of the passed argument, but on the type of the object that the called method belongs to. Hence, freestanding functions cannot offer any dispatch capabilities whatsoever.
  2. Defining types in Python means defining classes. Hence, all methods are associated with classes. Freestanding functions can of course receive objects of a particular type as arguments and extend functionality, but as they cannot dispatch on the type of an argument, they cannot be used for writing generic code (see the sketch after this list).
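
To spell the distinction out in a minimal sketch (made-up names): a method call is resolved from the type of the object it is called on, while a plain function has a single body no matter what it receives.

class Celsius:
    def __init__(self, t): self.t = t
    def to_kelvin(self):             # picked because the receiver is a Celsius
        return self.t + 273.15

class Fahrenheit:
    def __init__(self, t): self.t = t
    def to_kelvin(self):             # picked because the receiver is a Fahrenheit
        return (self.t - 32) * 5 / 9 + 273.15

def to_kelvin(x):                    # freestanding function: one body for every argument type,
    return x.t + 273.15              # so any specialisation has to be hand-written inside it
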

Sorry for being verbose. I wanted to write things out in detail to avoid subtle misunderstandings.

2 Likes

Thanks for this comment. What you are saying is that these freestanding functions could at best be interpreted as a loose, unreliable interface, and building on top of it would most likely cause issues down the road. I hope I understood this correctly.

1 Like

Yes, that’s one of the conclusions 🙂

1 Like

Adding a bit of nuance: while many Python libraries do stick to the language-supported single dispatch on member methods, external functions are just more natural for composability (monkeypatching member methods only goes so far), so dispatch is often implemented in them as well. The simplest strategy is to forward to singly dispatched member methods:

def foo(x, y): return y.foo(x)

and more elaborate library-specific designs exist for limited multiple dispatch.
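
For reference, the standard library itself ships a restricted form of this: functools.singledispatch dispatches a freestanding function on the type of its first argument (the function names in this sketch are made up).

from functools import singledispatch

@singledispatch
def total(xs):                       # generic fallback
    return sum(xs)

@total.register
def _(xs: dict):                     # specialisation selected by the type of the first argument
    return sum(xs.values())

total([1, 2, 3])                     # 6, via the fallback
total({"a": 1, "b": 2})              # 3, via the dict specialisation
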

You can if type T implements the behavior of one of F’s arguments, as in BT().F(T()), and this is fairly common for composability in Python. But F doesn’t have multiple methods within the BT class, and T doesn’t work as the leading object if it doesn’t itself implement an F method. For all of member methods’ perks, like member completion, these limitations don’t exist for multimethods, which really benefit from language support.
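
A small sketch of that BT().F(T()) shape, with hypothetical names: F only relies on an interface of its argument, so an unrelated type T that provides that interface can be passed in.

class Report:                        # plays the role of BT
    def write_to(self, sink):        # F only relies on sink having a write() method
        sink.write("summary\n")

class MemorySink:                    # plays the role of T: unrelated to files,
    def __init__(self):
        self.chunks = []
    def write(self, text):           # but it implements the behaviour write_to expects
        self.chunks.append(text)

Report().write_to(MemorySink())      # the BT().F(T()) shape from above
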

I don’t know if I am late to this or not, but I hope this will be a useful addition to the discussion.

Dispatch in a stand-alone function in Python can only be done via an explicit type check. So, to resolve all the type checks, the user must install all the packages implementing the types the function author cared to check for.
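
A sketch of what that looks like (the numpy branch is just an illustration): every supported type needs its own explicit branch, and the module cannot even be imported unless numpy is installed.

import numpy as np

def total(xs):
    # hand-written dispatch: one isinstance branch per supported type
    if isinstance(xs, np.ndarray):
        return float(np.sum(xs))
    return sum(xs)
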

Another issue with Python’s freestanding functions is performance. A function that takes into account only the interface may be grossly inefficient. For example, here is a generic function to compute the standard deviation of an array:

from math import sqrt

def mean_py(vec):
    total = 0.0
    n = 0
    for x in vec:
        total += x
        n += 1
    return total / n

def stddev_py(vec):
    # sample standard deviation (n - 1 denominator) over any iterable
    ave = mean_py(vec)
    disp = 0.0
    n = 0
    for x in vec:
        diff = x - ave
        disp += diff * diff
        n += 1
    return sqrt(disp / (n - 1))

While this works fine for any iterable object, for NumPy arrays another function will be much faster:

import numpy as np

def stddev_numpy(vec):
    # vectorised version: relies on numpy's sum and dot
    n = len(vec)
    ave = np.sum(vec) / n
    diff = vec - ave
    disp = np.dot(diff, diff)
    return sqrt(disp / (n-1))

Now, some performance tests on my computer for both functions:

from random import random

import array

x = [random() for _ in range(100_000)]
x_np = np.array(x)
xa = array.array('d', x)

%timeit stddev_py(x)
# 11.6 ms ± 357 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit stddev_numpy(x)
# 19.4 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit stddev_py(x_np)
# 38.3 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit stddev_numpy(x_np)
# 206 μs ± 83.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit stddev_py(xa)
# 12.6 ms ± 156 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit stddev_numpy(xa)
# 274 μs ± 68.8 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

So, using NumPy slows down operations if the data are in a list, while using the generic interface is much slower for NumPy arrays and stdlib arrays.

And for that reason all the ML frameworks in Python have to reimplement the NumPy interface, because np.sum won’t be efficient for PyTorch or JAX arrays.

In Julia, a generic stddev function will be efficient as long as optimized sum and dot methods are available for the given datatype. Or, if we want to, we can define an optimized stddev for a specific type, like a range.

1 Like