There is a tension between readability and speed, discussed briefly in the Discourse thread "Keeping track of simple vs high-performance versions" (Usage / Performance).
Indeed, I often write two versions of a function,
- V1: Naive implementation. Since Julia is so expressive, this implementation is usually short and resembles the published equations or pseudocode.
- V2: Optimized implementation. This version is written for a computer: it may exploit symmetries, reuse allocated memory, access memory in a cache-friendly order, reorder calculations for SIMD, divide the work across threads, precompute parts, cache intermediate expressions, and so on.
V1 is easier to understand and extend. V2 is the implementation exported from your package; it's often much faster, but complicated and verbose. Julia's abstractions sometimes let you write V1 ≈ V2, but this is not always possible.
I’ve thrown together Slow.jl, which lets you keep both versions. It exports two macros, `@slowdef` and `@slow`. You define the slow function with `@slowdef`, which injects a first argument of type `Slow.SlowImplementation` for dispatch. The slow implementation is then callable with `@slow(f(args...))`, which is sugar for `f(Slow.SlowImplementation(), args...)`. It’s basically a shortcut for a multimethod with two options. If you have three or more implementations, you should probably just define your own algorithm type and use the usual multimethod pattern.
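For concreteness, here is roughly what that pattern looks like written by hand. The sentinel struct and function names below are my own illustrative stand-ins, not the actual code the macros generate:

```julia
# Hand-rolled sketch of the dispatch pattern (illustrative; not the actual
# macro expansion from Slow.jl).
struct SlowImplementation end  # sentinel type, used only for dispatch

# Slow method: selected by passing the sentinel as the first argument.
g(::SlowImplementation, x) = sum(abs2, x)

# Fast method: the default, same name without the sentinel.
function g(x)
    acc = zero(eltype(x))
    @inbounds @simd for v in x
        acc += v * v
    end
    return acc
end

x = [1.0, 2.0, 3.0]
@assert g(SlowImplementation(), x) ≈ g(x)  # both methods agree
```

Because the sentinel is a zero-field struct, the extra argument costs nothing at runtime; dispatch is resolved at compile time.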
Example
```julia
using Slow

# V1: @slowdef replaces this signature with f(::Slow.SlowImplementation, x, y)
@slowdef function f(x, y)
    sleep(1)
    return sin(x)
end

# V2: our exported fast implementation
function f(x, y)
    # blah blah complicated optimizations
    return sin(x)
end

@time @slow f(1.0, 1.0)  # calls f(Slow.SlowImplementation(), 1.0, 1.0)
@time f(1.0, 1.0)
```

which prints

```
1.001939 seconds (1.80 k allocations: 101.328 KiB)
0.000000 seconds
```
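For three or more implementations, the algorithm-type pattern mentioned above might look like this. All names here are illustrative, not part of Slow.jl:

```julia
# Sketch of the "own algorithm type" multimethod pattern (illustrative names).
abstract type Algorithm end
struct Reference <: Algorithm end   # readable V1
struct Blocked   <: Algorithm end   # hand-optimized V2
struct Threaded  <: Algorithm end   # multithreaded V3

mysum(::Reference, x) = sum(x)

function mysum(::Blocked, x)
    acc = zero(eltype(x))
    @inbounds @simd for v in x
        acc += v
    end
    return acc
end

function mysum(::Threaded, x)
    parts = Iterators.partition(x, cld(length(x), Threads.nthreads()))
    tasks = [Threads.@spawn(sum(p)) for p in parts]
    return sum(fetch, tasks)
end

mysum(x) = mysum(Blocked(), x)   # pick a default implementation

x = rand(1000)
@assert mysum(Reference(), x) ≈ mysum(Blocked(), x) ≈ mysum(Threaded(), x)
```

A nice side effect of keeping the readable version around, in either pattern, is that it doubles as a test oracle for the optimized ones, as the final assertion shows.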
I’d love to hear about alternative approaches!