Help optimise an advice implementation

Hello, in my DataToolkitBase.jl I make extensive use of advised functions, e.g. instead of

identity(x)

I’ll have

@advise identity(x)

As such, I’m very interested in minimising the performance impact of advised calls. Currently, an advised call seems to have an overhead of a few microseconds.

I’ve had a good look at this myself, but I can’t see anything I can do to improve this, and so I’m hoping other people might be able to provide some pointers :pray:.

Each “advice” is represented with the following structure,

struct DataAdvice{func, context} <: Function
    priority::Int # REVIEW should this really be an Int?
    f::Function   # the advise function
end

where the “advise function” (f), takes a tuple of:

  • a post-processing function
  • the function being called
  • the function arguments
  • the function keyword arguments

and returns a tuple of the same form.
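
To make that shape concrete, here is a small hypothetical advise function (the name and behaviour are mine, not DataToolkitBase’s): it takes (post, func, args...; kwargs...) and returns a tuple of the same form, prepending an extra step to the post-processing function.

```julia
# Hypothetical advise function: wraps the eventual result in `string`
# by composing onto the post-processing function, leaving the call itself
# untouched. Returns the same (post, func, args, kwargs) tuple shape.
function stringify_advice(post::Function, func::Function, args...; kwargs...)
    (string ∘ post, func, args, merge(NamedTuple(), kwargs))
end

post, func, args, kwargs = stringify_advice(identity, +, 1, 2)
post(func(args...; kwargs...))  # "3"
```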

This allows an “all advices” function to be created by composing all of the individual DataAdvices, and I do exactly this: ∘(reverse(advisors)...).
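
A minimal self-contained sketch of this composition pattern, with plain closures standing in for DataAdvice (all names here are illustrative):

```julia
# An advice that doubles the first positional argument before the call.
double_first(post, func, args...; kwargs...) =
    (post, func, (2 * first(args), Base.tail(args)...), merge(NamedTuple(), kwargs))

# Lift an advice into the tuple -> tuple form that `∘` can compose.
lift(advice) = ((post, func, args, kwargs),) -> advice(post, func, args...; kwargs...)

advisors = [lift(double_first)]
all_advice = ∘(reverse(advisors)...)

post, func, args, kwargs = all_advice((identity, +, (1, 2), NamedTuple()))
post(func(args...; kwargs...))  # 2*1 + 2 = 4
```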

Each advice is called with this method:

The method definition
function (dt::DataAdvice{F, C})(
    (post, func, args, kwargs)::Tuple{Function, Function, Tuple, NamedTuple}) where {F, C}
    # Abstract-y `typeof`.
    atypeof(val::Any) = typeof(val)
    atypeof(val::Type) = Type{val}
    # @info "Testing $dt"
    if hasmethod(dt.f, Tuple{typeof(post), typeof(func), atypeof.(args)...}, keys(kwargs))
        # @info "Applying $dt"
        result = invokepkglatest(dt.f, post, func, args...; kwargs...)
        if result isa Tuple{Function, Function, Tuple}
            post, func, args = result
            (post, func, args, NamedTuple())
        else
            result
        end
    else
        (post, func, args, kwargs) # act as the identity function
    end
end

In the no-op case, this has an overhead of ~0.5us and performs 9 allocations (of ~430 bytes in total). I can reduce this to ~0.2us (and 8 allocations) by resolving the hasmethod test in a generated function, but then the first call takes as much as 1000us.

Generated function form
@generated function (dt::DataAdvice{F, C})(funcargs::Tuple{Function, Function, Tuple, NamedTuple}) where {F, C}
    Tpost, Tfunc, Targs, Tkwargs = funcargs.parameters
    kwargkeys = first(Tkwargs.parameters)
    if hasmethod(F, Tuple{Tpost, Tfunc, Targs.parameters...}, kwargkeys)
        quote
            post, func, args, kwargs = funcargs
            result = invokepkglatest(dt.f, post, func, args...; kwargs...)
            if result isa Tuple{Function, Function, Tuple}
                post, func, args = result
                (post, func, args, NamedTuple())
            else
                result
            end
        end
    else
        :(funcargs) # act as the identity function
    end
end

While this is somewhat underwhelming, with the generated method the total overhead for calling a ~dozen no-op composed advise functions drops from ~4us to ~300ns. The first-call cost is now as much as 10,000us though. ~10ms of first-time cost per function for ~50 different advised calls adds up to half a second of latency, which isn’t ideal.

Separately from the cost of applying the advise functions to the tuple, there’s the cost of creating that tuple in the first place, and then of going from the final tuple to the end result.

This is done by the following method, which converts a (func, args...; kwargs...) call to said tuple, and then obtains the final result:

function (dta::DataAdviceAmalgamation)(func::Function, args...; kwargs...)
    post::Function, func2::Function, args2::Tuple, kwargs2::NamedTuple =
        dta((identity, func, args, merge(NamedTuple(), kwargs)))
    invokepkglatest(func2, args2...; kwargs2...) |> post
end

This seems to incur an overhead of ~1.5us.
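
The tuple-construction step can also be examined in isolation with a stand-in like the following (a hypothetical helper, not package code), which makes it easier to separate the conversion cost from the advice-application cost, e.g. with BenchmarkTools’ @btime:

```julia
# Stand-in for the conversion the method above performs before any advice
# runs: a (func, args...; kwargs...) call becomes the four-component tuple,
# including the merge(NamedTuple(), kwargs) normalisation of the kwargs.
tupleify(func::Function, args...; kwargs...) =
    (identity, func, args, merge(NamedTuple(), kwargs))

tupleify(+, 1, 2)  # (identity, +, (1, 2), NamedTuple())
```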

For reference, the full “advice” implementation can be found here: src/model/advice.jl.

If there’s any way I could shave this down, and you could give me some pointers on that, it would be much appreciated :grinning:.