Hello, in my DataToolkitBase.jl I make extensive use of advised functions, e.g. instead of identity(x) I’ll have @advise identity(x). As such, I’m very interested in minimising the performance impact of advised calls. Currently, an advised call seems to have an overhead of a few microseconds.
I’ve had a good look at this myself, but I can’t see anything I can do to improve it, so I’m hoping other people might be able to provide some pointers.
Each “advice” is represented with the following structure:

```julia
struct DataAdvice{func, context} <: Function
    priority::Int # REVIEW should this really be an Int?
    f::Function
end
```
where the “advise function” (f) is called with:
- a post-processing function
- the function being called
- the function arguments
- the function keyword arguments
and returns a tuple of the same form.
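To make that concrete, here’s a minimal sketch of what an advise function can look like; the name log_advice and its behaviour are purely illustrative, not something from the package:

```julia
# A toy advise function: it receives (post, func, args...; kwargs...) and
# returns a (post, func, args, kwargs) tuple of the same form, here
# wrapping `post` so the eventual result gets logged before it's returned.
function log_advice(post::Function, func::Function, args...; kwargs...)
    logging_post = result -> (println("advised call returned ", result); post(result))
    (logging_post, func, args, (; kwargs...))
end
```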
This allows an “all advices” function to be created by composing all of the individual DataAdvices, and I do exactly this: ∘(reverse(advisors)...).
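As a toy illustration of what that composition does (plain closures standing in for DataAdvice instances here): each advice maps a (post, func, args, kwargs) tuple to another tuple of the same shape, and composing the reversed list means the first advisor runs first:

```julia
# Two toy "advices" that each append an extra argument.
advisors = [
    ((post, func, args, kwargs),) -> (post, func, (args..., 1), kwargs),
    ((post, func, args, kwargs),) -> (post, func, (args..., 2), kwargs),
]
all_advice = ∘(reverse(advisors)...)           # advisors[1] runs first, then advisors[2]
all_advice((identity, +, (0,), NamedTuple()))  # (identity, +, (0, 1, 2), NamedTuple())
```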
Each advice is called with this method:
The method definition
```julia
function (dt::DataAdvice{F, C})(
    (post, func, args, kwargs)::Tuple{Function, Function, Tuple, NamedTuple}) where {F, C}
    # Abstract-y `typeof`.
    atypeof(val::Any) = typeof(val)
    atypeof(val::Type) = Type{val}
    # @info "Testing $dt"
    if hasmethod(dt.f, Tuple{typeof(post), typeof(func), atypeof.(args)...}, keys(kwargs))
        # @info "Applying $dt"
        result = invokepkglatest(dt.f, post, func, args...; kwargs...)
        if result isa Tuple{Function, Function, Tuple}
            post, func, args = result
            (post, func, args, NamedTuple())
        else
            result
        end
    else
        (post, func, args, kwargs) # act as the identity function
    end
end
```
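For reference, the no-op overhead I quote below is the sort of thing that can be measured like this (schematically, taking the struct exactly as defined above; the type parameters and construction here are just stand-ins, not how the package builds advices):

```julia
using BenchmarkTools

# Schematic stand-in for a no-op advice: `f` has a method, but never one
# matching these arguments, so the call takes the identity branch.
advice = DataAdvice{Nothing, Nothing}(1, (post, func, x::Missing) -> (post, func, (x,)))

tup = (identity, identity, (1,), NamedTuple())
@benchmark $advice($tup)
```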
In the no-op case, this has an overhead of ~0.5us and performs 9 allocations (of ~430 bytes in total). I can reduce this to ~0.2us (and 8 allocations) by resolving the hasmethod test in a generated function, but then the first call takes as much as 1000us.
Generated function form
```julia
@generated function (dt::DataAdvice{F, C})(funcargs::Tuple{Function, Function, Tuple, NamedTuple}) where {F, C}
    Tpost, Tfunc, Targs, Tkwargs = funcargs.parameters
    kwargkeys = first(Tkwargs.parameters)
    if hasmethod(F, Tuple{Tpost, Tfunc, Targs.parameters...}, kwargkeys)
        quote
            post, func, args, kwargs = funcargs
            result = invokepkglatest(dt.f, post, func, args...; kwargs...)
            if result isa Tuple{Function, Function, Tuple}
                post, func, args = result
                (post, func, args, NamedTuple())
            else
                result
            end
        end
    else
        :funcargs
    end
end
```
While the improvement for a single call is somewhat underwhelming, with the generated method the total overhead of calling a dozen or so composed no-op advise functions drops from ~4us to ~300ns. The first-call cost is now as much as 10,000us though, and ~10ms of first-time cost per function across ~50 different advised calls adds up to half a second of latency, which isn’t ideal.
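(The composed figures can be checked in much the same way; schematically, with a dozen no-op advices composed via ∘, the first @time is dominated by compilation and @benchmark shows the steady state:)

```julia
using BenchmarkTools

# A dozen schematic no-op advices (as in the earlier sketch), composed
# into a single tuple-to-tuple pipeline with ∘.
advisors = [DataAdvice{Nothing, Nothing}(i, (post, func, x::Missing) -> (post, func, (x,)))
            for i in 1:12]
all_advice = ∘(reverse(advisors)...)
tup = (identity, identity, (1,), NamedTuple())

@time all_advice(tup)          # first call: mostly compilation
@benchmark $all_advice($tup)   # steady-state overhead of the no-op pipeline
```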
Separately from the cost of applying the advise functions to the tuple, there’s the cost of creating that tuple and then going from the final tuple to the end result. This is done by the following method, which converts a (func, args...; kwargs...) call into said tuple and then obtains the final result:
```julia
function (dta::DataAdviceAmalgamation)(func::Function, args...; kwargs...)
    post::Function, func2::Function, args2::Tuple, kwargs2::NamedTuple =
        dta((identity, func, args, merge(NamedTuple(), kwargs)))
    invokepkglatest(func2, args2...; kwargs2...) |> post
end
```
This seems to incur an overhead of ~1.5us.
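In terms of usage, that ~1.5us is the cost of a call shaped like this (with dta some DataAdviceAmalgamation; construction not shown):

```julia
# Builds (identity, identity, (1,), NamedTuple()), threads it through the
# composed advices, calls the (possibly replaced) function on the (possibly
# modified) arguments, then applies `post` to the result.
result = dta(identity, 1)
```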
For reference, the full “advice” implementation can be found here: src/model/advice.jl.
If there’s any way I could shave this down and you could give me some pointers on that, it would be much appreciated.