This is really helpful, thanks @ChrisRackauckas and @kristoffer.carlsson.
Regarding the indexing issue: what is the “proper” way to call these functions then?
I tried calling the functions recursively:
```julia
function funs(fₙ)
    fₙ[1]()
    if length(fₙ) > 1        # recurse on the remaining functions
        funs(fₙ[2:end])
    end
end
# run funs(L.fₙ)
```
but that was slower than calling them in a for-loop, indexing with a variable.
Index with a literal, e.g. `A.fₙ[3](x)`. But never use a runtime variable to do the indexing if you want the call to directly infer and specialize. So if your use case can’t handle that, I would suggest using FunctionWrappers.jl.
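To make the distinction concrete, here is a minimal sketch with a plain tuple of made-up functions:

```julia
fs = (x -> x + 1, x -> 2x, x -> x^2)

fs[3](2.0)   # literal index: the compiler knows exactly which function is called
i = 3
fs[i](2.0)   # runtime index: which function runs (and its return type) is only known at runtime
```

Inside a function you can check the difference with `@code_warntype`.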
You should really think about whether you actually need to do this before doing it, though. One use case for this was to create the `ArrayPartition` in RecursiveArrayTools.jl for DiffEq. This lets you store N different arrays as a single array. But, given the same caveats as above (literals infer, runtime variables do not), it has to be used carefully. Notice that scalar indexing does not infer, while the broadcast overloads use literals from generated functions in order to infer correctly. While not everything in there is completely fixed (I think just the 3-argument `similar` is un-inferrable for some reason; I can’t tell if it’s a Julia issue), that style of usage gives a type-inferrable `A` which represents a heterogeneous array, as long as you don’t use the linear indexing and don’t index `A.x[i]` with a runtime variable. DiffEq specifically avoids these cases (and users typically use a literal in their input function), so this is a type-stable way to make a heterogeneous set of arrays for this specific use case.
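For illustration, a minimal `ArrayPartition` sketch (the particular arrays here are made up):

```julia
using RecursiveArrayTools

A = ArrayPartition([1, 2, 3], [0.5, 0.25])  # an Int array and a Float64 array stored together

A.x[1]    # literal index into the tuple of arrays: infers as Vector{Int}
i = 2
A.x[i]    # runtime index: the element type can no longer be inferred
A[4]      # linear indexing across the parts: also not inferrable (it crosses array types)
```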
Another example is the “refined problems”. These take in a tuple of functions. However, the solver documentation says explicitly what each function needs to be:

http://docs.juliadiffeq.org/latest/solvers/refined_ode_solve.html#Linear-Nonlinear-(LNL)-ODE-1

and because of that, the solver methods can hard-code the integers for which function is used when, and have this be fully inferrable.
This is a use case where it would have otherwise required defining a bunch of extra types, each with different numbers of functions (which might be the case in the future, but definitely not right now).
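To sketch the pattern (this is not the actual DiffEq solver code, just a hypothetical illustration of hard-coding literal tuple positions):

```julia
# Hypothetical "refined problem": the documentation fixes what each slot means,
# so the solver can index the tuple with literals and stay fully inferrable.
struct LinearNonlinearProblem{F<:Tuple}
    fs::F   # fs[1] is documented to be the linear part, fs[2] the nonlinear part
end

function solve_step(prob::LinearNonlinearProblem, u, t, dt)
    L = prob.fs[1](u, t)   # literal index: infers and specializes
    N = prob.fs[2](u, t)   # literal index: infers and specializes
    return u + dt * (L + N)
end

prob = LinearNonlinearProblem(((u, t) -> -u, (u, t) -> sin(t) * u))
solve_step(prob, 1.0, 0.0, 0.1)   # 0.9, fully inferred
```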
However, if your use case has to loop through the functions, pick some randomly, or let the user pick the function, then in some of those cases this is not a good idea. Even these “rules” can be broken, though. For example, if you use a function barrier:

```julia
f = A.fₙ[i]
use_f_a_few_times(f)
```

then there is one dynamic dispatch which takes place, but it will specialize inside of `use_f_a_few_times`. @kristoffer.carlsson noted before that dynamic dispatch costs something like ~90 ns, so if your function is anything non-trivial this price is nothing and you may be worrying too much; a function barrier works just fine.
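As a concrete (if contrived) sketch of that barrier pattern, assuming `use_f_a_few_times` just applies the function repeatedly:

```julia
# One dynamic dispatch happens at the barrier call; inside, `f` has a concrete
# type, so the loop fully specializes.
function use_f_a_few_times(f, x)
    s = zero(x)
    for _ in 1:100
        s += f(x)   # specialized: the compiler knows f's type here
    end
    return s
end

fs = (sin, cos, x -> x^2)
i = rand(1:3)               # runtime index
f = fs[i]                   # type of f is not inferred at this point
use_f_a_few_times(f, 0.5)   # ~one dynamic dispatch, then specialized code
```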
Also, when a function doesn’t specialize, it’s not all that bad. It’s not able to inline in that case (which only matters for quick functions), and sometimes the output types cannot be inferred well. But if you assert/convert on the output, you’ll pay a small cost and everything will be fine. Again, dynamic dispatch is only bad if the functions are supposed to be quick, so with

```julia
f = A.fₙ[i]
B[j] = f(x)
```

putting the result into a strictly typed array will convert, and this is only bad if the ~90 ns of dynamic dispatch on `setindex!` is comparable to the time it takes for the function to compute. If it’s not even comparable, then you’re over-optimizing, since the conversion that keeps `B` strictly typed will make sure no instabilities “leak” past the function call.
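A minimal sketch of that convert/assert-on-output idea (the functions here are made up):

```julia
fs = (x -> x + 1.0, x -> 2.0x, x -> x^2)
x = 3.0
i = rand(1:3)             # runtime index, so fs[i](x) is not inferrable on its own
y = fs[i](x)::Float64     # assert/convert the output: downstream code sees a concrete Float64
C = y + 1.0               # fully inferred from here on
```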
So the only cases that are actually bad here are:
1. Repeatedly indexing with a runtime variable with small function calls. `B[j] = A.f[i](x)` won’t leak type instabilities, but runtime will be increased due to dynamic dispatch.
2. Not being careful with instability leakage. `B = A.f[i](x)` never converts, so the type of `B` may not be inferred, and so `C = B + A` “leaks” the non-inferrability.
If you’re avoiding those two cases, you’re fine (notice that (2) is avoidable just by properly converting or using a function barrier). If (1) is your problem, use FunctionWrappers.jl.
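For (1), a minimal FunctionWrappers.jl usage sketch looks something like this (the wrapped functions are just examples):

```julia
using FunctionWrappers: FunctionWrapper

# Wrap each function as a concrete Float64 -> Float64 callable; the vector is then
# homogeneously typed, so the call is type-stable even with a runtime index.
fws = [FunctionWrapper{Float64,Tuple{Float64}}(f) for f in (x -> x + 1, x -> 2x, x -> x^2)]

i = rand(1:3)
fws[i](2.0)   # returns a Float64; the return type is known to the compiler
```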
But even if all of this works out, it might still not be a good idea. You can go through all of this trouble to make sure it works and specializes properly, but if your function calls are non-trivial, then again the fact that this all beautifully inlines doesn’t matter. You must also be aware that, because this is typed so strictly, `g(A)` will recompile for every new tuple of functions. Is this the behavior you want? Or will this get in the way more than the possible over-specialization? (Note that JuliaDiffEq may be going the FunctionWrappers route because of this issue.)

So there are a lot of details here, but essentially you just ask yourself “could the compiler possibly know what exact function I want to use here?” and “does the compiler need to know what exact function I use here?” and you’re pretty much set. This design needs to be crafted to specifically handle your problem.
My use case does indeed seem to be hitting (1), but it’s not a huge deal: while the function being called is really short, the rest of the computation I do outside of it takes longer.
Benchmarking the full function vs benchmarking the individual parts (and therefore avoiding the dynamic dispatch cost) seems to have a difference of about 30ns, out of 400ns.
I might look into FunctionWrappers, but documentation seems nonexistent.
I could also do what you did, and have a few methods where I hardcode the calls for small numbers of functions, since I know how many there are.
Thanks again for the detailed explanation!
```julia
@inline function funs(res::Number, fₙ::Function)
    res, fₙ(1)
end
@inline function funs(fₙ::Function)
    fₙ(1)
end
@inline function funs{N}(res::Number, fₙ::Function, fs::Vararg{Function,N})
    res, fₙ(1), funs(fs...)
end
@inline function funs{N}(fₙ::Function, fs::Vararg{Function,N})
    fₙ(1), funs(fs...)
end
funs(fs) = funs(fs...)

fs = (x -> x, x -> x^2, x -> x^3)
```
This should compile well if you plan on using fewer than 17 functions in `fs`.
```julia
julia> funs(fs)
(1, (1, 1))

julia> @code_warntype funs(fs)
Variables:
  #self#::#funs
  fs::Tuple{##1#4,##2#5,##3#6}

Body:
  begin
      return (Core.tuple)(1, (Core.tuple)((Base.mul_int)(1, 1)::Int64, (Base.mul_int)((Base.mul_int)(1, 1)::Int64, 1)::Int64)::Tuple{Int64,Int64})::Tuple{Int64,Tuple{Int64,Int64}}
  end::Tuple{Int64,Tuple{Int64,Int64}}
```
I’m not sure forcing the inlining is actually necessary either. But if it all inlines, it’ll essentially “build the loop without actually looping” (see what the `@code_warntype` output shows: a single function which inlined all of our anonymous functions into one tuple-generation call!). It will need to compile a full new setup for every tuple of functions, though.

(There’s probably a way to avoid splatting, in which case the magic 17 remark can go away.)
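One possibility is to recurse on the tuple itself with `Base.tail` instead of splatting it into arguments; a minimal sketch:

```julia
# Peel off the first function with Base.tail; no Vararg splatting, same nested-tuple result.
funs_tail(fs::Tuple{Any}) = first(fs)(1)
funs_tail(fs::Tuple)      = (first(fs)(1), funs_tail(Base.tail(fs)))

fs = (x -> x, x -> x^2, x -> x^3)
funs_tail(fs)   # (1, (1, 1)), same as funs(fs) above
```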