To recap: there are three cases where Julia may choose not to specialize on an argument type: Function, Type, and Vararg. This non-specialization can save the compiler the effort of recompiling functions for argument types that have little or no effect on the generated code. The default can be overridden by adding otherwise-unnecessary parametric annotations to those arguments in the method signature, and non-specialization can be forced on other types with @nospecialize or other code patterns. Using an otherwise non-specialized argument explicitly within the function body will usually cause specialization (although I recall seeing cases where it doesn't), and the prevailing advice to "write short functions for clarity and rely on inlining to remove the cost" works against this.
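For reference, a minimal sketch of the override pattern mentioned above (the function names are made up for illustration); the otherwise-unnecessary where-clause parameters force specialization on arguments that would normally fall under the Function, Type, and Vararg exceptions:

call_it(f, t, xs...) = f(t, xs...)  # default: may not specialize on f, t, or xs

call_it_spec(f::F, t::Type{T}, xs::Vararg{Any,N}) where {F,T,N} = f(t, xs...)  # specializes on all three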
The current default seems to lead directly to ~2 issues per month on Discourse, although the iceberg probably goes much deeper in its (unnoticed or ignored) effect on members of the community. My speculation is that, "on average," user code would see performance benefits (even accounting for regressions due to extra compilation) if we removed some or all of the three special cases (and re-added the current behavior in the few key snippets that rely on the current default).
In my own code, personally, I've overridden almost every place where I've noticed this non-specialization in effect, because I only ever use the functions in question with a small number of argument types within a session (so the compiler/memory load is modest) and the non-specialization results in considerable performance losses.
The non-specialization is definitely important, but where it matters is often in Base or package code that might be used with diverse types within a single application. That code is contributed by (or at least reviewed by) veteran users who can be expected to know the performance tips and who are putting extra effort into re-usable code. With a changed default, it would become necessary to annotate some sites with @nospecialize for the same reason the current defaults exist, but I suspect that code outside of Base and a small set of packages would require far fewer annotations than we use now.
Going out on a limb, I suspect that "average" performance across existing Julia code (most of which is private and written by people with shallower knowledge than Discourse regulars) would be positively affected by a changed default. And I think that "this code seems slow to compile" is a more suitable and less frequent reason to need Discourse help (if a bit more complicated to diagnose and resolve) than the "this code is slow and allocates a ton even though I've done everything right, except for knowing this special exception" issues we see currently.
Changing the default in only a subset of these three cases is also worth considering.
Or at least having much more sensible defaults, given the current state of Julia precompilation and package extensions. I could see raising the threshold at which Vararg arguments stop being specialized as an easy improvement. Right now it seems too extreme, imo:
julia> f(x...) = x
julia> Base.specializations(@which f(1, 2))
Base.MethodSpecializations(MethodInstance for f(::Int64, ::Vararg{Int64}))
which is not great when, as you mention, it's often encouraged to write generic code without worrying about lost performance.
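For comparison, a hedged follow-up to the snippet above (the exact output shape may vary by Julia version): annotating the Vararg with a length parameter, the pattern suggested in the performance-tips docs, makes the method specialize on the full argument tuple rather than collapsing it:

julia> g(x::Vararg{Any,N}) where {N} = x

julia> g(1, 2); g(1, 2, 3);

julia> Base.specializations(@which g(1, 2))
# expect one MethodInstance per distinct argument count/type combination,
# e.g. g(::Int64, ::Int64) and g(::Int64, ::Int64, ::Int64)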
Also worth noting that the Vararg specialization rules are currently so complicated that they seem to result in unexpected behavior when you combine them with other non-specializing types.
I think simpler rules would generally be better if the specialization behavior is changed.
All these heuristics that can lead to slow code in an unexpected place are definitely my #1 gripe when it comes to writing composable, performant code. It's especially noticeable when using higher-order functions liberally.
Would be nice to solve this issue somehow… One partial approach could be to let function authors/users opt into higher specialization/inference limits: Allow more aggressive inference for some functions · Issue #52239 · JuliaLang/julia · GitHub.
As for this specific linked "performance tip", I personally still don't fully understand what the docs mean by "argument is used" in
Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function.
and how lack of specialization affects/doesn't affect performance of functions that "just pass through" this argument.
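My reading of that section, hedged since I'm not a compiler dev: "passed through" means the argument only appears as an argument to another call, while actually calling it, even inside a closure, counts as "used". A small sketch mirroring the docs' examples (names made up):

pass_through(f, n) = ntuple(f, n)        # f only passed to ntuple: not specialized on typeof(f)
call_inside(g, n) = ntuple(i -> g(i), n) # g called inside the closure counts as "used": specializes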
To share one example, I forced some Vararg specialization in CUDA.jl last week and it resulted in a more than 100% performance gain for small CUDA kernel launches.
If even a very established and well-optimized package like CUDA.jl can see instant improvements of this magnitude, it would be very interesting to see what improvements are possible across the entire ecosystem.
Unfortunately it didn't seem to in practice. The linked writing is about functions specifically, but excessive specialization also applies to types and Vararg. Specialization is a double-edged sword: we compile more versions of a method in exchange for optimized execution. That is worth it if we compile for a fixed number of types and reuse the compiled code in hot loops, but it backfires if we compile for an arbitrarily large number of types, especially if the compiled code is not reused, as in reflection. Varying over functions, Type{T}, and Vararg naturally involves an arbitrarily large number of types, and it turns out that unconditional automatic specialization on them would make base Julia unusable.
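To make the reflection point concrete, a hypothetical sketch (the helper name is made up): the body does essentially the same boxed work regardless of the concrete type, so compiling a fresh MethodInstance for every type that ever flows through it would be pure compilation cost, which is exactly what @nospecialize avoids here:

describe(@nospecialize(x)) = string(nameof(typeof(x)), " with ", fieldcount(typeof(x)), " fields")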
It seems to just mean "called", but I'd want an expert on the compiler to clarify that.
That's also vaguely worded to the point of possibly being misleading, but there are a couple of ways:
Performance is usually hurt if the output of said higher-order function needs to be used in the rest of the code, because runtime dispatches need to handle the abstractly inferred return type. But if that's not the case, you don't see a performance issue; in fact you may benefit from far less compilation over an arbitrarily large number of input functions. We really don't need to compile foo(f, x) = @noinline bar(f, baz(x)) for every f if foo is only called at top level; we'd only need to compile the bar that does the real work of calling f.
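A hedged, self-contained version of that example (bar and baz are hypothetical here), plus one way to peek at which layer actually accumulates MethodInstances per input function; the exact printed form varies across Julia versions:

baz(x) = x + 1
@noinline bar(f, y) = f(y)
foo(f, x) = bar(f, baz(x))

foo(sin, 1); foo(cos, 1);

collect(Base.specializations(@which foo(sin, 1)))  # foo: f is only passed through
collect(Base.specializations(@which bar(sin, 2)))  # bar: calls f, so it specializes per function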
Even in the cases where the return type should be inferred for performance, method inlining can make this moot. map itself doesn't call the input function; it ends up specializing over it because of propagated inlining of the callees that do call it. This doesn't cause compilation bloat because the inlined callees aren't compiled separately.
Taking this into account, I view this as a weird exception that we have to live with because excessive compilation is harder to diagnose and reverse, and manually opting into nonspecialization would involve more work.
I don't think we are in disagreement here that specializing on absolutely everything is bad. The question is really how far to push specialization now that Julia has improved in other areas.
For functions, specialization is a binary yes/no, but for Vararg it is a sliding scale along multiple axes. I would like to see hard data on where the optimal tradeoff is with current Julia and whether it makes sense to slide the scale further…
The CUDA example had a plain args... signature. If all elements have the same type, you get a specialization for that type, but if not, you specialize on only the first element and infer the rest as Any. Specializing over heterogeneous elements is justified in hot loops like repeated CUDA kernel calls, but special cases don't make for general rules, and the general rule prevents severe compilation bloat.
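If you want to check this on your own Julia version rather than taking the description above on faith, a small sketch (the output shape varies across versions):

julia> h(args...) = args

julia> h(1, 2); h(1, 2.0);

julia> collect(Base.specializations(@which h(1, 2)))
# the homogeneous call should show up roughly as h(::Int64, ::Vararg{Int64}),
# while the mixed call widens the tail toward Any, as described above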
Also worth mentioning that inference over runtime types can happen without specialization, as the docs for @nospecialize versus @nospecializeinfer show. Uncalled input functions, types, and obviously Vararg seem to work more like the latter, based on the performance of subsequent code in callers. If there is ever a "what if we specialized on everything by default" benchmark made for updated versions of Julia, it'd be interesting to also test "what if we inferred over everything by default so we at least don't store more compiled code".
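For reference, a sketch adapted from the Base.@nospecializeinfer docstring (requires Julia 1.10+; the names here are illustrative). With plain @nospecialize, the caller can still infer the callee's return type from the runtime argument type; adding Base.@nospecializeinfer makes inference treat the argument as the declared AbstractArray too, so the caller sees Any:

@noinline first_elem(@nospecialize(A::AbstractArray)) = A[1]
@noinline Base.@nospecializeinfer first_elem_noinfer(@nospecialize(A::AbstractArray)) = A[1]

caller_a(A::AbstractArray) = first_elem(A)          # @code_typed caller_a([1.0]) infers Float64
caller_b(A::AbstractArray) = first_elem_noinfer(A)  # @code_typed caller_b([1.0]) infers Any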