function A(T::DataType, m::Integer, n::Integer, s::Integer, θ::Number)
    θ = T(θ)
    half = θ / 2
    cos_half = cos(half)
    sin_half = sin(half)
    kmin = max(0, m - n)
    kmax = min(s + m, s - n)
    sig = (-1)^(kmin & 1)
    ch = cos_half^(2s - 2kmin + m - n)
    sh = sin_half^(2kmin - m + n)
    fac = factorial(kmin) * factorial(s + m - kmin) * factorial(s - n - kmin) *
          factorial(n - m + kmin)
    d = sig * ch * sh / fac
    for k in (kmin + 1):kmax
        sig = -sig
        ch *= cos_half^(-2)
        sh *= sin_half^2
        fac *= k * (n - m + k)
        fac /= (s + m - k + 1) * (s - n - k + 1)
        d += sig * ch * sh / fac
    end
    return d * √(factorial(s + m) * factorial(s - m) * factorial(s + n) * factorial(s - n))
end
function B(::Type{T}, m::Integer, n::Integer, s::Integer, θ::Number) where {T}
    θ = T(θ)
    half = θ / 2
    cos_half = cos(half)
    sin_half = sin(half)
    kmin = max(0, m - n)
    kmax = min(s + m, s - n)
    sig = (-1)^(kmin & 1)
    ch = cos_half^(2s - 2kmin + m - n)
    sh = sin_half^(2kmin - m + n)
    fac = factorial(kmin) * factorial(s + m - kmin) * factorial(s - n - kmin) *
          factorial(n - m + kmin)
    d = sig * ch * sh / fac
    for k in (kmin + 1):kmax
        sig = -sig
        ch *= cos_half^(-2)
        sh *= sin_half^2
        fac *= k * (n - m + k)
        fac /= (s + m - k + 1) * (s - n - k + 1)
        d += sig * ch * sh / fac
    end
    return d * √(factorial(s + m) * factorial(s - m) * factorial(s + n) * factorial(s - n))
end
a = @allocated A(Float64, -2, -2, 5, 0.5)
@info "A allocated $a"
b = @allocated B(Float64, -2, -2, 5, 0.5)
@info "B allocated $b"
The only difference between functions A() and B() is the type annotation of the first argument.
However, the two functions have very different allocation behavior:
[ Info: A allocated 560
[ Info: B allocated 0
Further investigation showed that in A() every intermediate variable of type T is heap-allocated, while variables of other types (e.g. Int) are not affected.
I tried to make a minimal reproduction, but the behavior does not occur with a simplified function body, so I have kept the functions as they are.
As a heuristic, Julia avoids automatically specializing on argument type parameters in three specific cases: Type, Function, and Vararg. Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function. This usually has no performance impact at runtime and improves compiler performance. If you find it does have a performance impact at runtime in your case, you can trigger specialization by adding a type parameter to the method declaration.
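To make the quoted heuristic concrete, here is a minimal sketch (hypothetical names, not from the original post):

inner(T, x) = T(x)
# T is only passed through to inner, so under the quoted heuristic Julia
# does not specialize on which particular type it is.
passthrough(T::Type, x) = inner(T, x)
# The explicit type parameter triggers specialization: one compiled
# instance per concrete type this is called with, as the docs suggest.
forced(::Type{T}, x) where {T} = inner(T, x)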
Yes, because it didn’t specialize on the argument being the type Float64. If Julia were to specialize every call for every type that’s passed in, even when it’s only passed through, that would be worse for compile-time performance, as explained in the quoted section.
In function A, the information available at compile time is this:
T is some DataType that will only be known at runtime; at compile time we do not know which type it is (just as the compiler doesn’t know which values m or n will have). That is why Julia cannot compile different versions for different inputs; it has to compile code that works for every possible value of T.
In function B the type information is available at compile time!
T is available at compile time, so Julia will compile a specialised version for each different type you call it with.
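As a quick sanity check (a sketch, assuming the A and B definitions from the top of the thread are loaded and the MethodAnalysis package used later in this thread is installed), you can list which instances the compiler actually created:

using MethodAnalysis
A(Float64, -2, -2, 5, 0.5)
B(Float64, -2, -2, 5, 0.5)
methodinstances(A)   # expected: a single instance compiled for ::DataType
methodinstances(B)   # expected: an instance compiled for ::Type{Float64}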
You might know this already, but in general it is counterproductive to add too many type annotations, especially in cases like this.
See for example: Performance Tips · The Julia Language
In your case you might write
function A(m, n, s, θ)
    half = θ / 2
    # ...
end
and the runtime performance will still be optimal! Note also that types like Number and Integer are non-concrete (abstract) types: they help with multiple dispatch, but not with performance.
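A tiny illustration of that last point (hypothetical names): Julia specializes on the concrete types of the actual arguments whether or not the signature carries an abstract annotation, so the annotation only restricts dispatch.

halve_plain(θ) = θ / 2          # no annotation
halve_anno(θ::Number) = θ / 2   # abstract annotation: affects dispatch only
# Both calls compile a Float64-specialized method instance; the generated
# code and performance are the same.
halve_plain(0.5)
halve_anno(0.5)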
I think part of the confusion in the latter parts of the thread is how self-referential it is to think about types of types. The type instability (and extra allocations) of T::DataType can make us forget that DataType is itself a concrete type. In almost every other case a concrete type annotation would contribute type stability, but it just so happens that the instances of DataType are themselves types, and there are very many of them. So we reach for ::Type{T} to get even more specific than DataType and recover type stability. It’s the only example that comes to mind of abstract types that subtype a concrete type, e.g. Type{Int} <: DataType; feel free to point out more.
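A few REPL checks along those lines (outputs are what I would expect; worth re-verifying locally):
julia> isconcretetype(DataType)
true
julia> Int isa DataType    # instances of DataType are themselves types
true
julia> Type{Int} <: DataType
true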
The documentation could use a bit more clarification on this. It is not obvious that annotating T::DataType specializes only on the argument type DataType, while annotating T::Type or T::Any will specialize on the argument instance T if T is called:
julia> using MethodAnalysis
julia> struct M x::Int end
julia> struct N x::Int end
julia> f(T, i) = T(i)
f (generic function with 1 method)
julia> f2(T::Type, i) = T(i)
f2 (generic function with 1 method)
julia> g(T::DataType, i) = T(i)
g (generic function with 1 method)
julia> f(M, 1), f2(M, 2), g(M, 3), f(N, 1), f2(N, 2), g(N, 3)
(M(1), M(2), M(3), N(1), N(2), N(3))
julia> methodinstances(f)
2-element Vector{Core.MethodInstance}:
MethodInstance for f(::Type{M}, ::Int64)
MethodInstance for f(::Type{N}, ::Int64)
julia> methodinstances(f2)
2-element Vector{Core.MethodInstance}:
MethodInstance for f2(::Type{M}, ::Int64)
MethodInstance for f2(::Type{N}, ::Int64)
julia> methodinstances(g)
1-element Vector{Core.MethodInstance}:
MethodInstance for g(::DataType, ::Int64)
Oh, and if anyone else is checking this, make sure to call methodinstances before you use @code_warntype or @btime: those macros create “specialized method instances” that, for some reason, aren’t actually used by the real method call.
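In other words, a sketch of the check order, reusing the g and M definitions above:

methodinstances(g)       # inspect first: only the ::DataType instance is listed
@code_warntype g(M, 4)   # reflection macros like this add a Type{M}-specialized instance
methodinstances(g)       # an extra instance now appears that the runtime call never used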
This means there might be two separate issues. One is that ::DataType is not specialized, and the other is that using types as arguments causes unnecessary allocations.
The native code for both is short and looks identical. I can’t read assembly, but whenever there’s type instability or extra allocations I expect far more lines. Last week I also ran into another example where two versions of a method had the same @code_native but different @btime. I’m on a Mac, so anyone with a different machine should corroborate.
One more thing: whether or not you interpolate the type matters, e.g. foo($Float64, $10) and foo(Float64, $10) yield different benchmarks. The former allocates while the latter does not.
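For concreteness, the comparison is between calls like these (a sketch; this foo is a hypothetical stand-in for the two-argument foo used further down in the thread):

using BenchmarkTools
foo(T, n) = T(exp(n))        # hypothetical stand-in definition
@btime foo($Float64, $10)    # interpolated type: reported as allocating
@btime foo(Float64, $10)     # type written literally: reported as non-allocating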
Sounds right. Either @code_native or @btime must be inaccurate, and roughly @time-ing a non-hoisting loop with no runtime dispatch seems to corroborate @code_native here: 0 allocations even over 1e8 iterations.
julia> foo(Float64,10), foo(10) # compile first
(22026.465794806718, 22026.465794806718)
julia> @time for i in 1:100_000_000
           foo(Float64, i) # non-constant local i prevents hoist
       end
  0.852338 seconds
julia> @time for i in 1:100_000_000
           foo(i) # non-constant local i prevents hoist
       end
  0.848491 seconds
Weirdly, in the other thread I linked earlier, this approach corroborated the @btime difference rather than the matching @code_native/@code_llvm. So maybe issues should be filed against both base Julia and BenchmarkTools to figure out what is going on.