# Extra allocation with `T::DataType`?

```julia
function A(T::DataType, m::Integer, n::Integer, s::Integer, θ::Number)
    θ = T(θ)
    half = θ / 2
    cos_half = cos(half)
    sin_half = sin(half)
    kmin = max(0, m - n)
    kmax = min(s + m, s - n)
    sig = (-1)^(kmin & 1)
    ch = cos_half^(2s - 2kmin + m - n)
    sh = sin_half^(2kmin - m + n)
    fac = factorial(kmin) * factorial(s + m - kmin) * factorial(s - n - kmin) *
          factorial(n - m + kmin)
    d = sig * ch * sh / fac
    for k in (kmin + 1):kmax
        sig = -sig
        ch *= cos_half^(-2)
        sh *= sin_half^2
        fac *= k * (n - m + k)
        fac /= (s + m - k + 1) * (s - n - k + 1)
        d += sig * ch * sh / fac
    end

    return d * √(factorial(s + m) * factorial(s - m) * factorial(s + n) * factorial(s - n))
end

function B(::Type{T}, m::Integer, n::Integer, s::Integer, θ::Number) where {T}
    θ = T(θ)
    half = θ / 2
    cos_half = cos(half)
    sin_half = sin(half)
    kmin = max(0, m - n)
    kmax = min(s + m, s - n)
    sig = (-1)^(kmin & 1)
    ch = cos_half^(2s - 2kmin + m - n)
    sh = sin_half^(2kmin - m + n)
    fac = factorial(kmin) * factorial(s + m - kmin) * factorial(s - n - kmin) *
          factorial(n - m + kmin)
    d = sig * ch * sh / fac
    for k in (kmin + 1):kmax
        sig = -sig
        ch *= cos_half^(-2)
        sh *= sin_half^2
        fac *= k * (n - m + k)
        fac /= (s + m - k + 1) * (s - n - k + 1)
        d += sig * ch * sh / fac
    end

    return d * √(factorial(s + m) * factorial(s - m) * factorial(s + n) * factorial(s - n))
end

a = @allocated A(Float64, -2, -2, 5, 0.5)
@info "A allocated $a"

b = @allocated B(Float64, -2, -2, 5, 0.5)
@info "B allocated $b"
```

The only difference between functions `A` and `B` is the type annotation of the first argument.

However, the two functions have very different allocation behavior:

```
[ Info: A allocated 560
[ Info: B allocated 0
```

Further investigation showed that in `A`, all intermediate variables of type `T` were heap-allocated, while variables of other types (e.g. `Int`) were not affected.

I tried to produce a more minimal reproduction, but the behavior did not occur with a simplified function body, so I have kept the functions as they are.

> As a heuristic, Julia avoids automatically specializing on argument type parameters in three specific cases: `Type`, `Function`, and `Vararg`. Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function. This usually has no performance impact at runtime and improves compiler performance. If you find it does have a performance impact at runtime in your case, you can trigger specialization by adding a type parameter to the method declaration.

https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing

By using `::Type{T}`, you’re forcing Julia to specialize the code.
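A minimal sketch of what that forced specialization buys. The functions `a` and `b` below are made-up reductions of `A` and `B`, not code from the thread:

```julia
# Hypothetical reductions of A and B (names `a`/`b` are made up).
a(T::DataType, x) = T(x)          # T is a runtime value of type DataType
b(::Type{T}, x) where {T} = T(x)  # T is a compile-time type parameter

# For `a`, inference only knows `T isa DataType`, so `T(x)` infers as `Any`;
# for `b`, T = Float64 is part of the signature, so the result infers concretely.
ra = only(Base.return_types(a, Tuple{DataType, Int}))
rb = only(Base.return_types(b, Tuple{Type{Float64}, Int}))
println(ra)  # Any
println(rb)  # Float64
```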


It does have a significant impact on performance: benchmarks show the unspecialized version is about 30x slower.

Yes, because it didn’t specialize on the argument being a `Float64`. If Julia were to specialize every call for every type that’s passed in, even when it’s just passed through, that would be worse for compile-time performance, as explained in the quoted section.

But note that there is a `θ = T(θ)` call on the first line, so `T` *is* used within the method, isn’t it?

I wouldn’t have expected this to be an issue in your example either. I would have expected the line `θ = T(θ)` to force specialization, because according to the linked documentation:

> Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function.

Can this be a bug in the specialization system?

New GitHub issue. It looks like we need to either change the behavior or clarify the documentation.


To me, this doesn’t look like a bug at all.

• In function `A`, the information available at compile time is this:

• `T` is some data type, but at compile time we do not know *which* type it is (just as the compiler doesn’t know which values `m` or `n` have). That is why Julia cannot compile different versions for different inputs; it has to compile one version that works for all possible values of `T`.

• In function `B`, the type information is available at compile time!

• `T` is available at compile time, so Julia will compile a specialized version for each different type you call it with.

You might know this already, but in general it is counterproductive to add too many type annotations, especially for cases like this.
See for example: Performance Tips · The Julia Language

In your case you might write

```julia
function A(m, n, s, θ)
    half = θ / 2
    # ...
end
```

and the runtime performance will still be optimal! Note also that types like `Number` and `Integer` are non-concrete: they help with multiple dispatch but not with performance.
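For instance (a made-up one-liner, not from the thread), an annotation-free method still gets a concretely inferred specialization for each call signature:

```julia
# No annotations at all, yet each concrete argument type gets its own
# specialization with a concrete inferred return type.
half_angle(θ) = θ / 2

println(only(Base.return_types(half_angle, Tuple{Float64})))  # Float64
println(only(Base.return_types(half_angle, Tuple{Float32})))  # Float32
```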


I think part of the confusion in the latter parts of the thread is how self-referential it is to think about types of types. The type instability (and extra allocations) of `T::DataType` can make us forget that `DataType` is itself a concrete type. In almost every other case, a concrete type annotation would contribute type stability, but it just so happens that the instances of `DataType` are themselves types. So we use the abstract `::Type{T}` to get even more specific than `DataType` and achieve type stability again; it’s the only example that comes to mind of abstract types that subtype a concrete type, e.g. `Type{Int} <: DataType`. Feel free to point out more.
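Those relationships can be checked directly at the REPL; a quick sketch:

```julia
# DataType is itself concrete, and ordinary types like Int are its instances.
@show isconcretetype(DataType)        # true
@show Int isa DataType                # true
@show Union{Int,String} isa DataType  # false: a Union is not a DataType

# Yet Type{Int}, whose only instance is Int, is a subtype of DataType,
# so ::Type{T} can be more specific than ::DataType.
@show Type{Int} <: DataType           # true
```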

The documentation could use a bit more clarification on this: it is not obvious that annotating `T::DataType` specializes on the argument type `DataType`, while annotating `T::Type` or `T::Any` will specialize on the argument instance `T` if `T` is called:

```
julia> using MethodAnalysis

julia> struct M x::Int end

julia> struct N x::Int end

julia> f(T, i) = T(i)
f (generic function with 1 method)

julia> f2(T::Type, i) = T(i)
f2 (generic function with 1 method)

julia> g(T::DataType, i) = T(i)
g (generic function with 1 method)

julia> f(M, 1), f2(M, 2), g(M, 3), f(N, 1), f2(N, 2), g(N, 3)
(M(1), M(2), M(3), N(1), N(2), N(3))

julia> methodinstances(f)
2-element Vector{Core.MethodInstance}:
 MethodInstance for f(::Type{M}, ::Int64)
 MethodInstance for f(::Type{N}, ::Int64)

julia> methodinstances(f2)
2-element Vector{Core.MethodInstance}:
 MethodInstance for f2(::Type{M}, ::Int64)
 MethodInstance for f2(::Type{N}, ::Int64)

julia> methodinstances(g)
1-element Vector{Core.MethodInstance}:
 MethodInstance for g(::DataType, ::Int64)
```

But in the GitHub issue, @mikmoore found that `T::Any` is not specialized either.

According to MethodAnalysis, it does specialize. I do see that unnecessary allocation, though; what is it for?

```
julia> using BenchmarkTools, MethodAnalysis

julia> foo(T,x) = exp(T(x))
foo (generic function with 1 method)

julia> foo(Float64,10), foo(Int, 10)
(22026.465794806718, 22026.465794806718)

julia> methodinstances(foo)
2-element Vector{Core.MethodInstance}:
 MethodInstance for foo(::Type{Float64}, ::Int64)
 MethodInstance for foo(::Type{Int64}, ::Int64)

julia> @btime foo($Float64, $10)
  127.871 ns (1 allocation: 16 bytes)
22026.465794806718

julia> @btime foo($Int, $10)
  138.402 ns (1 allocation: 16 bytes)
22026.465794806718
```

Oh, and if anyone else is checking this, make sure to use `methodinstances` before you use `@code_warntype` or `@btime`: for some reason, those macros create “specialized” method instances that aren’t actually used by the real method call.

This means there might be two separate issues. One is that `::DataType` is not specialized, and the other is that using types as arguments causes unnecessary allocations.

The native code is short and looks identical. I can’t read assembly, but whenever there’s type instability or extra allocations I expect far more lines. Last week I also ran into another example where two versions of a method had the same `@code_native` but different `@btime` results. I’m on a Mac, so anyone with a different machine should try to corroborate this.

```
julia> foo(T,x) = exp(T(x))
foo (generic function with 1 method)

julia> foo(x) = exp(Float64(x))
foo (generic function with 2 methods)

julia> @code_native foo(Float64,10)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[1]:1 within `foo`
	subq	$8, %rsp
; │┌ @ float.jl:146 within `Float64`
	vcvtsi2sd	%rdi, %xmm0, %xmm0
; │└
	movabsq	$exp, %rax
	callq	*%rax
	popq	%rax
	retq
	nopw	(%rax,%rax)
; └

julia> @code_native foo(10)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[2]:1 within `foo`
	subq	$8, %rsp
; │┌ @ float.jl:146 within `Float64`
	vcvtsi2sd	%rdi, %xmm0, %xmm0
; │└
	movabsq	$exp, %rax
	callq	*%rax
	popq	%rax
	retq
	nopw	(%rax,%rax)
; └
```

One more thing: whether or not you interpolate the type, e.g. `foo($Float64, $10)` vs. `foo(Float64, $10)`, yields different benchmark results. The former allocates while the latter does not.

```
julia> @btime foo(Float64, $10)
  10.060 ns (0 allocations: 0 bytes)
22026.465794806718

julia> @btime foo($Float64, $10)
  89.201 ns (1 allocation: 16 bytes)
22026.465794806718
```

Maybe that’s it? The interpolation somehow demotes `Float64` from `Type{Float64}` to `DataType` in the benchmark, and that causes the allocation?
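If so, the effect should be reproducible without BenchmarkTools. Here is a sketch of that hypothesis (the helper names are made up, and whether `$`-interpolation really works this way is an assumption): once `Float64` is stored in a slot typed `DataType`, downstream inference sees only `DataType`:

```julia
foo(T, x) = exp(T(x))  # same foo as above

call_direct(x) = foo(Float64, x)  # the compiler sees the constant Type{Float64}
call_via(r::Base.RefValue{DataType}, x) = foo(r[], x)  # it sees only DataType

println(only(Base.return_types(call_direct, Tuple{Int})))  # Float64
println(only(Base.return_types(call_via, Tuple{Base.RefValue{DataType}, Int})))  # Any
```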

Sounds right. At least one of `@code_native` and `@btime` must be inaccurate, and roughly `@time`ing a non-hoisting loop with no runtime dispatch seems to corroborate `@code_native` here: 0 allocations even over 1e8 iterations.

```
julia> foo(Float64,10), foo(10) # compile first
(22026.465794806718, 22026.465794806718)

julia> @time for i in 1:100_000_000
           foo(Float64, i) # non-constant local i prevents hoist
       end
  0.852338 seconds

julia> @time for i in 1:100_000_000
           foo(i) # non-constant local i prevents hoist
       end
  0.848491 seconds
```

Weirdly, in the other thread I linked earlier, this approach corroborated the `@btime` difference instead of the matching `@code_native`/`@code_llvm`. So maybe issues should be filed against both base Julia and BenchmarkTools to figure out what is going on.