# Extra allocation with `T::DataType`?

```julia
function A(T::DataType, m::Integer, n::Integer, s::Integer, θ::Number)
    θ = T(θ)
    half = θ / 2
    cos_half = cos(half)
    sin_half = sin(half)
    kmin = max(0, m - n)
    kmax = min(s + m, s - n)
    sig = (-1)^(kmin & 1)
    ch = cos_half^(2s - 2kmin + m - n)
    sh = sin_half^(2kmin - m + n)
    fac = factorial(kmin) * factorial(s + m - kmin) * factorial(s - n - kmin) *
          factorial(n - m + kmin)
    d = sig * ch * sh / fac
    for k in (kmin + 1):kmax
        sig = -sig
        ch *= cos_half^(-2)
        sh *= sin_half^2
        fac *= k * (n - m + k)
        fac /= (s + m - k + 1) * (s - n - k + 1)
        d += sig * ch * sh / fac
    end

    return d * √(factorial(s + m) * factorial(s - m) * factorial(s + n) * factorial(s - n))
end

function B(::Type{T}, m::Integer, n::Integer, s::Integer, θ::Number) where {T}
    θ = T(θ)
    half = θ / 2
    cos_half = cos(half)
    sin_half = sin(half)
    kmin = max(0, m - n)
    kmax = min(s + m, s - n)
    sig = (-1)^(kmin & 1)
    ch = cos_half^(2s - 2kmin + m - n)
    sh = sin_half^(2kmin - m + n)
    fac = factorial(kmin) * factorial(s + m - kmin) * factorial(s - n - kmin) *
          factorial(n - m + kmin)
    d = sig * ch * sh / fac
    for k in (kmin + 1):kmax
        sig = -sig
        ch *= cos_half^(-2)
        sh *= sin_half^2
        fac *= k * (n - m + k)
        fac /= (s + m - k + 1) * (s - n - k + 1)
        d += sig * ch * sh / fac
    end

    return d * √(factorial(s + m) * factorial(s - m) * factorial(s + n) * factorial(s - n))
end

a = @allocated A(Float64, -2, -2, 5, 0.5)
@info "A allocated $a"

b = @allocated B(Float64, -2, -2, 5, 0.5)
@info "B allocated $b"
```

The only difference between functions `A` and `B` is the type annotation of the first argument.

However, the two functions have very different allocation behavior:

```
[ Info: A allocated 560
[ Info: B allocated 0
```

Further investigation showed that in `A`, all intermediate variables of type `T` were heap-allocated, while variables of other types (e.g. `Int`) were not affected.

I tried to produce a more minimal reproduction, but the behavior did not occur with a simplified function body, so I have kept the functions as they are.

> As a heuristic, Julia avoids automatically specializing on argument type parameters in three specific cases: `Type`, `Function`, and `Vararg`. Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function. This usually has no performance impact at runtime and improves compiler performance. If you find it does have a performance impact at runtime in your case, you can trigger specialization by adding a type parameter to the method declaration.

https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing

By using `::Type{T}`, you’re forcing Julia to specialize the code.
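A minimal sketch of what that forced specialization buys. The functions `a` and `b` below are made-up reductions of `A` and `B`, not code from the thread:

```julia
# Hypothetical reductions of A and B (names `a`/`b` are made up).
a(T::DataType, x) = T(x)          # T is a runtime value of type DataType
b(::Type{T}, x) where {T} = T(x)  # T is a compile-time type parameter

# For `a`, inference only knows `T isa DataType`, so `T(x)` infers as `Any`;
# for `b`, T = Float64 is part of the signature, so the result infers concretely.
ra = only(Base.return_types(a, Tuple{DataType, Int}))
rb = only(Base.return_types(b, Tuple{Type{Float64}, Int}))
println(ra)  # Any
println(rb)  # Float64
```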


It does have a significant impact on performance: benchmarks show the unspecialized version is about 30x slower.

Yes, because it didn’t specialize on the argument being a `Float64`. If Julia were to specialize every call for every type that’s passed in, even when it’s just passed through, that would be worse for compile-time performance, as explained in the quoted section.

But note that there is a `θ = T(θ)` call on the first line, so `T` *is* used within the method, isn’t it?

I wouldn’t have expected this to be an issue in your example either. I would have expected the line `θ = T(θ)` to force specialization, because according to the linked documentation:

> Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function.

Can this be a bug in the specialization system?

New GitHub issue. It looks like we need to either change the behavior or clarify the documentation.


To me, this doesn’t look like a bug at all.

• In function `A`, the information available at compile time is this:

• `T` is some data type, but at compile time we do not know *which* type it is (just as the compiler doesn’t know which values `m` or `n` have). That is why Julia cannot compile different versions for different inputs; it has to compile one version that works for all possible values of `T`.

• In function `B`, the type information is available at compile time!

• `T` is available at compile time, so Julia will compile a specialized version for each different type you call it with.

You might know this already, but in general it is counterproductive to add too many type annotations, especially for cases like this.
See for example: Performance Tips · The Julia Language

In your case you might write

```julia
function A(m, n, s, θ)
    half = θ / 2
    # ...
end
```

and the runtime performance will still be optimal! Note also that types like `Number` and `Integer` are non-concrete: they help with multiple dispatch but not with performance.
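For instance (a made-up one-liner, not from the thread), an annotation-free method still gets a concretely inferred specialization for each call signature:

```julia
# No annotations at all, yet each concrete argument type gets its own
# specialization with a concrete inferred return type.
half_angle(θ) = θ / 2

println(only(Base.return_types(half_angle, Tuple{Float64})))  # Float64
println(only(Base.return_types(half_angle, Tuple{Float32})))  # Float32
```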


I think part of the confusion in the latter parts of the thread is how self-referential it is to think about types of types. The type instability (and extra allocations) of `T::DataType` can make us forget that `DataType` is itself a concrete type. In almost every other case, a concrete type annotation would contribute type stability, but it just so happens that the instances of `DataType` are themselves types. So we use the abstract `::Type{T}` to get even more specific than `DataType` and achieve type stability again; it’s the only example that comes to mind of abstract types that subtype a concrete type, e.g. `Type{Int} <: DataType`. Feel free to point out more.
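Those relationships can be checked directly at the REPL; a quick sketch:

```julia
# DataType is itself concrete, and ordinary types like Int are its instances.
@show isconcretetype(DataType)        # true
@show Int isa DataType                # true
@show Union{Int,String} isa DataType  # false: a Union is not a DataType

# Yet Type{Int}, whose only instance is Int, is a subtype of DataType,
# so ::Type{T} can be more specific than ::DataType.
@show Type{Int} <: DataType           # true
```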

The documentation could use a bit more clarification on this: it is not obvious that annotating `T::DataType` specializes on the argument type `DataType`, while annotating `T::Type` or `T::Any` will specialize on the argument instance `T` if `T` is called:

```
julia> using MethodAnalysis

julia> struct M x::Int end

julia> struct N x::Int end

julia> f(T, i) = T(i)
f (generic function with 1 method)

julia> f2(T::Type, i) = T(i)
f2 (generic function with 1 method)

julia> g(T::DataType, i) = T(i)
g (generic function with 1 method)

julia> f(M, 1), f2(M, 2), g(M, 3), f(N, 1), f2(N, 2), g(N, 3)
(M(1), M(2), M(3), N(1), N(2), N(3))

julia> methodinstances(f)
2-element Vector{Core.MethodInstance}:
 MethodInstance for f(::Type{M}, ::Int64)
 MethodInstance for f(::Type{N}, ::Int64)

julia> methodinstances(f2)
2-element Vector{Core.MethodInstance}:
 MethodInstance for f2(::Type{M}, ::Int64)
 MethodInstance for f2(::Type{N}, ::Int64)

julia> methodinstances(g)
1-element Vector{Core.MethodInstance}:
 MethodInstance for g(::DataType, ::Int64)
```

But in the GitHub issue, @mikmoore found that `T::Any` is not specialized either.

According to MethodAnalysis, it does specialize. I do see that unnecessary allocation, though; what is it for?

```
julia> using BenchmarkTools, MethodAnalysis

julia> foo(T,x) = exp(T(x))
foo (generic function with 1 method)

julia> foo(Float64,10), foo(Int, 10)
(22026.465794806718, 22026.465794806718)

julia> methodinstances(foo)
2-element Vector{Core.MethodInstance}:
 MethodInstance for foo(::Type{Float64}, ::Int64)
 MethodInstance for foo(::Type{Int64}, ::Int64)

julia> @btime foo($Float64, $10)
  127.871 ns (1 allocation: 16 bytes)
22026.465794806718

julia> @btime foo($Int, $10)
  138.402 ns (1 allocation: 16 bytes)
22026.465794806718
```

Oh, and if anyone else is checking this, make sure to use `methodinstances` before you use `@code_warntype` or `@btime`: for some reason, those macros create “specialized” method instances that aren’t actually used by the real method call.

This means there might be two separate issues. One is that `::DataType` is not specialized, and the other is that using types as arguments causes unnecessary allocations.

The native code is short and looks identical. I can’t read assembly, but whenever there’s type instability or extra allocations I expect far more lines. Last week I also ran into another example where two versions of a method had the same `@code_native` but different `@btime` results. I’m on a Mac, so anyone with a different machine should try to corroborate this.

```
julia> foo(T,x) = exp(T(x))
foo (generic function with 1 method)

julia> foo(x) = exp(Float64(x))
foo (generic function with 2 methods)

julia> @code_native foo(Float64,10)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[1]:1 within `foo`
	subq	$8, %rsp
; │┌ @ float.jl:146 within `Float64`
	vcvtsi2sd	%rdi, %xmm0, %xmm0
; │└
	movabsq	$exp, %rax
	callq	*%rax
	popq	%rax
	retq
	nopw	(%rax,%rax)
; └

julia> @code_native foo(10)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[2]:1 within `foo`
	subq	$8, %rsp
; │┌ @ float.jl:146 within `Float64`
	vcvtsi2sd	%rdi, %xmm0, %xmm0
; │└
	movabsq	$exp, %rax
	callq	*%rax
	popq	%rax
	retq
	nopw	(%rax,%rax)
; └
```

One more thing: whether or not you interpolate the type, e.g. `foo($Float64, $10)` vs. `foo(Float64, $10)`, yields different benchmark results. The former allocates while the latter does not.

```
julia> @btime foo(Float64, $10)
  10.060 ns (0 allocations: 0 bytes)
22026.465794806718

julia> @btime foo($Float64, $10)
  89.201 ns (1 allocation: 16 bytes)
22026.465794806718
```

Maybe that’s it? The interpolation somehow demotes `Float64` from `Type{Float64}` to `DataType` in the benchmark, and that causes the allocation?
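If so, the effect should be reproducible without BenchmarkTools. Here is a sketch of that hypothesis (the helper names are made up, and whether `$`-interpolation really works this way is an assumption): once `Float64` is stored in a slot typed `DataType`, downstream inference sees only `DataType`:

```julia
foo(T, x) = exp(T(x))  # same foo as above

call_direct(x) = foo(Float64, x)  # the compiler sees the constant Type{Float64}
call_via(r::Base.RefValue{DataType}, x) = foo(r[], x)  # it sees only DataType

println(only(Base.return_types(call_direct, Tuple{Int})))  # Float64
println(only(Base.return_types(call_via, Tuple{Base.RefValue{DataType}, Int})))  # Any
```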

Sounds right. At least one of `@code_native` and `@btime` must be inaccurate, and roughly `@time`ing a non-hoisting loop with no runtime dispatch seems to corroborate `@code_native` here: 0 allocations even over 1e8 iterations.

```
julia> foo(Float64,10), foo(10) # compile first
(22026.465794806718, 22026.465794806718)

julia> @time for i in 1:100_000_000
           foo(Float64, i) # non-constant local i prevents hoist
       end
  0.852338 seconds

julia> @time for i in 1:100_000_000
           foo(i) # non-constant local i prevents hoist
       end
  0.848491 seconds
```

Weirdly, in the other thread I linked earlier, this approach corroborated the `@btime` difference instead of the matching `@code_native`/`@code_llvm`. So maybe issues should be filed against both base Julia and BenchmarkTools to figure out what is going on.