# Extra allocation with `T::DataType`?

```julia
function A(T::DataType, m::Integer, n::Integer, s::Integer, θ::Number)
    θ = T(θ)
    half = θ / 2
    cos_half = cos(half)
    sin_half = sin(half)
    kmin = max(0, m - n)
    kmax = min(s + m, s - n)
    sig = (-1)^(kmin & 1)
    ch = cos_half^(2s - 2kmin + m - n)
    sh = sin_half^(2kmin - m + n)
    fac = factorial(kmin) * factorial(s + m - kmin) * factorial(s - n - kmin) *
          factorial(n - m + kmin)
    d = sig * ch * sh / fac
    for k in (kmin + 1):kmax
        sig = -sig
        ch *= cos_half^(-2)
        sh *= sin_half^2
        fac *= k * (n - m + k)
        fac /= (s + m - k + 1) * (s - n - k + 1)
        d += sig * ch * sh / fac
    end

    return d * √(factorial(s + m) * factorial(s - m) * factorial(s + n) * factorial(s - n))
end

function B(::Type{T}, m::Integer, n::Integer, s::Integer, θ::Number) where {T}
    θ = T(θ)
    half = θ / 2
    cos_half = cos(half)
    sin_half = sin(half)
    kmin = max(0, m - n)
    kmax = min(s + m, s - n)
    sig = (-1)^(kmin & 1)
    ch = cos_half^(2s - 2kmin + m - n)
    sh = sin_half^(2kmin - m + n)
    fac = factorial(kmin) * factorial(s + m - kmin) * factorial(s - n - kmin) *
          factorial(n - m + kmin)
    d = sig * ch * sh / fac
    for k in (kmin + 1):kmax
        sig = -sig
        ch *= cos_half^(-2)
        sh *= sin_half^2
        fac *= k * (n - m + k)
        fac /= (s + m - k + 1) * (s - n - k + 1)
        d += sig * ch * sh / fac
    end

    return d * √(factorial(s + m) * factorial(s - m) * factorial(s + n) * factorial(s - n))
end

a = @allocated A(Float64, -2, -2, 5, 0.5)
@info "A allocated $a"

b = @allocated B(Float64, -2, -2, 5, 0.5)
@info "B allocated $b"
```

The only difference between functions `A` and `B` is the type annotation of the first argument.

However, the two functions have very different allocation behavior:

```
[ Info: A allocated 560
[ Info: B allocated 0
```

Further investigation showed that in `A`, all intermediate variables of type `T` were heap-allocated, while variables of other types (e.g. `Int`) were not affected.

I tried to produce a more minimal reproduction, but the behavior did not occur with a simplified function body, so I have kept the functions as they are.

> As a heuristic, Julia avoids automatically specializing on argument type parameters in three specific cases: `Type`, `Function`, and `Vararg`. Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function. This usually has no performance impact at runtime and improves compiler performance. If you find it does have a performance impact at runtime in your case, you can trigger specialization by adding a type parameter to the method declaration.

https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing

By using `::Type{T}`, you’re forcing Julia to specialize the code.
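A minimal sketch of what that forced specialization buys. The functions `a` and `b` below are made-up reductions of `A` and `B`, not code from the thread:

```julia
# Hypothetical reductions of A and B (names `a`/`b` are made up).
a(T::DataType, x) = T(x)          # T is a runtime value of type DataType
b(::Type{T}, x) where {T} = T(x)  # T is a compile-time type parameter

# For `a`, inference only knows `T isa DataType`, so `T(x)` infers as `Any`;
# for `b`, T = Float64 is part of the signature, so the result infers concretely.
ra = only(Base.return_types(a, Tuple{DataType, Int}))
rb = only(Base.return_types(b, Tuple{Type{Float64}, Int}))
println(ra)  # Any
println(rb)  # Float64
```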


It does have a significant impact on performance: benchmarks show the unspecialized version is about 30x slower.

Yes, because it didn’t specialize on the argument being a `Float64`. If Julia were to specialize every call for every type that’s passed in, even when it’s just passed through, that would be worse for compile-time performance, as explained in the quoted section.

But note that there is a `θ = T(θ)` call on the first line, so `T` *is* used within the method, isn’t it?

I wouldn’t have expected this to be an issue in your example either. I would have expected the line `θ = T(θ)` to force specialization, because according to the linked documentation:

> Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function.

Can this be a bug in the specialization system?

New GitHub issue. It looks like we need to either change the behavior or clarify the documentation.


To me, this doesn’t look like a bug at all.

• In function `A`, the information available at compile time is this:

• `T` is some data type, but at compile time we do not know *which* type it is (just as the compiler doesn’t know which values `m` or `n` have). That is why Julia cannot compile different versions for different inputs; it has to compile one version that works for all possible values of `T`.

• In function `B`, the type information is available at compile time!

• `T` is available at compile time, so Julia will compile a specialized version for each different type you call it with.

You might know this already, but in general it is counterproductive to add too many type annotations, especially for cases like this.
See for example: Performance Tips · The Julia Language

In your case you might write

```julia
function A(m, n, s, θ)
    half = θ / 2
    # ...
end
```

and the runtime performance will still be optimal! Note also that types like `Number` and `Integer` are non-concrete: they help with multiple dispatch but not with performance.
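For instance (a made-up one-liner, not from the thread), an annotation-free method still gets a concretely inferred specialization for each call signature:

```julia
# No annotations at all, yet each concrete argument type gets its own
# specialization with a concrete inferred return type.
half_angle(θ) = θ / 2

println(only(Base.return_types(half_angle, Tuple{Float64})))  # Float64
println(only(Base.return_types(half_angle, Tuple{Float32})))  # Float32
```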


I think part of the confusion in the latter parts of the thread is how self-referential it is to think about types of types. The type instability (and extra allocations) of `T::DataType` can make us forget that `DataType` is itself a concrete type. In almost every other case, a concrete type annotation would contribute type stability, but it just so happens that the instances of `DataType` are themselves types. So we use the abstract `::Type{T}` to get even more specific than `DataType` and achieve type stability again; it’s the only example that comes to mind of abstract types that subtype a concrete type, e.g. `Type{Int} <: DataType`. Feel free to point out more.
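Those relationships can be checked directly at the REPL; a quick sketch:

```julia
# DataType is itself concrete, and ordinary types like Int are its instances.
@show isconcretetype(DataType)        # true
@show Int isa DataType                # true
@show Union{Int,String} isa DataType  # false: a Union is not a DataType

# Yet Type{Int}, whose only instance is Int, is a subtype of DataType,
# so ::Type{T} can be more specific than ::DataType.
@show Type{Int} <: DataType           # true
```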

The documentation could use a bit more clarification on this: it is not obvious that annotating `T::DataType` specializes on the argument type `DataType`, while annotating `T::Type` or `T::Any` will specialize on the argument instance `T` if `T` is called:

```
julia> using MethodAnalysis

julia> struct M x::Int end

julia> struct N x::Int end

julia> f(T, i) = T(i)
f (generic function with 1 method)

julia> f2(T::Type, i) = T(i)
f2 (generic function with 1 method)

julia> g(T::DataType, i) = T(i)
g (generic function with 1 method)

julia> f(M, 1), f2(M, 2), g(M, 3), f(N, 1), f2(N, 2), g(N, 3)
(M(1), M(2), M(3), N(1), N(2), N(3))

julia> methodinstances(f)
2-element Vector{Core.MethodInstance}:
 MethodInstance for f(::Type{M}, ::Int64)
 MethodInstance for f(::Type{N}, ::Int64)

julia> methodinstances(f2)
2-element Vector{Core.MethodInstance}:
 MethodInstance for f2(::Type{M}, ::Int64)
 MethodInstance for f2(::Type{N}, ::Int64)

julia> methodinstances(g)
1-element Vector{Core.MethodInstance}:
 MethodInstance for g(::DataType, ::Int64)
```

But in the GitHub issue, @mikmoore found that `T::Any` is not specialized either.

According to MethodAnalysis, it does specialize. I do see that unnecessary allocation, though; what is it for?

```
julia> using BenchmarkTools, MethodAnalysis

julia> foo(T,x) = exp(T(x))
foo (generic function with 1 method)

julia> foo(Float64,10), foo(Int, 10)
(22026.465794806718, 22026.465794806718)

julia> methodinstances(foo)
2-element Vector{Core.MethodInstance}:
 MethodInstance for foo(::Type{Float64}, ::Int64)
 MethodInstance for foo(::Type{Int64}, ::Int64)

julia> @btime foo($Float64, $10)
  127.871 ns (1 allocation: 16 bytes)
22026.465794806718

julia> @btime foo($Int, $10)
  138.402 ns (1 allocation: 16 bytes)
22026.465794806718
```

Oh, and if anyone else is checking this, make sure to use `methodinstances` before you use `@code_warntype` or `@btime`: for some reason, those macros create “specialized” method instances that aren’t actually used by the real method call.

This means there might be two separate issues. One is that `::DataType` is not specialized, and the other is that using types as arguments causes unnecessary allocations.

The native code is short and looks identical. I can’t read assembly, but whenever there’s type instability or extra allocations I expect far more lines. Last week I also ran into another example where two versions of a method had the same `@code_native` but different `@btime` results. I’m on a Mac, so anyone with a different machine should try to corroborate this.

```
julia> foo(T,x) = exp(T(x))
foo (generic function with 1 method)

julia> foo(x) = exp(Float64(x))
foo (generic function with 2 methods)

julia> @code_native foo(Float64,10)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[1]:1 within `foo`
	subq	$8, %rsp
; │┌ @ float.jl:146 within `Float64`
	vcvtsi2sd	%rdi, %xmm0, %xmm0
; │└
	movabsq	$exp, %rax
	callq	*%rax
	popq	%rax
	retq
	nopw	(%rax,%rax)
; └

julia> @code_native foo(10)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[2]:1 within `foo`
	subq	$8, %rsp
; │┌ @ float.jl:146 within `Float64`
	vcvtsi2sd	%rdi, %xmm0, %xmm0
; │└
	movabsq	$exp, %rax
	callq	*%rax
	popq	%rax
	retq
	nopw	(%rax,%rax)
; └
```

One more thing: whether or not you interpolate the type, e.g. `foo($Float64, $10)` vs. `foo(Float64, $10)`, yields different benchmark results. The former allocates while the latter does not.

```
julia> @btime foo(Float64, $10)
  10.060 ns (0 allocations: 0 bytes)
22026.465794806718

julia> @btime foo($Float64, $10)
  89.201 ns (1 allocation: 16 bytes)
22026.465794806718
```

Maybe that’s it? The interpolation somehow demotes `Float64` from `Type{Float64}` to `DataType` in the benchmark, and that causes the allocation?
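If so, the effect should be reproducible without BenchmarkTools. Here is a sketch of that hypothesis (the helper names are made up, and whether `$`-interpolation really works this way is an assumption): once `Float64` is stored in a slot typed `DataType`, downstream inference sees only `DataType`:

```julia
foo(T, x) = exp(T(x))  # same foo as above

call_direct(x) = foo(Float64, x)  # the compiler sees the constant Type{Float64}
call_via(r::Base.RefValue{DataType}, x) = foo(r[], x)  # it sees only DataType

println(only(Base.return_types(call_direct, Tuple{Int})))  # Float64
println(only(Base.return_types(call_via, Tuple{Base.RefValue{DataType}, Int})))  # Any
```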

Sounds right. At least one of `@code_native` and `@btime` must be inaccurate, and roughly `@time`ing a non-hoisting loop with no runtime dispatch seems to corroborate `@code_native` here: 0 allocations even over 1e8 iterations.

```
julia> foo(Float64,10), foo(10) # compile first
(22026.465794806718, 22026.465794806718)

julia> @time for i in 1:100_000_000
           foo(Float64, i) # non-constant local i prevents hoist
       end
  0.852338 seconds

julia> @time for i in 1:100_000_000
           foo(i) # non-constant local i prevents hoist
       end
  0.848491 seconds
```

Weirdly, in the other thread I linked earlier, this approach corroborated the `@btime` difference instead of the matching `@code_native`/`@code_llvm`. So maybe issues should be filed against both base Julia and BenchmarkTools to figure out what is going on.