Unexpected allocations when passing types as arguments

I have come across some unexpected behavior when passing types as arguments to a function. For reference, I am on Julia 1.10.10 (the current LTS). I am unclear on what causes this behavior, so if anyone has any idea, I would love to know. It seems to have something to do with the JIT compiler, but I don't know exactly what.

Consider the following minimal example, which allocates an array of random 0 and 1 values and then flips bits in place at random indices (to keep the compiler from optimizing away my loop). This test_function_1 does not allocate in the loop when called (i.e. the return value a is 0).

function flip_bit!(state::Vector{T}, ind) where T
    state[ind] = one(T) - state[ind]
end

function test_function_1(n::Integer; num_flips=1000)
    state = rand((0, 1), n) 
    randinds = rand(1:n, num_flips)
    a = @allocations for ind in randinds
        flip_bit!(state, ind)        
    end
    return a
end

However, the behavior changes if I pass the type as a keyword argument and use it to construct the initial array, as in the following new function, test_function_2:

function test_function_2(n::Integer; num_flips=1000, T=Int64)
    state = rand((zero(T), one(T)), n) 
    randinds = rand(1:n, num_flips)
    a = @allocations for ind in randinds
        flip_bit!(state, ind)        
    end
    return a
end

This function does not allocate when called with T=Int64. However, if you pass a floating-point type, it allocates: test_function_2(10, T=Float64) reports roughly one allocation per loop iteration, give or take a few allocations from initial compilation. Note that we could also have written state = T.(rand((0, 1), n)) and the behavior would be the same (see the sketch below). Also note that this behavior is not caused by the loop: we could remove the loop, call flip_bit! once, and it would still allocate once.
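For reference, that broadcast variant written out in full looks like this (test_function_2b is just an illustrative name for this post):

function test_function_2b(n::Integer; num_flips=1000, T=Int64)
    # Convert after generating the 0/1 values; the allocation
    # behavior is the same as with zero(T)/one(T).
    state = T.(rand((0, 1), n))
    randinds = rand(1:n, num_flips)
    a = @allocations for ind in randinds
        flip_bit!(state, ind)
    end
    return a
end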

This behavior does not occur, however, if the type is inferred from an argument passed into the function. Observe this third test function:

function test_function_3(n::Integer; num_flips=1000, dummy::T=0) where T
    state = rand((zero(T), one(T)), n) 
    randinds = rand(1:n, num_flips)
    a = @allocations for ind in randinds
        flip_bit!(state, ind)        
    end
    return a
end

This function doesn’t allocate when called with dummy=0.0. Does anyone more enlightened than myself know what is going on here?

Example behavior from the REPL is shown below, where I have defined the functions in a file called example.jl:

julia> include("example.jl");

julia> test_function_1(10)
0

julia> test_function_2(10)
26

julia> test_function_2(10)
0

julia> test_function_2(10; T=Int32)
764

julia> test_function_2(10; T=Int32)
0

julia> test_function_2(10; T=Float64)
1764

julia> test_function_2(10; T=Float64)
1000

julia> test_function_3(10)
0

julia> test_function_3(10; dummy=0.0)
0

Any insights would be greatly appreciated, thank you! If this is a duplicate of a previous post, I apologize; I tried to find one and didn't.

you want

function test_function_2(n::Integer; num_flips=1000, T::Type{TT} = Int64) where TT
    # use TT here, DO NOT USE T
end
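spelled out in full, something like this (calling it test_function_2TT so you can keep your original around for comparison):

function test_function_2TT(n::Integer; num_flips=1000, T::Type{TT}=Int64) where TT
    # TT is the static type parameter; using it (rather than T) lets the
    # compiler specialize the whole body on the element type
    state = rand((zero(TT), one(TT)), n)
    randinds = rand(1:n, num_flips)
    a = @allocations for ind in randinds
        flip_bit!(state, ind)
    end
    return a
end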

Interesting, that seems to work. Thank you! Do you have any intuition for why this only occurs with floating-point types rather than integer types?

idk why there's a difference between fp and integers, it's basically an application of the "Be aware of when Julia avoids specializing" Performance Tip.
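the gist of the tip, with made-up function names:

# Julia heuristically does not specialize on a bare Type argument that is
# only passed through to other calls, so this compiles a single generic
# method instance with T::DataType:
make_zero(T) = zero(T)

# a ::Type{TT} signature forces a fresh specialization per concrete type:
make_zero_spec(::Type{TT}) where TT = zero(TT)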

oh actually, it might be because Int was your default argument.

I don't think so. When I was initially making the example code, I was using Float64 as the default, and it didn't specialize in a way that eliminated the allocations.

This is not what I observe:

julia> function flip_bit!(state::Vector{T}, ind) where T
           # body emptied to isolate the source of the allocations
           return
       end
flip_bit! (generic function with 1 method)

julia> test_function_2(10; T=Int64)
0

julia> test_function_2(10; T=Float64)
0

In any case, my guess is that there is a bug in the introspection that causes it to underreport the allocations (from the type instability in getindex or setindex!) in the Int64 case. @code_warntype and @code_llvm give the same code for T=Int64 and for T=Float64 (except for i64 versus double), so I would not expect any difference. But note that test_function_2(10; T=Int64) is slow. Indeed, in the benchmarks below, test_function_2i64 and test_function_2f64 hard-code T=Int64 and T=Float64 respectively, and test_function_2TT is @jling's version. flip_bit! has been restored to the original code.
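The hard-coded variants are just test_function_2 with the type inlined, along these lines (test_function_2TT is the ::Type{TT} version from above):

function test_function_2i64(n::Integer; num_flips=1000)
    state = rand((zero(Int64), one(Int64)), n)
    randinds = rand(1:n, num_flips)
    a = @allocations for ind in randinds
        flip_bit!(state, ind)
    end
    return a
end

function test_function_2f64(n::Integer; num_flips=1000)
    state = rand((zero(Float64), one(Float64)), n)
    randinds = rand(1:n, num_flips)
    a = @allocations for ind in randinds
        flip_bit!(state, ind)
    end
    return a
end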

julia> @btime test_function_2(10; num_flips=10000, T=Int64);
  214.800 μs (6 allocations: 78.37 KiB)

julia> @btime test_function_2i64(10; num_flips=10000);
  41.300 μs (5 allocations: 78.33 KiB)

julia> @btime test_function_2TT(10; num_flips=10000, T=Int64);
  41.300 μs (5 allocations: 78.33 KiB)

julia> @btime test_function_2(10; num_flips=10000, T=Float64);  # only a bit slower than for T=Int64
  261.600 μs (10006 allocations: 234.62 KiB)

julia> @btime test_function_2f64(10; num_flips=10000);
  41.300 μs (5 allocations: 78.33 KiB)

julia> @btime test_function_2TT(10; num_flips=10000, T=Float64);
  41.300 μs (5 allocations: 78.33 KiB)

Note that these macros show fully specialized code, which is not always what actually runs (#32834) because of the above-mentioned performance tip regarding non-specialization, so take their output with a grain of salt.

Tangent: "Be less aware of when Julia avoids specializing?" and #51423.

I see that the wording I initially used was somewhat inaccurate when I said "we could remove flip_bit! from the loop". By that I simply meant that the allocation was not the result of the flip_bit! call being placed inside the loop's scope.

In other words, test_function_2 could just as well have been defined as

function test_function_2(n::Integer; num_flips=1000, T=Int64)
    state = rand((zero(T), one(T)), n) 
    ind = rand(1:n)
    a = @allocations flip_bit!(state, ind)
    return a
end

I felt the need to add this clarification to my original question because of how I first ran into this behavior. In a more complicated multi-threaded program I have been working on, a for loop using the @threads macro was allocating beyond the allocations associated with starting the threads, while a regular for loop was not (seemingly in a similar vein to, or the same as, the known issue with function closures). I came across the behavior above while trying to distill that program down to a minimal example, and I wanted to emphasize that the allocations were not coming from the local scope of the for loop in some way.
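As a stand-in for the real code (the actual program is more involved), the threaded loop was roughly of this shape:

function threaded_flips!(state::Vector{T}, randinds) where T
    # Threads.@threads wraps the loop body in a closure; if the element
    # type of `state` is not inferred concretely at the call site, the
    # captured variables are abstractly typed and iterations can allocate.
    Threads.@threads for ind in randinds
        flip_bit!(state, ind)
    end
    return nothing
end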

Thanks for the timing comparison. There may be instances in my code where I don't pass type parameters correctly, so I'll be curious to see whether I can get any speed improvements. I wouldn't have made the connection that allocations might be under-reported in the Int64 case. That is wild.

Thank you for your input.