Why do `$`s in BenchmarkTools change allocations so dramatically?

EDIT: PLEASE IGNORE THIS ENTIRE THREAD.
I will start another one.

Weird. I am benchmarking a copy of one vector into another.

using LinearAlgebra
using BenchmarkTools

N = 10
U1 = rand(N)
U0 = rand(N)

@btime copyto!(U1, U0)  
@btime $U1 .= $U0  
@btime @. U1 = U0  

The output is:

  19.238 ns (0 allocations: 0 bytes)
  7.407 ns (0 allocations: 0 bytes)
  254.743 ns (2 allocations: 32 bytes)

I don’t know the answer to these questions. Do you?

  1. In the first and second case there is no allocation. Why is the first operation more than twice as expensive?

  2. The difference between the second and third case is just the quoting. Why does the third case allocate? And why 32 bytes?

Edit: With Julia 1.7.1. Haven’t tried 1.6.5 yet.
Edit 2: Same results with 1.6.5.

I guess you forgot the interpolation ($ signs):

@btime copyto!($U1, $U0)  
@btime $U1 .= $U0  
@btime @. $U1 = $U0

  5.749 ns (0 allocations: 0 bytes)
7.479 ns (0 allocations: 0 bytes)
7.266 ns (0 allocations: 0 bytes)

Sorry, I said quotation. I meant interpolation. And that was the point. No interpolation: why allocation? And a huge hit in performance.

And why is your copyto! so much faster? Oh, I see: interpolation again. But why?

@btime copyto!(U1, U0)
@btime copyto!($U1, $U0)  
@btime $U1 .= $U0
@btime @. U1 = U0

gives

  26.533 ns (0 allocations: 0 bytes)
  8.408 ns (0 allocations: 0 bytes)
  11.712 ns (0 allocations: 0 bytes)
  342.593 ns (2 allocations: 32 bytes)

Ah, I think I forgot about the “global” character of variables that are not interpolated.
But still: I am passing them into a function, where they will be happily local.

Sorry, I was trying to construct a MWE of a puzzling situation, but I don’t think I managed to capture it with this. I will try again later.

Maybe the machinery of the @btime macro itself uses these variables.
There is some additional information in the BenchmarkTools readme:
https://github.com/JuliaCI/BenchmarkTools.jl#benchmarktoolsjl

The example in the readme does not quite explain the effect of using a global variable:

using LinearAlgebra
using BenchmarkTools

A = rand(3, 3);

@btime inv($A);
@btime inv(A);
@btime inv($(rand(3, 3)));
@btime inv(rand(3, 3));

nothing

Yields

  722.656 ns (4 allocations: 1.86 KiB)
  746.087 ns (4 allocations: 1.86 KiB)
  723.438 ns (4 allocations: 1.86 KiB)
  871.429 ns (5 allocations: 1.98 KiB)

So there is not much of a difference whether A is interpolated or not. Why, then, the huge difference for copyto!?

Edit: For larger vectors the difference between @btime copyto!(U1, U0) and @btime copyto!($U1, $U0) disappears. Still: why is there a difference for small vectors?

The thing I am trying to wrap my head around is that I observed that a broadcast to combine three vectors would allocate in a loop:

@time for step in 1:nsteps
    @. U1 = U0 + dt*V0 + ((dt^2)/2)*A0
end

yields

  3.080494 seconds (130.00 k allocations: 4.883 MiB)    

The number of allocations grows linearly with the upper bound of the range (nsteps).
The code is not global; the loop sits inside a function.

The problem is that this does not show up in trivial code, only in my original code.

Recall that the loop refers to the variable dt.
The trigger is this: If I compute omega_max from the solution of an eigenvalue problem, the loop would allocate.

    evals, evecs, nconv = eigs(K, M; nev=1, which=:LM, explicittransform=:none)
    @show omega_max = sqrt(evals[1])
    # omega_max = 2.76450e+06 
    @show dt = Float64(0.99* 2/omega_max)
    @show typeof(dt)

On the other hand, if I set omega_max to a fixed value, the loop does not allocate.

    evals, evecs, nconv = eigs(K, M; nev=1, which=:LM, explicittransform=:none)
    # @show omega_max = sqrt(evals[1])
    omega_max = 2.76450e+06 
    @show dt = Float64(0.99* 2/omega_max)
    @show typeof(dt)

Did I mention that it was weird?

All the code above is actually inside of a function. There is no global code.

So I know you’ve found this thread to be a red herring on your way to the more complicated issue you’re trying to debug, but there is a pretty straightforward answer to how BenchmarkTools uses that $ syntax. It simply constructs its benchmarking loop such that anything that’s $'ed is an argument to the function. Everything else is considered a global. How big an effect this has depends entirely on what Julia is doing with those values.

In general, you want to make sure all function-local variables in your original example have $'s on them in their benchmarks.
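
As a rough illustration of that difference, here is a sketch using only Base. It is not the code BenchmarkTools actually generates, and the names `G_DST`, `G_SRC`, `copy_globals!`, `copy_args!`, `allocs_args` and `allocs_globals` are made up for this example. It compares a function that reads untyped globals with one that receives the same arrays as arguments:

```julia
# Hypothetical names throughout; this only mimics the global-vs-argument
# distinction and is not how BenchmarkTools is implemented.
G_DST = zeros(10)   # non-const, untyped globals
G_SRC = rand(10)

# Reads the globals directly: their types are not known at compile time.
copy_globals!() = (G_DST .= G_SRC; nothing)

# Receives the arrays as arguments: their types are known at compile time.
copy_args!(dst, src) = (dst .= src; nothing)

# Measure allocated bytes after a warm-up call, so compilation is not counted.
function allocs_args(dst, src)
    copy_args!(dst, src)
    return @allocated copy_args!(dst, src)
end

function allocs_globals()
    copy_globals!()
    return @allocated copy_globals!()
end

println("arguments: ", allocs_args(G_DST, G_SRC), " bytes")
println("globals:   ", allocs_globals(), " bytes")
```

The argument version should report zero bytes. The global version typically reports a small nonzero number, though the exact figure depends on the Julia version.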

Still, copyto! is a function called with arguments that are known to the compiler, whether they are interpolated or not.

The arguments are globals. The compiler knows they are globals. It has no way of knowing whether those globals are rebound, possibly to a value of a different type, inside copyto! or some ancillary function. Therefore each time it calls copyto! in the benchmark loop it must check the type of the global inputs and dispatch dynamically.

Interpolating tells the compiler the value of the input, rather than the global variable that points to that value.
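
One way to see what the compiler knows in each case is to ask for the inferred return type. A small sketch, again with made-up names (`V`, `sum_global`, `sum_arg`), using `Base.return_types`:

```julia
# Hypothetical names; V is a non-const, untyped global.
V = rand(3)

# V could be rebound to a value of any type, so inference knows nothing about it.
sum_global() = sum(V)

# The type of v is part of the call signature, so inference can use it.
sum_arg(v) = sum(v)

println(Base.return_types(sum_global, ()))
println(Base.return_types(sum_arg, (Vector{Float64},)))
```

The result for the global version is inferred as `Any`, which is what forces the dynamic dispatch; the argument version is inferred as `Float64`.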

Ah, that makes sense. Ta.