Copying an instantiated Broadcasted object is noticeably slower than copying a non-instantiated one

I’m trying to understand the difference in performance here:

julia> using LinearAlgebra, BenchmarkTools

julia> n=40; U = UpperTriangular(rand(n,n)); C = similar(U); B = Broadcast.broadcasted(*, 2.0, U);

julia> @btime copyto!($C, $B);
  307.138 ns (0 allocations: 0 bytes)

julia> B2 = Broadcast.instantiate(B);

julia> @btime copyto!($C, $B2);
  483.149 ns (0 allocations: 0 bytes)

Why is it noticeably more expensive to copy the instantiated object? The gap is small in absolute terms and doesn’t scale with the matrix size, but I’d like to understand why it exists at all. I don’t observe a similar difference when plain Arrays are involved.

This is on the current master.
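
For reference, here is a minimal sketch of what I understand `instantiate` to change. It assumes that `Broadcasted` stores its axes in an internal field called `axes` (a Base implementation detail that may differ between versions): the lazy object has none yet, while the instantiated one carries them precomputed. Both should materialize to the same result, so the timing gap is not about different output.

```julia
using LinearAlgebra

n = 40
U = UpperTriangular(rand(n, n))

# Lazy broadcast object; its axes have not been computed yet.
B = Broadcast.broadcasted(*, 2.0, U)
# `axes` is an internal field of `Broadcasted` (assumption about Base internals).
@show B.axes === nothing

# `instantiate` computes and stores the axes after checking that the argument shapes agree.
B2 = Broadcast.instantiate(B)
@show B2.axes

# Both objects are materialized into a fresh UpperTriangular destination.
C1 = copyto!(similar(U), B)
C2 = copyto!(similar(U), B2)

# The results should agree with each other and with the eager broadcast.
@show C1 == C2 == 2.0 .* U
```

This only shows what differs between the two objects; it does not by itself explain where the extra time goes.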

I can’t quite reproduce such a drastic difference on v1.10:

julia> using LinearAlgebra, BenchmarkTools

julia> n=40; U = UpperTriangular(rand(n,n)); C = similar(U); B = Broadcast.broadcasted(*, 2.0, U);

julia> @btime copyto!($C, $B);
  303.206 ns (0 allocations: 0 bytes)

julia> B2 = Broadcast.instantiate(B);

julia> @btime copyto!($C, $B2);
  336.258 ns (0 allocations: 0 bytes)

Yes, the difference is more pronounced on the current master than on v1.10.
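
To make numbers from different versions easier to compare, something like the following could be run on each of them. It is only a sketch: `compare_copyto` is a hypothetical helper name, the sizes are arbitrary, and it assumes BenchmarkTools is installed. It reports the minimum `copyto!` time for the lazy and the instantiated broadcast, for both `UpperTriangular` and a plain `Matrix`, which also checks the claims that the gap does not grow with `n` and does not show up for Arrays.

```julia
using LinearAlgebra, BenchmarkTools

# Hypothetical helper: minimum copyto! time in nanoseconds for the
# lazy and the instantiated broadcast of `2.0 .* A`.
function compare_copyto(A)
    C = similar(A)
    B = Broadcast.broadcasted(*, 2.0, A)
    B2 = Broadcast.instantiate(B)
    t_lazy = @belapsed copyto!($C, $B)    # @belapsed returns the minimum time in seconds
    t_inst = @belapsed copyto!($C, $B2)
    return (lazy_ns = 1e9 * t_lazy, instantiated_ns = 1e9 * t_inst)
end

println("Julia ", VERSION)
for n in (10, 40, 160)
    M = rand(n, n)
    println("n = ", n)
    println("  UpperTriangular: ", compare_copyto(UpperTriangular(M)))
    println("  Matrix:          ", compare_copyto(M))
end
```

Posting the printed output together with the version line should make it clear whether the gap is specific to master and to structured matrices.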