Allocations with BlockArrays multiplication

I’m trying to do a matrix-vector multiplication and store the result in a block vector from the BlockArrays package, but I end up with a lot of allocations. Any ideas why?

using BlockArrays, BenchmarkTools, LinearAlgebra
A = rand(9)
B = rand(19, 9)
C = mortar([rand(10), rand(9)])

@btime mul!($C, $B, $A)
  20.583 μs (342 allocations: 8.02 KiB)

This is perhaps hitting a generic fallback method. Could you pass a plain Vector to mul! and then create the BlockVector from it? That will be much faster, albeit with a few allocations. Something like

julia> C = zeros(19);

julia> mul!(C, B, A);

julia> mortar([view(C, 1:10), view(C, 11:19)])

or

mortar([C[1:10], C[11:19]])

depending on your use case: the views share memory with C, whereas the slices are independent copies.
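
If the multiplication runs repeatedly, one possible variation on the view approach is to build the block vector once and keep writing into the underlying dense buffer. This is only a sketch: the name buf is mine, and it assumes the block sizes (10 and 9) stay fixed between calls.

```julia
using BlockArrays, LinearAlgebra

A = rand(9)
B = rand(19, 9)

# Plain dense buffer; mul! on a Vector takes the usual dense method.
buf = zeros(19)

# Block vector whose blocks are views into buf. Build it once, outside
# any hot loop, since constructing the views and the wrapper allocates.
C = mortar([view(buf, 1:10), view(buf, 11:19)])

# Writing into buf also updates C, because the blocks alias buf.
mul!(buf, B, A)

# Sanity check: the block vector now holds the product.
@assert Array(C) ≈ B * A
```

After this setup, only the mul!(buf, B, A) call needs to sit in the loop, and C can be read block by block (for example C[Block(1)]) wherever the block structure is needed.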