ReinterpretedArray Performance (even worse on 1.8)

Concretely, I want to use this for temporary arrays whose eltype I can predict at evaluation time, but not when I construct the object that holds them. Very crudely, I would do something like this:

mutable struct A
    tmp::Vector{UInt8} 
    # other stuff
end 

function evaluate(a::A, x::TX) where {TX}
    T = predict_eltype(TX)
    # resize a.tmp if necessary, or replace with larger array
    tmp = reinterpret(T, a.tmp)
    # ... do some computations in tmp 
    # return output
end
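Filled in as a runnable sketch, the pattern might look like the following. The temporary's length (taken from `x`), the definition of `predict_eltype` (here just `eltype`), and the placeholder computation are hypothetical stand-ins. Only the resize-then-reinterpret structure comes from the description above.

```julia
mutable struct A
    tmp::Vector{UInt8}   # untyped byte buffer, reused across calls
    # other stuff omitted
end

# Hypothetical stand-in: predict the temporary's eltype from the input type.
predict_eltype(::Type{TX}) where {TX} = eltype(TX)

function evaluate(a::A, x::TX) where {TX}
    T = predict_eltype(TX)
    n = length(x)                        # hypothetical: temporary has length(x) entries
    # Make the byte buffer exactly n * sizeof(T) bytes long; resize! typically
    # reuses the existing allocation when the capacity is already sufficient.
    resize!(a.tmp, n * sizeof(T))
    tmp = reinterpret(T, a.tmp)          # ReinterpretedArray{T}, no copy
    tmp .= 2 .* x                        # placeholder computation
    return sum(tmp)
end

a = A(UInt8[])
evaluate(a, [1.0, 2.0, 3.0])             # T = Float64, buffer holds 24 bytes
evaluate(a, Int32[1, 2])                 # T = Int32, same buffer reused
```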

However, I then noticed that computing on the array returned by reinterpret (a ReinterpretedArray rather than a plain Vector{T}) is sometimes significantly slower.
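One workaround, swapped in here as an assumption rather than something measured in this post, is to wrap the buffer's memory as an ordinary `Vector{T}` with `unsafe_wrap`, so downstream code never sees a `ReinterpretedArray`. This is unsafe by design: `T` must be an isbits type, the byte buffer must stay alive (hence `GC.@preserve`), it must not be resized while the wrapper is in use, and we assume its data pointer is suitably aligned for `T`. The helper names are hypothetical.

```julia
# Hypothetical helper: view the first n*sizeof(T) bytes of `buf` as a Vector{T}.
# The caller must keep `buf` alive and must not resize it while the result is used.
function unsafe_typed(::Type{T}, buf::Vector{UInt8}, n::Integer) where {T}
    isbitstype(T) || throw(ArgumentError("T must be an isbits type"))
    length(buf) >= n * sizeof(T) || throw(ArgumentError("buffer too small"))
    return unsafe_wrap(Array, Ptr{T}(pointer(buf)), n; own = false)
end

function evaluate_wrapped(buf::Vector{UInt8}, x::Vector{T}) where {T}
    n = length(x)
    resize!(buf, n * sizeof(T))
    # GC.@preserve keeps `buf` rooted while we use the raw-pointer wrapper,
    # and returns the value of the block.
    s = GC.@preserve buf begin
        tmp = unsafe_typed(T, buf, n)    # plain Vector{T}, no copy
        tmp .= 2 .* x                    # placeholder computation
        sum(tmp)
    end
    return s
end
```

The wrapper must not escape the `GC.@preserve` block; returning `tmp` itself would leave a dangling view once `buf` is resized or collected.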

I should also acknowledge that the speedup from not allocating is not at all consistent: sometimes I seem to gain quite a bit, sometimes nothing. Maybe it depends on whether the GC can figure out that I want to reuse my arrays and keeps them around? (Basically, that is the functionality I'm trying to reimplement here.)
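Since the goal is essentially to reuse memory across calls, another option (a different technique from the byte-buffer approach above, sketched here as an assumption) is to cache one concrete buffer per eltype, so no reinterpret is needed at all. The `BufferCache` type and `getbuffer!` function are hypothetical names. The type assertion on the lookup makes the returned `Vector{T}` inferable, at the cost of a dictionary lookup per call.

```julia
# Hypothetical cache: one concrete Vector{T} per element type T.
struct BufferCache
    bufs::Dict{Type,Any}
end
BufferCache() = BufferCache(Dict{Type,Any}())

# Return a Vector{T} of length n, reusing the cached array for T when present.
function getbuffer!(c::BufferCache, ::Type{T}, n::Integer) where {T}
    buf = get!(() -> Vector{T}(undef, n), c.bufs, T)::Vector{T}
    length(buf) == n || resize!(buf, n)
    return buf
end

c = BufferCache()
b1 = getbuffer!(c, Float64, 4)
b2 = getbuffer!(c, Float64, 3)   # same array as b1, resized to length 3
b3 = getbuffer!(c, Int32, 2)     # separate Vector{Int32}
```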