Why is my own simplified IO type slower than Base.IOBuffer?

I am trying to extend Base with an IO buffer type, similar to DevNull, which throws away all data written to it but keeps track of a pointer, in order to find the serialized size of an object (as suggested in a response to one of my previous questions here).

mutable struct BlackHoleBuffer <: IO
    ptr::Int
end

@inline function Base.write(to::BlackHoleBuffer, x::UInt8)
    to.ptr += 1
    return sizeof(UInt8)
end

This is basically just a stripped-down version of IOBuffer, so I expected it to be fast to write to, especially for large arrays. But as the benchmark below shows, writing to BlackHoleBuffer is about 3 times slower than writing to an IOBuffer. Any hints as to why this might be the case?

julia> using BenchmarkTools

julia> A = rand(1000000);

julia> @btime begin
       buf = IOBuffer()
       write(buf, A)
       end
  3.286 ms (5 allocations: 7.63 MiB)
8000000

julia> @btime begin
       buf = BlackHoleBuffer(0)
       write(buf, A)
       end
  10.226 ms (2 allocations: 32 bytes)
8000000

It’s likely because write(buf, A) has to fall back to “writing” single bytes, one call to your UInt8 method per byte, instead of being able to handle the whole object in one call. I’d write the method as

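function Base.write(io::BlackHoleBuffer, x::T) where T
    # sizeof(x) is the byte size of the value; for an Array that is the
    # payload size, which sizeof(T) would not give.
    size = sizeof(x)
    io.ptr += size
    return size
end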

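This will run into method ambiguities, because Base already defines write methods on IO for specific value types (the fallbacks mentioned above), and neither those nor this generic method is more specific than the other. You’ll want to define these methods explicitly for your type (e.g. for Base.BitInteger etc.) (see here).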
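As a minimal sketch of a way to sidestep the ambiguities, and not necessarily what the answer above had in mind: the docstring for unsafe_write recommends that IO subtypes override unsafe_write(s::T, p::Ptr{UInt8}, n::UInt) for efficiency. In the Julia 1.x versions I'm aware of, Base's write methods for arrays of bits types and for primitive numbers end up calling it, so one override covers both. This is a different route from the generic write method above, and it assumes those Base internals keep that call path.

```julia
mutable struct BlackHoleBuffer <: IO
    ptr::Int
end

# Single-byte method, as in the question.
function Base.write(to::BlackHoleBuffer, x::UInt8)
    to.ptr += 1
    return 1
end

# Bulk path: count n bytes without reading the memory behind p.
function Base.unsafe_write(to::BlackHoleBuffer, p::Ptr{UInt8}, n::UInt)
    to.ptr += Int(n)
    return Int(n)
end

buf = BlackHoleBuffer(0)
write(buf, rand(1000))   # a Vector{Float64}: 8000 bytes counted in one call
write(buf, Int32(7))     # a primitive number: 4 more bytes
```

Because the override is more specific than Base's unsafe_write methods in every argument, it should not introduce new ambiguities. Arrays whose element type is not a bits type still go through Base's element-by-element fallback.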