Allocations with write(::TCPSocket, x)

I am investigating the possibility of using TCPSockets without any dynamic allocations.

It seems that plain

@allocated write(soc::TCPSocket, x)

always allocates, even after compilation. Running with --track-allocation=all points to wait()::Cint and the ccall(:jl_switch, Cvoid, ()) inside it.
Digging a bit further, I found that the actual change in the number of allocated bytes tracked by the garbage collector (jl_gc_get_total_bytes()) happens inside uv_run(loop, UV_RUN_ONCE) in jl_task_get_next(). From there I tried to narrow down the sources of the allocations, but without success.
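For anyone who wants to reproduce the measurement, here is a minimal sketch of how I would check the steady-state allocation of a single write over a loopback connection. The helper name steady_state_allocs and the port hint 8000 are my own choices for illustration, not part of Base or Sockets, and the number it prints will likely differ between Julia versions.

```julia
using Sockets

# Hypothetical helper (not part of Base or Sockets): bytes allocated by one
# write call, measured after a warm-up call has triggered compilation.
function steady_state_allocs(sock::TCPSocket, x::Vector{UInt8})
    write(sock, x)                     # warm-up; compilation cost lands here
    return @allocated write(sock, x)   # the call that is actually measured
end

# Loopback connection; 8000 is only a port hint for listenany.
port, server = listenany(Sockets.localhost, 8000)
acceptor = @async accept(server)
client = connect(Sockets.localhost, port)
peer = fetch(acceptor)

bytes = steady_state_allocs(client, rand(UInt8, 64))
println("bytes allocated per write: ", bytes)

close(client)
close(peer)
close(server)
```

The warm-up call inside the helper matters: without it, the first @allocated result also includes allocations made by the compiler.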

Now, on the other hand, if I use uv_write() directly like this:

# libuv write-completion callback: releases the request allocated in write() below.
function write_cb(req::Ptr{Cvoid}, status::Cint)::Nothing
    Libc.free(req)
    return nothing
end

global const cb = @cfunction(write_cb, Cvoid, (Ptr{Cvoid}, Cint))

function write(s::TCPSocket, x::Vector{UInt8})
    global cb
    p = pointer(x)
    n = UInt64(length(x))

    # Base.check_open(s)

    while !iszero(n)
        uvw = Libc.malloc(Base._sizeof_uv_write)
        nwrite::UInt64 = min(n, Base.MAX_OS_WRITE) # split up the write into chunks the OS can handle.
        # TODO: use writev instead of a loop
        err = ccall(:jl_uv_write,
                    Int32,
                    (Ptr{Cvoid}, Ptr{Cvoid}, UInt, Ptr{Cvoid}, Ptr{Cvoid}),
                    s, p, nwrite, uvw,
                    cb)
        if err < 0
            Libc.free(uvw) # the request was never queued, so free it here
            Base.uv_error("write", err)
        end

        n -= nwrite
        p += nwrite # advance past the chunk that was just queued
    end

    ccall(:jl_process_events, Cint, ())
end

then I get zero allocations from @allocated, and actual RAM usage does not grow over time when sending millions of small vectors.
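To make the "RAM usage does not grow" check concrete, this is roughly how such a test can be set up. It is only a sketch under a few assumptions: rss_growth is a hypothetical helper name, the send function is passed in so that either Base's write or a custom one can be plugged in, and the reader runs in the same process, so its own allocations are included in the number. For a clean measurement the reader should probably live in a separate process.

```julia
using Sockets

# Hypothetical helper: call `send(sock, x)` `count` times and return how much
# the peak resident set size of the whole process grew, in bytes.
function rss_growth(send, sock::TCPSocket, x::Vector{UInt8}, count::Integer)
    before = Int(Sys.maxrss())
    for _ in 1:count
        send(sock, x)
    end
    return Int(Sys.maxrss()) - before
end

# Loopback connection; 8000 is only a port hint for listenany.
port, server = listenany(Sockets.localhost, 8000)

# Drain the receiving side into a reused buffer so the sender never stalls
# on a full socket buffer.
drain = @async begin
    peer = accept(server)
    buf = Vector{UInt8}(undef, 65536)
    while !eof(peer)
        readbytes!(peer, buf)
    end
    close(peer)
end

client = connect(Sockets.localhost, port)
growth = rss_growth(write, client, rand(UInt8, 64), 10_000)
println("peak RSS growth in bytes: ", growth)

close(client)   # the reader sees EOF and its task finishes
wait(drain)
close(server)
```

Sys.maxrss() reports the peak, not the current, resident set size, so the result can never be negative and a single spike elsewhere in the process will show up in it.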

The problem is that I do not understand the consequences of calling jl_process_events at the very end of write(), or how it would interact with the rest of the Julia runtime. Is it safe to do at all?

And the big question for me remains: why does write(::TCPSocket, x) allocate dynamically?