Reading from a socket causes massive amounts of memory allocation

Reading from network sockets appears to cause massive amounts of memory allocation.

Does anyone know why? Is this a bug?

raw_data = Vector{UInt8}(undef, message_length)
read!(socket, raw_data)

What I find strange is that the object into which data is written is preallocated with a known length. So there shouldn’t be any allocation taking place here.

What I see is that the number of allocations is approximately the same as the number of elements.

It may be difficult to reproduce this, and I can’t provide the original code in its entirety. I have been able to reproduce it by writing small chunks of data to a socket while the reading end is waiting for a much larger read to complete.

A good test case is an array of 1 million integers.

using Sockets
sock = Sockets.connect(IPv4("127.0.0.1"), 22222)
v = Vector{Int64}(undef, 1000000)
read!(sock, v)

Sending side:

using Sockets
server = Sockets.listen(IPv4("127.0.0.1"), 22222)
socket = accept(server)
for i in 1:1000
    v = Vector{Int64}(undef, 1000)
    write(socket, v)
end

Set the receiving side to run first, then trigger the sending side.

I think that should be enough to reproduce a similar behaviour.

As far as I am aware, I am not doing something “weird” here. (The documentation on sockets is pretty minimal, so it’s hard to know.)

Perhaps this is just not an idiomatic way to use sockets, and that is the cause of the problem. I have no idea.

I am using --track-allocation=user to track the memory allocation. Here is one line from one of my .mem files.

25530368     raw_data = Vector{UInt8}(undef, msglen - bytes_already_read)

That’s 25 million allocations to read ~ 25 MB of data.

Just some additional tests. This is from julia --track-allocation=user, calling Profile.clear_malloc_data() after running each function once to trigger the JIT, and then calling each function again to obtain measurements.

        - function example()
        0     v = Vector{Int64}(undef, 1000000)
        0     return v
        - end
        -
        - function example2(socket)
  8000072     v = Vector{Int64}(undef, 1000000)
        0     read!(socket, v)
        0     return v
        - end
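
For reference, the measurement procedure described above is roughly the following (a sketch; it assumes example, example2, and an open, connected sock as defined earlier):

using Profile

example(); example2(sock)      # run each once so compilation allocations are not counted
Profile.clear_malloc_data()    # reset the per-line counters
example(); example2(sock)      # run again; totals are written to the .mem files on exit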

Isn’t that the number of bytes allocated?

Is it? I don’t know. I would have thought it was more sensible to track the number of allocations and not the quantity of data allocated.

Allocating 25 million arrays of length 1 byte will have a significant impact on performance. Allocating a single array of 25 million bytes will have close to no impact on performance.
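
To illustrate the distinction, here is a small sketch contrasting the two cases (the function names are made up for this example):

# Many tiny allocations: one heap allocation per iteration.
function many_small()
    total = 0
    for _ in 1:25_000_000
        v = Vector{UInt8}(undef, 1)
        total += length(v)
    end
    return total
end

# One allocation of the same total size.
one_large() = Vector{UInt8}(undef, 25_000_000)

many_small(); one_large()   # compile first
@time many_small()          # expect tens of millions of allocations
@time one_large()           # expect only a handful of allocations, totalling ~25 MB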

I’m not sure it’s the same problem, but some time ago I tried, without success, to find a function in Sockets for reading UDP packets that doesn’t allocate every time data is received. All the receive functions seem to allocate a new array for each datagram. See “How to pass a buffer to a C callback function?”. I did manage to hack together a recv_into! function, but in the end I wrote the data access in C++ and called it via CxxWrap.
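
For what it’s worth, here is a minimal sketch of the behaviour being described, using the standard Sockets API (the port number is arbitrary):

using Sockets

sock = UDPSocket()
bind(sock, ip"127.0.0.1", 33333)

# Each call to recv returns a freshly allocated Vector{UInt8},
# so a tight receive loop allocates at least once per datagram.
while true
    data = recv(sock)                       # blocks until a datagram arrives
    println(length(data), " bytes received")
end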

I did create a feature request: https://github.com/JuliaLang/julia/issues/57029

The reason I originally asked about this is because I was seeing slow GC times when repeatedly calling a function which reads data from a socket.

I thought the problem was obvious: allocating 25 million elements on the heap to read that many bytes from the socket. I assumed it was a bug.

However, if the output of --track-allocation is bytes and not the number of allocations, then there is no bug and the GC is just extremely slow.

e.g. ~50% of the program runtime is just spent doing GC. This is crazy.
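
If it helps, one way to see the GC pauses directly (on Julia 1.8 or later) is to turn on GC logging:

# Prints a line to stderr for every collection, including its pause time.
GC.enable_logging(true)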

In my case I tried to avoid allocations because GC temporarily halted all the threads and I lost data. I don’t know much about UDP/TCP comms, but allocating on each read seems unnecessary to me.

It is certainly unnecessary in the context of

read!(socket, buffer)

because this should (as far as I know) overwrite the contents of buffer with data from the socket until enough bytes have been read to fill buffer.
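
That contract is easy to check in isolation, with an IOBuffer standing in for the socket (a sketch):

io  = IOBuffer(rand(UInt8, 8_000))   # 8 kB of dummy "network" bytes
buf = Vector{Int64}(undef, 1_000)    # preallocated destination, also 8 kB
read!(io, buf)                       # fills buf in place and returns it; no new array for the data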

You’d think so, but read! isn’t defined in Sockets. This is well beyond my knowledge, but Base.read! seems to use unsafe_read, and the code for that has the comment “It is recommended that subtypes T<:IO override the following method signature to provide more efficient implementations”. Sockets doesn’t seem to provide any such override. Maybe there’s something missing from Sockets that I don’t really understand?
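
For context, the generic fallback in Base reads one byte at a time, roughly like this (a paraphrased sketch, not the actual source):

# What happens when an IO type does not provide its own unsafe_read:
# each of the n bytes goes through a separate read(io, UInt8) call.
function fallback_unsafe_read(io::IO, p::Ptr{UInt8}, n::UInt)
    for i in 1:n
        unsafe_store!(p, read(io, UInt8), i)
    end
    return nothing
end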

I get only a few allocations:

The sender:

using Sockets
server = Sockets.listen(IPv4("127.0.0.1"), 22222)
socket = accept(server)

v = Vector{Int64}(undef, 1000)

for i in 1:1000
    write(socket, v)
end

The receiver:

using Sockets
sock = Sockets.connect(IPv4("127.0.0.1"), 22222)
v = Vector{Int64}(undef, 1000000)
julia> @time read!(sock, v);
  0.001464 seconds (9 allocations: 224 bytes)

The output of this doesn’t make sense to me.

9 allocations and 224 bytes allocated? To read an 8 MB array?

The @time read!(sock, v) reports allocations for the read!. The allocation of the vector is in the line before: v = Vector{Int64}(undef, 1000000).

Why read! allocates at all, I don’t know.
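
To include the buffer allocation in the measurement, the two steps can be timed together (sketch):

@time begin
    v = Vector{Int64}(undef, 1_000_000)   # the ~8 MB allocation shows up here
    read!(sock, v)                        # the read itself should add very little on top
end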

Ok thanks. Very strange.

This is the result I see:

julia> @time read!(sock, v)
  0.038267 seconds (162 allocations: 2.609 KiB)

Why would my result be significantly different to yours?