Reading from a socket causes massive amounts of memory allocation

Reading from network sockets appears to cause massive amounts of memory allocation.

Does anyone know why? Is this a bug?

raw_data = Vector{UInt8}(undef, message_length)
read!(socket, raw_data)

What I find strange is that the object into which data is written is preallocated with a known length. So there shouldn’t be any allocation taking place here.

What I see is that the number of allocations is approximately the same as the number of elements.

It may be difficult to reproduce this, and I can’t provide the original code in its entirety. I have been able to reproduce it by writing small chunks of data to a socket while the reading end is waiting for a much larger read to complete.

A good test case is an array of 1 million integers.

using Sockets
sock = Sockets.connect(IPv4("127.0.0.1"), 22222)
v = Vector{Int64}(undef, 1000000)
read!(sock, v)

Sending side:

using Sockets
server = Sockets.listen(IPv4("127.0.0.1"), 22222)
socket = accept(server)
for i in 1:1000
    v = Vector{Int64}(undef, 1000)
    write(socket, v)
end

Set the receiving side to run first, then trigger the sending side.

I think that should be enough to reproduce a similar behaviour.

As far as I am aware, I am not doing something “weird” here. (The documentation on sockets is pretty minimal, so it’s hard to know.)

Perhaps this is just not an idiomatic way to use sockets, and that is the cause of the problem. I have no idea.

I am using --track-allocation=user to track the memory allocation. Here is one line from one of my .mem files.

25530368     raw_data = Vector{UInt8}(undef, msglen - bytes_already_read)

That’s 25 million allocations to read ~ 25 MB of data.

Just some additional tests. This is from julia --track-allocation=user, calling Profile.clear_malloc_data() after running each function once to trigger the JIT, and then calling each function again to obtain measurements.

        - function example()
        0     v = Vector{Int64}(undef, 1000000)
        0     return v
        - end
        -
        - function example2(socket)
  8000072     v = Vector{Int64}(undef, 1000000)
        0     read!(socket, v)
        0     return v
        - end
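
For reference, the measurement procedure described above is roughly the following (a sketch; it assumes example, example2, and an open, connected sock as defined earlier):

using Profile

example(); example2(sock)      # run each once so compilation allocations are not counted
Profile.clear_malloc_data()    # reset the per-line counters
example(); example2(sock)      # run again; totals are written to the .mem files on exit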

Isn’t that the number of bytes allocated?

Is it? I don’t know. I would have thought it was more sensible to track the number of allocations and not the quantity of data allocated.

Allocating 25 million arrays of length 1 byte will have a significant impact on performance. Allocating a single array of 25 million bytes will have close to no impact on performance.
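
To illustrate the distinction, here is a small sketch contrasting the two cases (the function names are made up for this example):

# Many tiny allocations: one heap allocation per iteration.
function many_small()
    total = 0
    for _ in 1:25_000_000
        v = Vector{UInt8}(undef, 1)
        total += length(v)
    end
    return total
end

# One allocation of the same total size.
one_large() = Vector{UInt8}(undef, 25_000_000)

many_small(); one_large()   # compile first
@time many_small()          # expect tens of millions of allocations
@time one_large()           # expect only a handful of allocations, totalling ~25 MB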

I’m not sure it’s the same problem, but some time ago I tried, without success, to find a function in Sockets for reading UDP packets that doesn’t allocate every time data is received. All the receive functions seem to allocate a new array for each datagram. See “How to pass a buffer to a C callback function?”. I did manage to hack together a recv_into! function, but in the end I wrote the data access in C++ and called it via CxxWrap.
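
For what it’s worth, here is a minimal sketch of the behaviour being described, using the standard Sockets API (the port number is arbitrary):

using Sockets

sock = UDPSocket()
bind(sock, ip"127.0.0.1", 33333)

# Each call to recv returns a freshly allocated Vector{UInt8},
# so a tight receive loop allocates at least once per datagram.
while true
    data = recv(sock)                       # blocks until a datagram arrives
    println(length(data), " bytes received")
end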

I did create a feature request: https://github.com/JuliaLang/julia/issues/57029

The reason I originally asked about this is because I was seeing slow GC times when repeatedly calling a function which reads data from a socket.

I thought the problem was obvious: allocating 25 million elements on the heap to read that many bytes from the socket. I assumed it was a bug.

However, if the output of --track-allocation is bytes and not the number of allocations, then there is no bug and the GC is just extremely slow.

e.g. ~50% of the program runtime is just spent doing GC. This is crazy.
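
If it helps, one way to see the GC pauses directly (on Julia 1.8 or later) is to turn on GC logging:

# Prints a line to stderr for every collection, including its pause time.
GC.enable_logging(true)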

In my case I tried to avoid allocations because GC temporarily halted all the threads and I lost data. I don’t know much about UDP/TCP comms, but allocating on each read seems unnecessary to me.

It is certainly unnecessary in the context of

read!(socket, buffer)

because this should (as far as I know) overwrite the contents of buffer with data from the socket until enough bytes have been read to fill buffer.
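
That contract is easy to check in isolation, with an IOBuffer standing in for the socket (a sketch):

io  = IOBuffer(rand(UInt8, 8_000))   # 8 kB of dummy "network" bytes
buf = Vector{Int64}(undef, 1_000)    # preallocated destination, also 8 kB
read!(io, buf)                       # fills buf in place and returns it; no new array for the data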

You’d think so, but read! isn’t defined in Sockets. This is well beyond my knowledge, but Base.read! seems to use unsafe_read, and the code for that has the comment “It is recommended that subtypes T<:IO override the following method signature to provide more efficient implementations”. Sockets doesn’t seem to provide any such override. Maybe there’s something missing from Sockets that I don’t really understand?
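
For context, the generic fallback in Base reads one byte at a time, roughly like this (a paraphrased sketch, not the actual source):

# What happens when an IO type does not provide its own unsafe_read:
# each of the n bytes goes through a separate read(io, UInt8) call.
function fallback_unsafe_read(io::IO, p::Ptr{UInt8}, n::UInt)
    for i in 1:n
        unsafe_store!(p, read(io, UInt8), i)
    end
    return nothing
end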

I get only a few allocations:

The sender:

using Sockets
server = Sockets.listen(IPv4("127.0.0.1"), 22222)
socket = accept(server)

v = Vector{Int64}(undef, 1000)

for i in 1:1000
    write(socket, v)
end

The receiver:

using Sockets
sock = Sockets.connect(IPv4("127.0.0.1"), 22222)
v = Vector{Int64}(undef, 1000000)
julia> @time read!(sock, v);
  0.001464 seconds (9 allocations: 224 bytes)

The output of this doesn’t make sense to me.

9 allocations and 224 bytes allocated? To read an 8 MB array?

The @time read!(sock, v) reports allocations for the read!. The allocation of the vector is in the line before: v = Vector{Int64}(undef, 1000000).

Why read! allocates at all, I don’t know.
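
To include the buffer allocation in the measurement, the two steps can be timed together (sketch):

@time begin
    v = Vector{Int64}(undef, 1_000_000)   # the ~8 MB allocation shows up here
    read!(sock, v)                        # the read itself should add very little on top
end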

Ok thanks. Very strange.

This is the result I see:

julia> @time read!(sock, v)
  0.038267 seconds (162 allocations: 2.609 KiB)

Why would my result be significantly different to yours?