Bottleneck when receiving UDP packets?

Alright, I just tried it on our DAQ system and it is indeed related to the net.core.rmem_max setting, since now I am able to keep up with the UDP rates from many sources!

I am however not able to reproduce these rates locally. That is, I am able to receive more than 2000 UDP packets per second on our target system from many (external) network sources, but I cannot reproduce this via the local loopback.

Anyways, thanks for pointing it out, sometimes one just needs to talk about it a bit more :see_no_evil:

What kind of network setup are you using? Is there anything more than just a switch between the sender and receiver? QoS or firewall systems could cause the network to actually drop the packets.

We are using a fiber-optic network with up to 2070 nodes, each having a 1 Gbps connection, and multiple 10 Gbps DFES uplinks, so the throughput is well handled. This UDP data is just a tiny fraction of the overall data. The server itself, where I analyse the UDP packets in realtime, is connected with a 10 GbE NIC.

I hit the limit of around 1000 UDP packets/s when we attached more nodes to the network (currently we have 114 out of 2070 attached, each sending at a rate of 10 Hz), so I started to investigate on my own machines.

However, as written above, increasing net.core.rmem_max helps.

It is just a bit annoying that you can set higher values through the uv_lib package without any errors and they are simply ignored. So I thought I was already using a large buffer when instead I was using the same default buffer size all the time.
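One way to see what the kernel actually granted is to query the socket back. A minimal sketch (the SOL_SOCKET/SO_RCVBUF constants here are the Linux values, and uv_fileno is used to extract the file descriptor from the libuv handle):

using Sockets

# Sketch (Linux constants assumed): read back the effective SO_RCVBUF, since
# setsockopt requests above net.core.rmem_max are silently capped, not rejected.
sock = UDPSocket()
bind(sock, ip"127.0.0.1", 0)   # bind so libuv creates the underlying fd

fd = Ref{Cint}(-1)
ccall(:uv_fileno, Cint, (Ptr{Cvoid}, Ptr{Cint}), sock.handle, fd)

SOL_SOCKET = 1   # Linux value
SO_RCVBUF  = 8   # Linux value
bufsize = Ref{Cint}(0)
optlen  = Ref{Cuint}(sizeof(Cint))
ccall(:getsockopt, Cint, (Cint, Cint, Cint, Ptr{Cvoid}, Ptr{Cuint}),
      fd[], SOL_SOCKET, SO_RCVBUF, bufsize, optlen)

# Linux reports twice the requested size (kernel bookkeeping is included).
println("effective SO_RCVBUF: $(bufsize[]) bytes")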

1000 pps is ridiculously low. I have a Raspberry Pi 4 doing QoS on my home network that easily handles 1 Gbps. At a 1500-byte MTU, that’s ~83000 packets per second. Of course it’s TCP in those tests, but the point is you are definitely nowhere near network hardware or inherent OS limits.

On the other hand, for example, the switching hardware may see DSCP tags on these packets indicating high priority and then deliver them with high priority… but high priority is sometimes limited to a tiny fraction of available bandwidth. So I was considering the idea that QoS was involved.

On Linux a time slice is usually around 100 ms (not sure what it is on a Mac). So if you are sending 1000 packets per second, you are basically waking up, sending a packet, sleeping for 1 ms, then sending another packet, which is probably playing hell with the scheduler.

Something I would try is spinning in a while loop until it’s time to send the next packet, as sketched below. Yes, you will use 100% of a core, but you shouldn’t have much scheduler overhead. A second thing to try is sending the packets in bursts of 100 or so: send 100, sleep for somewhat less than 100 ms, then send the next 100.
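A hypothetical sketch of the busy-wait variant (the function name and rate handling are illustrative):

using Sockets

# Sketch: spin on time() until the next deadline instead of calling sleep(),
# so the sender never yields its core to the scheduler.
function send_busywait(sock, target, port, data, packets_per_second)
    delta_t = 1 / packets_per_second
    next_deadline = time()
    while true
        while time() < next_deadline
            # burn the core; no scheduler wake-up latency
        end
        send(sock, target, port, data)
        next_deadline += delta_t
    end
end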


Yes I see. I was also surprised and it really annoyed me for a few days (that’s why I desperately asked for help here).

I still do not understand why the default settings of my Linux machine yield such a “poor performance” and I also do not see how the buffer size is that strongly related. I thought the GC pauses might drop some packets, but not at these ridiculously low rates. I can handle TCP/IP data with Julia in realtime at rates orders of magnitude higher, but these low-rate UDP packets (with a fixed size of 244 bytes) simply didn’t want to be processed :wink:

Ah, that makes more sense. I can try that to simulate the traffic on the local loopback. Thanks!

Debian kernels were using 1000 Hz timers until a few years ago and then switched to tickless, if I remember correctly. But even at old-school 100 Hz the timeslice is only 10 ms, not 100.

If the process can run every 10 ms, then 1000 pps × 0.01 s = 10 packets × 244 bytes = 2.4 kB of buffer. It seems weird that this would be a real limit.


Yes, I thought it was around 10 or 20 ms… but when I googled to be sure:

https://stackoverflow.com/questions/16401294/how-to-know-linux-scheduler-time-slice

they were saying 100 ms. Then, digging into:

https://man7.org/linux/man-pages/man2/sched_rr_get_interval.2.html#NOTES

I found that the Linux “quantum” is 0.1 seconds. On my machine:

[pixel27@devil ~]$ cat /proc/sys/kernel/sched_rr_timeslice_ms 
90

So maybe 90ms for me? :slight_smile:

That is for the real-time round-robin scheduler.

https://stackoverflow.com/questions/16401294/how-to-know-linux-scheduler-time-slice

gives more discussion. Most processes will be scheduled by the CFS scheduler, whose default latency target is 6 ms.

So unless you have saturated all the cores with real-time scheduled tasks, it would be rare to have more than 10 or 20 ms of latency for a well-behaved user process (not swapping, etc.).


I am still confused.

After setting net.core.rmem_max=33554432, I am able to receive and process all the UDP packets (at a rate of > 1000 Hz) on our DAQ system (before I had significant loss and only got 600 Hz), but I still fail to do so on my own machine.

I set the same net.core.rmem_max value, but I can still only receive ~900 Hz while sending at a rate of 20 kHz (using the scripts above, and also setting the socket buffer size to 33554432).
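One way to double-check that the value actually took effect (a sketch, Linux only, reading procfs):

# Sketch (Linux only): read the kernel's receive-buffer cap via procfs.
rmem_max = parse(Int, strip(read("/proc/sys/net/core/rmem_max", String)))
println("net.core.rmem_max = $rmem_max bytes")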

I don’t see the connection to the scheduler yet :confused:

The machine you’re testing on, is it on the same network as the DAQ system? What network is it on? It sounds like you’re using a Mac laptop for testing? Are you on Wi-Fi?

The machine I am testing on is completely separated from the DAQ system. I am using 127.0.0.1 to send and receive, so it’s the local loopback device.

Edit: I am literally just running the two scripts above, on the same machine.

!!!

Hunh. Clearly that rules out all network hardware, and it does seem like it’s probably a kernel limitation. But are you running Linux or macOS on this machine? (Never mind, I see the screenshot is clearly macOS.) If macOS, it’s hard to know what you could do.

I can confirm that with your two original scripts, receiving for 10 seconds, I only get 7000 packets or so, on a Linux x86 machine with plenty of rmem_max (50 MB).

I tried on both machines. On macOS the receiving limit is around 600 Hz, on my Linux machine around 900 Hz.

You can simply try that on your own machine. I really have no clue :confused:

Some useful but Linux-specific advice from the Cloudflare blog: How to receive a million packets per second.

On the sending side, I think sleeping after every packet is the issue. The Julia sleep function is only accurate to within about 1 ms; there’s a usleep function you can ccall which could be better.
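For illustration, such a wrapper could look like this (a sketch assuming a POSIX libc; note that usleep blocks the calling OS thread, so Julia’s event loop stalls while it waits):

# Sketch (POSIX): sub-millisecond pause via libc's usleep.
usleep(microseconds::Integer) = ccall(:usleep, Cint, (Cuint,), microseconds)

usleep(100)   # pause roughly 100 µs (actual accuracy depends on the OS)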

I don’t think this is actually on the receive end, it’s the sending. Just my current guess.

I’d suggest figuring out how many packets you’re supposed to send per 5 ms, sending those, and then sleeping 5 ms. Yes, it’s bursty, but even at 100k packets per second we’re only talking 500 packets per loop in 5 ms, and 100k packets per second is crazy high for most purposes. If you’re talking 10k packets per second, you’d send 50 packets per pass through the loop, then check how long to sleep, sleep that long, and start over.

EDIT: indeed when I do this:


using Sockets

function send_data(sock, target, port, data, packets_per_second)
    # Send 5 ms worth of packets in one burst, then sleep off the remainder
    # of the 5 ms window instead of sleeping after every single packet.
    pp5ms = round(Int, packets_per_second * 5 / 1000)
    while true
        now = time()
        for _ in 1:pp5ms
            send(sock, target, port, data)
        end
        after = time()

        pausefor = now + 0.005 - after
        if pausefor < 0
            # error("Can't keep up with UDP packet rate ($pausefor s)")
            print(".")  # overrun: the burst itself took longer than 5 ms
            continue
        end
        sleep(pausefor)
    end
end

I have no problem running your sender at 10000 packets per second with no overruns on my Linux desktop, and the receiver says:

Counting UDP packets for 20 seconds
150601 UDP packets recieved

or about 7500 pps

So it’s not a super accurate method of timing, but basically the issue is that you’re sending one packet and then sleeping, rather than sleeping less often or calling usleep.

@tamasgal


Ah, very nice. I was so sure that the sending part was correct, given that I even monitored the time between sending and sleeping and it looked OK.

Thanks :slight_smile:


Yeah, for me the clue was that Wireshark couldn’t capture the packets. I’ve done packet captures during speed tests at large fractions of a gigabit, so Wireshark can handle tens of thousands of packets per second without issues on my machine. Since you were only seeing a few hundred packets per second, it wasn’t on the receiving end… so then I started looking at the method of sending. Glad we figured it out.

It seems like Julia should offer nanosleep: nanosleep(2) - Linux manual page
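A hypothetical ccall wrapper, assuming a Linux/POSIX libc and the timespec layout of 64-bit Linux:

# Sketch: wrap nanosleep(2) via ccall; field types match 64-bit Linux.
struct Timespec
    tv_sec::Clong    # whole seconds
    tv_nsec::Clong   # remaining nanoseconds
end

function nanosleep(ns::Integer)
    req = Ref(Timespec(ns ÷ 1_000_000_000, ns % 1_000_000_000))
    # second argument (remaining time) may be NULL if we ignore EINTR
    ccall(:nanosleep, Cint, (Ref{Timespec}, Ptr{Cvoid}), req, C_NULL)
end

nanosleep(500_000)   # ~0.5 ms; like usleep, this blocks the calling OS thread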


…me looking at the wrong end for days :see_no_evil:

nanosleep sounds interesting indeed.