[Bug 1203603] New: UDP throughput on 100 Gbit NIC low compared to TCP
https://bugzilla.suse.com/show_bug.cgi?id=1203603

            Bug ID: 1203603
           Summary: UDP throughput on 100 Gbit NIC low compared to TCP
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: openSUSE Tumbleweed
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: jwiesner@suse.com
          Reporter: jwiesner@suse.com
        QA Contact: qa-bugs@suse.de
                CC: yousaf.kaukab@suse.com
          Found By: ---
           Blocker: ---

Iperf3 was used for benchmarking on simba2.arch.suse.cz and simba3.arch.suse.cz.
Both machines have 64 logical CPUs (Intel Xeon Gold 6326 CPU @ 2.90GHz) assigned
to 2 NUMA nodes (NUMA node1 CPUs: 16-31,48-63) and a 100 Gbit NIC, eth0, that is
local to NUMA node 1 (see below). TCP throughput results depend on whether both
the transmitting and the receiving iperf3 process run on NUMA node 1. If they do,
the resulting throughput approaches the maximum allowed by the physical layer
(95 Gbit/s):
simba3:~/:[1]# taskset 0xffff0000 /root/jwiesner/iperf/src/iperf3 -c 10.100.128.66 -P1 -f m -b 0 -t 180 -i 5 -O 2
Connecting to host 10.100.128.66, port 5201
[  5] local 10.100.128.68 port 54800 connected to 10.100.128.66 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-5.00   sec  54.2 GBytes  93056 Mbits/sec    0   3.01 MBytes
[  5]   5.00-10.00  sec  54.2 GBytes  93079 Mbits/sec    0   3.01 MBytes
[  5]  10.00-15.00  sec  54.7 GBytes  93892 Mbits/sec    0   3.01 MBytes
[  5]  15.00-20.00  sec  54.2 GBytes  93083 Mbits/sec    0   3.01 MBytes

Mpstat output on simba3, which ran the client process:
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 18    0.00    0.00    0.10    0.10    0.00    0.00    0.00    0.00    0.00   99.80
 19    0.00    0.00    0.00    0.00    0.00   36.98    0.00    0.00    0.00   63.02
 20    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 29    0.10    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.90
 30    1.01    0.00   82.41    0.00    0.00    0.00    0.00    0.00    0.00   16.58
 31    0.10    0.00    0.20    0.00    0.00    0.00    0.00    0.00    0.00   99.70
 59    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 60    0.00    0.00    0.10    0.00    0.00   26.59    0.00    0.00    0.00   73.31
 61    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

The iperf3 client ran on CPU 30; softirq processing (mostly receiving TCP ACKs) ran on CPUs 19 and 60.

Mpstat output on simba2, which ran the server process:
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 18    2.01    0.00   96.18    0.00    0.00    0.00    0.00    0.00    0.00    1.81
 19    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 26    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 27    0.00    0.00    0.00    0.00    0.00   99.12    0.00    0.00    0.00    0.88
 28    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

The iperf3 server ran on CPU 18; softirq processing ran on CPU 27. The CPU running softirq processing on the server is the bottleneck.
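(As a cross-check of the topology described above: the NIC's NUMA node and the node-to-CPU mapping can be read with standard tools. This is a generic sketch, not taken from the original report, and assumes eth0 is a PCI device with the usual sysfs layout.)

# should print 1 on these machines, i.e. the NIC is local to NUMA node 1
cat /sys/class/net/eth0/device/numa_node
# shows which CPUs belong to each NUMA node (node1: 16-31,48-63 here)
lscpu | grep 'NUMA node'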
In contrast to the TCP results, UDP throughput is more than 8 times lower under default settings:

simba3:~/:[0]# taskset 0xffff0000 /root/jwiesner/iperf/src/iperf3 -u -c 10.100.128.66 -P1 -f m -b 0 -t 180 -i 5 -O 2
Connecting to host 10.100.128.66, port 5201
[  5] local 10.100.128.68 port 44747 connected to 10.100.128.66 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-5.00   sec  6.21 GBytes  10664 Mbits/sec  5804050
[  5]   5.00-10.00  sec  6.25 GBytes  10733 Mbits/sec  4632580
[  5]  10.00-15.00  sec  6.24 GBytes  10721 Mbits/sec  4627310
[  5]  15.00-20.00  sec  6.25 GBytes  10730 Mbits/sec  4631300

Mpstat output on simba3, which ran the client process:
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 17   11.60    0.00   88.40    0.00    0.00    0.00    0.00    0.00    0.00    0.00
 18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 24    0.00    0.00    0.00    0.00    0.00   27.67    0.00    0.00    0.00   72.33
 25    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

The iperf3 client ran on CPU 17; softirq processing (freeing transmitted buffers) ran on CPU 24.

Mpstat output on simba2, which ran the server process:
CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
 19    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 20   12.44    0.00   86.96    0.00    0.00    0.00    0.00    0.00    0.00    0.60
 21    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 49    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
 50    0.00    0.00    0.22    0.00    0.00   96.12    0.00    0.00    0.00    3.66
 51    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

The iperf3 server ran on CPU 20; softirq processing ran on CPU 50. The CPU running the iperf3 server was the bottleneck, with the CPU running softirq processing on simba2 being a close second.

The NUMA node locality and SMP affinity of the interrupts of the 100 Gbit NIC:
simba3:~/:[0]# for i in $(awk '/eth0/{sub(":", "", $1); print $1}' /proc/interrupts); do grep -rH . /proc/irq/$i/{node,smp_affinity}; done
/proc/irq/284/node:1
/proc/irq/284/smp_affinity:00040000,00000000
/proc/irq/285/node:1
/proc/irq/285/smp_affinity:00000000,00020000
/proc/irq/286/node:1
/proc/irq/286/smp_affinity:00200000,00000000
/proc/irq/287/node:1
/proc/irq/287/smp_affinity:00080000,00000000
/proc/irq/288/node:1
/proc/irq/288/smp_affinity:04000000,00000000
/proc/irq/289/node:1
/proc/irq/289/smp_affinity:00800000,00000000
/proc/irq/290/node:1
/proc/irq/290/smp_affinity:00200000,00000000
/proc/irq/291/node:1
/proc/irq/291/smp_affinity:00000000,40000000
/proc/irq/292/node:1
/proc/irq/292/smp_affinity:10000000,00000000
/proc/irq/293/node:1
/proc/irq/293/smp_affinity:00000000,00100000
/proc/irq/294/node:1
/proc/irq/294/smp_affinity:00000000,00200000
/proc/irq/295/node:1
/proc/irq/295/smp_affinity:20000000,00000000
/proc/irq/296/node:1
/proc/irq/296/smp_affinity:00000000,04000000
/proc/irq/297/node:1
/proc/irq/297/smp_affinity:00000000,00800000
/proc/irq/298/node:1
/proc/irq/298/smp_affinity:00000000,01000000
/proc/irq/299/node:1
/proc/irq/299/smp_affinity:80000000,00000000
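(To decode the masks: smp_affinity is a comma-separated hex bitmask in which the rightmost 32-bit word covers CPUs 0-31 and the word to its left covers CPUs 32-63. For example, 00040000,00000000 has bit 18 of the upper word set, pinning IRQ 284 to CPU 50, and 00000000,00020000 has bit 17 of the lower word set, pinning IRQ 285 to CPU 17. Every queue above lands on a NUMA node 1 CPU (16-31,48-63). The kernel also exports each affinity as a plain CPU list, which is easier to read; a sketch using the standard smp_affinity_list file:)

for i in $(awk '/eth0/{sub(":", "", $1); print $1}' /proc/interrupts); do
    # smp_affinity_list prints the affinity as CPU numbers, e.g. "50"
    grep -rH . /proc/irq/$i/smp_affinity_list
done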
https://bugzilla.suse.com/show_bug.cgi?id=1203603#c1

Jiri Wiesner <jwiesner@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |IN_PROGRESS

--- Comment #1 from Jiri Wiesner <jwiesner@suse.com> ---
It should be noted that moving both the client and the server iperf3 process to
NUMA node 0 causes a performance hit, which is not as pronounced as in the TCP
test:
simba3:~/:[1]# taskset 0x0000ffff /root/jwiesner/iperf/src/iperf3 -u -c 10.100.128.66 -P1 -f m -b 0 -t 180 -i 5 -O 2
Connecting to host 10.100.128.66, port 5201
[  5] local 10.100.128.68 port 42650 connected to 10.100.128.66 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-5.00   sec  4.15 GBytes  7121 Mbits/sec  3829980
[  5]   5.00-10.00  sec  4.14 GBytes  7114 Mbits/sec  3070640
[  5]  10.00-15.00  sec  4.15 GBytes  7129 Mbits/sec  3076920
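(For clarity on the affinity masks used in these runs: taskset 0xffff0000 selects CPUs 16-31, all of which belong to NUMA node 1 on these machines, while 0x0000ffff selects CPUs 0-15 on node 0. The mapping can be confirmed with standard tools; a sketch, where the pidof call assumes a single running iperf3 process:)

# CPUs owned by each NUMA node
lscpu | grep 'NUMA node.*CPU'
# print the current affinity of the running iperf3 as a CPU list
taskset -cp $(pidof iperf3)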
The MTU of all devices was set to 1500 bytes. This is the nstat output for a 10-second period on simba2 during the UDP test:
IpInReceives                    7951006            0.0
IpInDelivers                    7950996            0.0
IpOutRequests                   2                  0.0
TcpInSegs                       2                  0.0
TcpOutSegs                      2                  0.0
UdpInDatagrams                  7078508            0.0
UdpInErrors                     872514             0.0
UdpRcvbufErrors                 872514             0.0
Ip6InReceives                   16                 0.0
Ip6InMcastPkts                  16                 0.0
Ip6InOctets                     1422               0.0
Ip6InMcastOctets                1422               0.0
Ip6InNoECTPkts                  16                 0.0
TcpExtTCPHPAcks                 2                  0.0
TcpExtTCPAutoCorking            1                  0.0
TcpExtTCPOrigDataSent           2                  0.0
TcpExtTCPDelivered              2                  0.0
IpExtInOctets                   11735864062        0.0
IpExtOutOctets                  1432               0.0
IpExtInNoECTPkts                7951140            0.0

The same output for the TCP test:

IpInReceives                    1812332            0.0
IpInDelivers                    1812320            0.0
IpOutRequests                   496318             0.0
TcpInSegs                       1812324            0.0
TcpOutSegs                      496319             0.0
Ip6InReceives                   13                 0.0
Ip6InMcastPkts                  13                 0.0
Ip6InOctets                     1115               0.0
Ip6InMcastOctets                1115               0.0
Ip6InNoECTPkts                  13                 0.0
TcpExtTCPHPHits                 1812243            0.0
TcpExtTCPPureAcks               7                  0.0
TcpExtTCPHPAcks                 4                  0.0
TcpExtTCPAutoCorking            1                  0.0
TcpExtTCPOrigDataSent           10                 0.0
TcpExtTCPDelivered              11                 0.0
IpExtInOctets                   116209437656       0.0
IpExtOutOctets                  25809812           0.0
IpExtInNoECTPkts                80194342           0.0

The IP layer in the UDP test processes MTU-sized packets:

  11735864062 (IpExtInOctets) / 7950996 (IpInDelivers) = 1476 bytes per packet

whereas the IP layer in the TCP test processes much larger packets, approaching the maximum size of an IP packet:

  116209437656 (IpExtInOctets) / 1812320 (IpInDelivers) = 64121 bytes per packet
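(The roughly 43:1 ratio between the two per-packet sizes, 64121 / 1476, indicates that the TCP receive path benefits from GRO aggregating MTU-sized wire packets into large super-packets before IP/TCP processing, whereas every UDP datagram traverses the stack individually. Whether the relevant offloads are enabled, and the socket buffer limits behind the UdpRcvbufErrors counter, can be inspected with standard tools; a sketch, exact feature names vary by driver and kernel:)

# GRO governs the aggregation seen in the TCP numbers above
ethtool -k eth0 | grep -E 'generic-receive-offload|gro'
# UdpRcvbufErrors means the UDP socket receive buffer overflowed; its limits:
sysctl net.core.rmem_default net.core.rmem_max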