Bug ID 1203603
Summary UDP throughput on 100 Gbit NIC low compared to TCP
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware x86-64
OS openSUSE Tumbleweed
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee jwiesner@suse.com
Reporter jwiesner@suse.com
QA Contact qa-bugs@suse.de
CC yousaf.kaukab@suse.com
Found By ---
Blocker ---

Iperf3 was used for benchmarking on simba2.arch.suse.cz and
simba3.arch.suse.cz. Both machines have 64 logical CPUs, Intel Xeon Gold 6326
CPU @ 2.90GHz, assigned to 2 NUMA nodes (NUMA node1 CPUs: 16-31,48-63) and a
100 Gbit NIC, eth0, that is local to NUMA node 1 (see below). TCP throughput
results depend on whether both the transmitting and receiving iperf3 processes
run on NUMA node 1. If they do, the resulting throughput approaches the maximum
throughput allowed by the physical layer (95 Gbit/s):
> simba3:~/:[1]# taskset 0xffff0000 /root/jwiesner/iperf/src/iperf3 -c 10.100.128.66 -P1 -f m -b 0 -t 180 -i 5 -O 2
> Connecting to host 10.100.128.66, port 5201
> [  5] local 10.100.128.68 port 54800 connected to 10.100.128.66 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-5.00   sec  54.2 GBytes  93056 Mbits/sec    0   3.01 MBytes
> [  5]   5.00-10.00  sec  54.2 GBytes  93079 Mbits/sec    0   3.01 MBytes
> [  5]  10.00-15.00  sec  54.7 GBytes  93892 Mbits/sec    0   3.01 MBytes
> [  5]  15.00-20.00  sec  54.2 GBytes  93083 Mbits/sec    0   3.01 MBytes
Mpstat output on simba3, which ran the client process:
> CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 18    0.00    0.00    0.10    0.10    0.00    0.00    0.00    0.00    0.00   99.80
> 19    0.00    0.00    0.00    0.00    0.00   36.98    0.00    0.00    0.00   63.02
> 20    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 29    0.10    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.90
> 30    1.01    0.00   82.41    0.00    0.00    0.00    0.00    0.00    0.00   16.58
> 31    0.10    0.00    0.20    0.00    0.00    0.00    0.00    0.00    0.00   99.70
> 59    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 60    0.00    0.00    0.10    0.00    0.00   26.59    0.00    0.00    0.00   73.31
> 61    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
The iperf3 client ran on CPU 30, softirq processing (mostly receiving TCP ACKs)
ran on CPUs 19 and 60. Mpstat output on simba2, which ran the server process:
> CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 18    2.01    0.00   96.18    0.00    0.00    0.00    0.00    0.00    0.00    1.81
> 19    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 26    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 27    0.00    0.00    0.00    0.00    0.00   99.12    0.00    0.00    0.00    0.88
> 28    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
The iperf3 server ran on CPU 18, softirq processing ran on CPU 27. The CPU
running softirq processing on the server is the bottleneck.
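Both iperf3 invocations were pinned with taskset 0xffff0000. As a quick sanity
check, a small sketch (assuming the standard taskset hex-mask semantics, one
bit per logical CPU) shows that this mask selects CPUs 16-31, i.e. the lower
half of NUMA node 1's CPU set (16-31,48-63):

```python
def cpus_in_mask(mask: int) -> list:
    """Return the CPU numbers selected by a taskset-style affinity bitmask."""
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus

# The mask used in the benchmarks above:
print(cpus_in_mask(0xffff0000))  # CPUs 16-31
```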

In contrast to the TCP results, UDP throughput is more than 8 times lower under
default settings:
> simba3:~/:[0]# taskset 0xffff0000 /root/jwiesner/iperf/src/iperf3 -u -c 10.100.128.66 -P1 -f m -b 0 -t 180 -i 5 -O 2
> Connecting to host 10.100.128.66, port 5201
> [  5] local 10.100.128.68 port 44747 connected to 10.100.128.66 port 5201
> [ ID] Interval           Transfer     Bitrate         Total Datagrams
> [  5]   0.00-5.00   sec  6.21 GBytes  10664 Mbits/sec  5804050
> [  5]   5.00-10.00  sec  6.25 GBytes  10733 Mbits/sec  4632580
> [  5]  10.00-15.00  sec  6.24 GBytes  10721 Mbits/sec  4627310
> [  5]  15.00-20.00  sec  6.25 GBytes  10730 Mbits/sec  4631300
Mpstat output on simba3, which ran the client process:
> CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 17   11.60    0.00   88.40    0.00    0.00    0.00    0.00    0.00    0.00    0.00
> 18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 24    0.00    0.00    0.00    0.00    0.00   27.67    0.00    0.00    0.00   72.33
> 25    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
The iperf3 client ran on CPU 17, softirq processing (freeing transmitted
buffers) ran on CPU 24. Mpstat output on simba2, which ran the server process:
> CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 19    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 20   12.44    0.00   86.96    0.00    0.00    0.00    0.00    0.00    0.00    0.60
> 21    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 49    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 50    0.00    0.00    0.22    0.00    0.00   96.12    0.00    0.00    0.00    3.66
> 51    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
The iperf3 server ran on CPU 20, softirq processing ran on CPU 50. The CPU
running iperf3 was the bottleneck, with the CPU running softirq processing on
simba2 being a close second.
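As a back-of-the-envelope check of the per-datagram cost, the figures above
(one 5-second interval of the UDP run, and the nominal 2.90 GHz clock of the
Xeon Gold 6326; this assumes the ~100%-busy sender core spends all its cycles
on the send path) give roughly 3100 cycles per datagram:

```python
# Figures taken from the iperf3/mpstat output above.
datagrams = 4_632_580      # datagrams sent in one 5-second interval
interval_s = 5.0
clock_hz = 2.90e9          # nominal Xeon Gold 6326 clock

pps = datagrams / interval_s
cycles_per_datagram = clock_hz / pps
print(f"{pps:,.0f} datagrams/s, ~{cycles_per_datagram:,.0f} cycles per datagram")
```

With one system call and one full trip through the network stack per ~1.4 kB
datagram, the sender core saturates long before the 100 Gbit link does,
whereas the TCP path amortizes that cost over much larger segments.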

The NUMA node locality and SMP affinity of the interrupts of the 100 Gbit NIC:
> simba3:~/:[0]# for i in $(awk '/eth0/{sub(":", "", $1); print $1}' /proc/interrupts); do grep -rH . /proc/irq/$i/{node,smp_affinity}; done
> /proc/irq/284/node:1
> /proc/irq/284/smp_affinity:00040000,00000000
> /proc/irq/285/node:1
> /proc/irq/285/smp_affinity:00000000,00020000
> /proc/irq/286/node:1
> /proc/irq/286/smp_affinity:00200000,00000000
> /proc/irq/287/node:1
> /proc/irq/287/smp_affinity:00080000,00000000
> /proc/irq/288/node:1
> /proc/irq/288/smp_affinity:04000000,00000000
> /proc/irq/289/node:1
> /proc/irq/289/smp_affinity:00800000,00000000
> /proc/irq/290/node:1
> /proc/irq/290/smp_affinity:00200000,00000000
> /proc/irq/291/node:1
> /proc/irq/291/smp_affinity:00000000,40000000
> /proc/irq/292/node:1
> /proc/irq/292/smp_affinity:10000000,00000000
> /proc/irq/293/node:1
> /proc/irq/293/smp_affinity:00000000,00100000
> /proc/irq/294/node:1
> /proc/irq/294/smp_affinity:00000000,00200000
> /proc/irq/295/node:1
> /proc/irq/295/smp_affinity:20000000,00000000
> /proc/irq/296/node:1
> /proc/irq/296/smp_affinity:00000000,04000000
> /proc/irq/297/node:1
> /proc/irq/297/smp_affinity:00000000,00800000
> /proc/irq/298/node:1
> /proc/irq/298/smp_affinity:00000000,01000000
> /proc/irq/299/node:1
> /proc/irq/299/smp_affinity:80000000,00000000
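A small sketch decoding the smp_affinity masks (assuming the /proc/irq format
of comma-separated 32-bit hex words, most significant word first) confirms
that every listed IRQ is pinned to a CPU in NUMA node 1 (16-31,48-63):

```python
def smp_affinity_cpus(mask: str) -> list:
    """Decode a /proc/irq/*/smp_affinity mask into a list of CPU numbers.

    The mask is comma-separated 32-bit hex words, most significant
    word first, so the rightmost word covers CPUs 0-31.
    """
    words = mask.strip().split(",")
    cpus = []
    for i, word in enumerate(reversed(words)):
        value = int(word, 16)
        for bit in range(32):
            if value & (1 << bit):
                cpus.append(i * 32 + bit)
    return sorted(cpus)

# e.g. IRQs 284 and 285 from the listing above:
print(smp_affinity_cpus("00040000,00000000"))  # [50]
print(smp_affinity_cpus("00000000,00020000"))  # [17]
```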

