Linux kernel bypass

Unfortunately, the speed of vanilla Linux kernel networking is not sufficient for more specialized workloads. For example, here at CloudFlare, we are constantly dealing with large packet floods. Vanilla Linux can do only about 1M pps. This is not enough in our environment, especially since the network cards are capable of handling a much higher throughput. Modern 10Gbps NICs can usually process at least 10M pps.

Let’s prepare a small experiment to convince you that working around Linux is indeed necessary. Let’s see how many packets can be handled by the kernel under perfect conditions. Passing packets to userspace is costly, so instead let’s try to drop them as soon as they leave the network driver code. To my knowledge the fastest way to drop packets in Linux, without hacking the kernel sources, is by placing a DROP rule in the PREROUTING iptables chain:
$ sudo iptables -t raw -I PREROUTING -p udp --dport 4321 --dst -j DROP
$ sudo ethtool -X eth2 weight 1
$ watch 'ethtool -S eth2|grep rx'
rx_packets: 12.2m/s
rx-0.rx_packets: 1.4m/s
rx-1.rx_packets: 0/s

Ethtool statistics above show that the network card receives a line rate of 12M packets per second. By manipulating an indirection table on a NIC with ethtool -X, we direct all the packets to RX queue #0. As we can see the kernel is able to process 1.4M pps on that queue with a single CPU.
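To make the effect of ethtool -X more concrete, here is a rough Python model of a receive indirection table. It is only a sketch: the table size of 128, the function names, and the way a flow hash is mapped to a slot are assumptions made for illustration, not the behaviour of any particular NIC or driver.

```python
from collections import Counter


def build_indirection_table(weights, table_size=128):
    """Fill a table of RX queue indexes in proportion to per-queue weights.

    Hypothetical model: table_size=128 is an assumed value; real NICs differ.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    table = []
    queue = -1
    assigned = 0  # sum of the weights of the queues used so far
    for slot in range(table_size):
        # Move on to the next queue once the current one has its share.
        while slot * total >= table_size * assigned:
            queue += 1
            assigned += weights[queue]
        table.append(queue)
    return table


def queue_for_hash(table, flow_hash):
    """Pick the RX queue for a packet (assumed: flow hash modulo table size)."""
    return table[flow_hash % len(table)]


# Roughly what "ethtool -X eth2 weight 1" asks for: every slot points at queue 0.
single = build_indirection_table([1])
# Roughly what "ethtool -X eth2 weight 1 1 1 1" asks for: four equal shares.
spread = build_indirection_table([1, 1, 1, 1])

print(Counter(single))
print(Counter(spread))
```

With a single weight every slot maps to queue 0, which is why all traffic lands on one RX queue and therefore on one CPU in the experiment above.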
Processing 1.4M pps on a single core is certainly a very good result, but unfortunately the stack doesn’t scale. When the packets hit many cores the numbers drop sharply. Let’s see the numbers when we direct packets to four RX queues:
$ sudo ethtool -X eth2 weight 1 1 1 1
$ watch 'ethtool -S eth2|grep rx'
rx_packets: 12.1m/s
rx-0.rx_packets: 477.8k/s
rx-1.rx_packets: 447.5k/s
rx-2.rx_packets: 482.6k/s
rx-3.rx_packets: 455.9k/s
Now we process only about 480k pps per core. This is bad news. Even optimistically assuming the performance won’t drop further when adding more cores, we would still need more than 20 CPUs to handle packets at line rate. So the kernel is not going to work.
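The arithmetic behind that estimate can be checked with a few lines of Python. This is a back-of-the-envelope sketch that uses the ethtool figures quoted above; the variable names are made up for illustration, and it assumes per-core throughput stays flat as cores are added, which is the optimistic case.

```python
import math

line_rate_pps = 12_100_000                             # rx_packets with four queues
per_queue_pps = [477_800, 447_500, 482_600, 455_900]   # rx-0 .. rx-3
per_core_pps = 480_000                                 # rounded per-core figure from the text

# What the four cores manage together.
aggregate_pps = sum(per_queue_pps)

# Cores needed for line rate if per-core throughput did not degrade further.
cores_needed = math.ceil(line_rate_pps / per_core_pps)

print(f"four cores together: {aggregate_pps / 1e6:.2f}M pps")
print(f"cores needed for line rate: {cores_needed}")
```

Four cores together handle well under 2M pps, only modestly more than the 1.4M pps a single core achieved on its own.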

Solarflare network cards support OpenOnload, a magical network accelerator. It achieves a kernel bypass by implementing the network stack in userspace and using an LD_PRELOAD to override the network syscalls of the target program. For low level access to the network card OpenOnload relies on an “EF_VI” library. This library can be used directly and is well documented.
EF_VI, being a proprietary library, can only be used on Solarflare NICs, but you may wonder how it actually works behind the scenes. It turns out EF_VI reuses the usual NIC features in a very smart way.
Under the hood each EF_VI program is granted access to a dedicated RX queue, hidden from the kernel. By default the queue receives no packets, until you create an EF_VI “filter”. This filter is nothing more than a hidden flow steering rule. You won’t see it in ethtool -n, but the rule does in fact exist on the network card. Having allocated an RX queue and managed flow steering rules, the only remaining task for EF_VI is to provide a userspace API for accessing the queue.


Processor clock frequency (Prozessortaktung)

Example:
$ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
1400000 53513
1200000 2616
1000000 2400
800000 2840
600000 1208171
$ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans
5477
$ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table
   From  :    To
         :   1400000   1200000   1000000    800000    600000
  1400000:         0       282       258       291      1549
  1200000:        52         0        12        12       206
  1000000:        48         0         0        14       208
   800000:        52         0         0         0       265
   600000:      2228         0         0         0         0
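As a small illustration of how the time_in_state figures can be read, the following Python sketch parses the sample output from the example and computes the share of time spent at each frequency. The function names are made up for this example, and the input is the sample data copied into a string rather than read from sysfs.

```python
# Sample contents of time_in_state, copied from the example above.
SAMPLE_TIME_IN_STATE = """\
1400000 53513
1200000 2616
1000000 2400
800000 2840
600000 1208171
"""


def parse_time_in_state(text):
    """Return {frequency_khz: time_units} from time_in_state contents."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        times[int(fields[0])] = int(fields[1])
    return times


def residency_share(times):
    """Return the fraction of the total recorded time spent at each frequency."""
    total = sum(times.values())
    return {freq: spent / total for freq, spent in times.items()}


times = parse_time_in_state(SAMPLE_TIME_IN_STATE)
for freq, share in residency_share(times).items():
    print(f"{freq:>8} kHz: {share:6.2%}")
```

For the sample data, the lowest frequency accounts for the overwhelming majority of the recorded time, which matches the trans_table, where most transitions lead to or from 600000.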

CPU frequency and voltage scaling code in the Linux(TM) kernel

The CPUfreq governor "ondemand" sets the CPU frequency depending on the current usage. To do this, the CPU must be able to switch frequency very quickly. A number of parameters are accessible through sysfs files:

sampling_rate: measured in us (10^-6 seconds), this is how often you want the kernel to look at the CPU usage and decide what to do about the frequency. Typically this is set to values of around ‘10000’ or more. Its default value is (cmp. with users-guide.txt): transition_latency * 1000. Be aware that transition latency is in ns and sampling_rate is in us, so you get the same sysfs value by default.

ignore_nice_load: this parameter takes a value of ‘0’ or ‘1’. When set to ‘0’ (its default), all processes are counted towards the ‘cpu utilisation’ value. When set to ‘1’, processes that are run with a ‘nice’ value will not count (and thus be ignored) in the overall usage calculation. This is useful if you are running a CPU intensive calculation on your…
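The remark about the default sampling_rate having the same sysfs value as the transition latency can be worked through in a few lines of Python. This is a sketch of the unit arithmetic only, under the assumption that the latency is first converted from ns to us and then multiplied by 1000; the function name is hypothetical and this is not the kernel's code.

```python
def default_sampling_rate_us(transition_latency_ns):
    """Default sampling_rate in us for a transition latency given in ns.

    Assumption for illustration: the latency is converted to us first,
    then multiplied by 1000 as described in the text.
    """
    latency_us = transition_latency_ns // 1000  # ns -> us
    return latency_us * 1000                    # "transition_latency * 1000"


latency_ns = 10_000  # example value: 10 us expressed in ns
print(latency_ns, "ns latency ->", default_sampling_rate_us(latency_ns), "us sampling rate")
```

For a latency that is a whole number of microseconds, the two sysfs files show the same number even though one is in ns and the other in us.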

How does Linux Kernel know where to look for driver firmware?

If you read the source, you’ll find that Ubuntu wrote a firmware_helper which is hard-coded to first look for /lib/firmware/$(uname -r)/$FIRMWARE, then /lib/firmware/$FIRMWARE, and no other locations. Translating it to sh, it does approximately this:

    # tell the kernel that a firmware load is starting
    echo -n 1 > /sys/$DEVPATH/loading
    # try the kernel-specific directory first, then the generic one
    cat /lib/firmware/$(uname -r)/$FIRMWARE > /sys/$DEVPATH/data \
        || cat /lib/firmware/$FIRMWARE > /sys/$DEVPATH/data
    if [ $? = 0 ]; then
        # success: the load is complete
        echo -n 0 > /sys/$DEVPATH/loading
    else
        # failure: abort the load
        echo -n -1 > /sys/$DEVPATH/loading
    fi

which is exactly the format the kernel expects.
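The lookup order described in this answer can also be sketched in Python. This is an illustration only, not the helper's actual source: the function name find_firmware and its parameters are hypothetical, and it just returns the first candidate path that exists instead of writing anything to sysfs.

```python
from pathlib import Path


def find_firmware(name, kernel_release, firmware_root="/lib/firmware"):
    """Return the first existing firmware path, or None if there is none.

    Hypothetical helper: checks the kernel-specific directory first,
    then the generic directory, and no other locations.
    """
    root = Path(firmware_root)
    candidates = [
        root / kernel_release / name,  # e.g. /lib/firmware/<release>/<name>
        root / name,                   # e.g. /lib/firmware/<name>
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    import platform

    # "example.bin" is a made-up file name used only for demonstration.
    print(find_firmware("example.bin", platform.release()))
```

Passing firmware_root explicitly makes it easy to try the function against a scratch directory rather than the real /lib/firmware.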


Where Do You Get Firmware?

The firmware is usually maintained by the company that develops the hardware device. In Windows land, firmware is usually a part of the driver you install. It’s often not seen by the user. In Linux, firmware may be distributed from a number of sources. Some firmware comes from the Linux kernel sources. Other firmware that has a redistribution license comes from upstream. Some firmware unfortunately does not have a license allowing free redistribution.

In Ubuntu, firmware comes from one of the following sources:
- The linux-image package (which contains the Linux kernel and licensed firmware)
- The linux-firmware package (which contains other licensed firmware)
- The linux-firmware-nonfree package in multiverse (which contains firmware that is missing a redistribution license)
- A separate driver package
- Elsewhere (driver CD, email attachment, website)

Note that the linux-firmware-nonfree package is not installed by default. The firmware files are placed into /lib/firmware….