Cloudflare architecture and how BPF eats the world

Marek Majkowski 2019-05-18
Recently at Netdev 0x13, the Conference on Linux Networking in Prague, I gave a short talk titled “Linux at Cloudflare”. The talk ended up being mostly about BPF. It seems, no matter the question – BPF is the answer.

Here is a transcript of a slightly adjusted version of that talk.

At Cloudflare we run Linux on our servers. We operate two categories of data centers: large “Core” data centers, processing logs, analyzing attacks, computing analytics, and the “Edge” server fleet, delivering customer content from 180 locations across the world.

In this talk, we will focus on the “Edge” servers. It’s here where we use the newest Linux features, optimize for performance and care deeply about DoS resilience.

Our edge service is special due to our network configuration: we make extensive use of anycast routing. Anycast means that the same set of IP addresses is announced by all our data centers.

This design has great advantages. First, it guarantees the optimal speed for end users. No matter where you are located, you will always reach the closest data center. Then, anycast helps us to spread out DoS traffic. During attacks each of the locations receives a small fraction of the total traffic, making it easier to ingest and filter out unwanted traffic.

Anycast allows us to keep the networking setup uniform across all edge data centers. We applied the same design inside our data centers – our software stack is uniform across the edge servers. All software pieces are running on all the servers.

In principle, every machine can handle every task – and we run many diverse and demanding tasks. We have a full HTTP stack, the magical Cloudflare Workers, two sets of DNS servers – authoritative and resolver, and many other publicly facing applications like Spectrum and Warp.

Even though every server has all the software running, requests typically cross many machines on their journey through the stack. For example, an HTTP request might be handled by a different machine during each of the 5 stages of the processing.

Let me walk you through the early stages of inbound packet processing:

(1) First, the packets hit our router. The router does ECMP, and forwards packets onto our Linux servers. We use ECMP to spread each target IP across many, at least 16, machines. This is used as a rudimentary load balancing technique.

(2) On the servers we ingest packets with XDP eBPF. In XDP we perform two stages. First, we run volumetric DoS mitigations, dropping packets belonging to very large layer 3 attacks.

(3) Then, still in XDP, we perform layer 4 load balancing. All the non-attack packets are redirected across the machines. This works around the ECMP problems, gives us fine-grained load balancing and allows us to gracefully take servers out of service.

(4) Following the redirection the packets reach a designated machine. At this point they are ingested by the normal Linux networking stack, go through the usual iptables firewall, and are dispatched to an appropriate network socket.

(5) Finally packets are received by an application. For example HTTP connections are handled by a “protocol” server, responsible for performing TLS encryption and processing HTTP, HTTP/2 and QUIC protocols.
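Stage (1) above, spreading flows across servers with ECMP, can be illustrated with a small sketch. This is not how any particular router implements it: the hash function (SHA-256), the `flow_key` and `ecmp_pick` helpers and the server names are all assumptions made for illustration. The only idea taken from the text is that a packet's flow is hashed and mapped onto one of at least 16 machines.

```python
import hashlib
import ipaddress
import struct


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Serialize a 5-tuple into bytes (hypothetical helper)."""
    return (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!HHB", src_port, dst_port, proto)
    )


def ecmp_pick(next_hops, src_ip, src_port, dst_ip, dst_port, proto):
    """Pick a next hop by hashing the flow; same flow, same server."""
    digest = hashlib.sha256(
        flow_key(src_ip, src_port, dst_ip, dst_port, proto)
    ).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[bucket]


# Sixteen illustrative servers behind one anycast IP.
servers = [f"server-{i:02d}" for i in range(16)]
first = ecmp_pick(servers, "203.0.113.7", 40000, "192.0.2.1", 443, 6)
again = ecmp_pick(servers, "203.0.113.7", 40000, "192.0.2.1", 443, 6)
```

With plain modulo hashing like this, changing the number of next hops remaps many existing flows. That is a commonly cited weakness of ECMP, though the talk does not spell out which ECMP problems the XDP load balancer works around.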

It’s in these early phases of request processing where we use the coolest new Linux features. We can group useful modern functionalities into three categories:

DoS handling
Load balancing
Socket dispatch

Let’s discuss DoS handling in more detail. As mentioned earlier, the first step after ECMP routing is Linux’s XDP stack where, among other things, we run DoS mitigations.

Historically our mitigations for volumetric attacks were expressed in classic BPF and iptables-style grammar. Recently we adapted them to execute in the XDP eBPF context, which turned out to be surprisingly hard. Read on about our adventures:

L4Drop: XDP DDoS Mitigations
xdpcap: XDP Packet Capture
XDP based DoS mitigation talk by Arthur Fabre
XDP in practice: integrating XDP into our DDoS mitigation pipeline (PDF)

During this project we encountered a number of eBPF/XDP limitations. One of them was the lack of concurrency primitives. It was very hard to implement things like race-free token buckets. Later we found that Facebook engineer Julia Kartseva had the same issues. In February this problem was addressed with the introduction of the bpf_spin_lock helper.
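To make the token bucket problem concrete, here is a minimal sketch in Python rather than eBPF. The `TokenBucket` class and its parameters are hypothetical, and `threading.Lock` merely stands in for the role a spin lock would play. The point is that refilling and spending tokens is a read-modify-write sequence, so without mutual exclusion two CPUs could both spend the last token.

```python
import threading


class TokenBucket:
    """Illustrative token bucket; not Cloudflare's implementation."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start full
        self.last = now             # time of the last refill
        self.lock = threading.Lock()

    def allow(self, now):
        # The whole refill-and-spend step must be atomic.
        with self.lock:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
            self.last = max(self.last, now)
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False


bucket = TokenBucket(rate=1, burst=2)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0)]
```

Passing `now` explicitly instead of reading a clock keeps the sketch deterministic and easy to test.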

While our modern volumetric DoS defenses run in the XDP layer, we still rely on iptables for application layer 7 mitigations. Here, the features of a higher-level firewall are useful: connlimit, hashlimits and ipsets. We also use the xt_bpf iptables module to run cBPF in iptables to match on packet payloads. We talked about this in the past:

Lessons from defending the indefensible (PPT)
Introducing the BPF tools

After XDP and iptables, we have one final kernel side DoS defense layer.

Consider a situation where our UDP mitigations fail. In such a case we might be left with a flood of packets hitting our application UDP socket. This might overflow the socket, causing packet loss. This is problematic: both good and bad packets will be dropped indiscriminately. For applications like DNS it’s catastrophic. In the past, to reduce the harm, we ran one UDP socket per IP address. An unmitigated flood was bad, but at least it didn’t affect the traffic to other server IP addresses.

Nowadays that architecture is no longer suitable. We are running more than 30,000 DNS IPs and running that number of UDP sockets is not optimal. Our modern solution is to run a single UDP socket with a complex eBPF socket filter attached to it using the SO_ATTACH_BPF socket option. We talked about running eBPF on network sockets in past blog posts:

eBPF, Sockets, Hop Distance and manually writing eBPF assembly
SOCKMAP – TCP splicing of the future

This eBPF program rate limits the packets. It keeps its state (packet counts) in an eBPF map. We can be sure that a single flooded IP won’t affect other traffic. This works well, though during work on this project we found a rather worrying bug in the eBPF verifier:

eBPF can’t count?!

I guess running eBPF on a UDP socket is not a common thing to do.
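The per-IP rate limiting described above can be sketched as follows. This is a Python illustration, not the actual socket filter: the `PerIPLimiter` class, the fixed one-second window and the limit value are assumptions, and a plain dictionary stands in for the eBPF map that holds the packet counts.

```python
class PerIPLimiter:
    """Fixed-window packet counter keyed by destination IP (illustrative)."""

    def __init__(self, limit_per_window, window_seconds=1.0):
        self.limit = limit_per_window
        self.window_seconds = window_seconds
        self.counts = {}  # dst_ip -> (window_index, packets_seen)

    def allow(self, dst_ip, now):
        window = int(now // self.window_seconds)
        seen_window, count = self.counts.get(dst_ip, (window, 0))
        if seen_window != window:
            count = 0  # a new window started, reset the counter
        if count >= self.limit:
            self.counts[dst_ip] = (window, count)
            return False  # over the limit for this IP only
        self.counts[dst_ip] = (window, count + 1)
        return True


limiter = PerIPLimiter(limit_per_window=2)
flooded = [limiter.allow("192.0.2.1", 0.1) for _ in range(3)]
other = limiter.allow("192.0.2.2", 0.2)
```

Because the state is keyed by destination IP, a flood against one address exhausts only that address's budget, which is the isolation property the single-socket design needs.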

Apart from the DoS mitigations, in XDP we also run a layer 4 load balancer. This is a new project, and we haven’t talked much about it yet. Without getting into many details: in certain situations we need to perform a socket lookup from XDP.

The problem is relatively simple – our code needs to look up the “socket” kernel structure for a 5-tuple extracted from a packet. This is generally easy – there is a bpf_sk_lookup helper available for this. Unsurprisingly, there were some complications. One problem was the inability to verify if a received ACK packet was a valid part of a three-way handshake when SYN-cookies are enabled. My colleague Lorenz Bauer is working on adding support for this corner case.

After DoS and the load balancing layers, the packets are passed onto the usual Linux TCP / UDP stack. Here we do a socket dispatch – for example packets going to port 53 are passed onto a socket belonging to our DNS server.

We do our best to use vanilla Linux features, but things get complex when you use thousands of IP addresses on the servers.

Convincing Linux to route packets correctly is relatively easy with the “AnyIP” trick. Ensuring packets are dispatched to the right application is another matter. Unfortunately, standard Linux socket dispatch logic is not flexible enough for our needs. For popular ports like TCP/80 we want to share the port between multiple applications, each handling it on a different IP range. Linux doesn’t support this out of the box. You can call bind() either on a specific IP address or on all IPs (with 0.0.0.0).

In order to fix this, we developed a custom kernel patch which adds an SO_BINDTOPREFIX socket option. As the name suggests, it allows us to call bind() on a selected IP prefix. This solves the problem of multiple applications sharing popular ports like 53 or 80.
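The dispatch behaviour that binding to a prefix makes possible can be modelled in user space. The sketch below is only a model: `PrefixDispatcher` and the application names are hypothetical, and the longest-prefix-wins rule is an assumption about how overlapping prefixes would be resolved, not a description of the kernel patch.

```python
import ipaddress


class PrefixDispatcher:
    """Maps (destination IP, port) to the application bound to a prefix."""

    def __init__(self):
        self.bindings = []  # list of (network, port, app)

    def bind(self, prefix, port, app):
        net = ipaddress.ip_network(prefix)
        for existing_net, existing_port, _ in self.bindings:
            if existing_net == net and existing_port == port:
                raise OSError("address already in use")
        self.bindings.append((net, port, app))

    def lookup(self, dst_ip, port):
        addr = ipaddress.ip_address(dst_ip)
        best = None
        for net, bound_port, app in self.bindings:
            if bound_port != port or addr.version != net.version:
                continue
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, app)  # assumed rule: most specific prefix wins
        return best[1] if best else None


dispatcher = PrefixDispatcher()
dispatcher.bind("192.0.2.0/24", 80, "http")
dispatcher.bind("198.51.100.0/24", 80, "spectrum")
```

Here two applications share port 80, each on its own IP range, which plain bind() on a single address or on 0.0.0.0 cannot express.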

Then we ran into another problem. For our Spectrum product we need to listen on all 65535 ports. Running so many listen sockets is not a good idea (see our old war story blog), so we had to find another way. After some experiments we learned to utilize an obscure iptables module, TPROXY, for this purpose. Read about it here:

Abusing Linux’s firewall: the hack that allowed us to build Spectrum

This setup is working, but we don’t like the extra firewall rules. We are working on solving this problem properly by extending the socket dispatch logic itself. You guessed it: we want to do that with eBPF. Expect some patches from us.

Then there is a way to use eBPF to improve applications. Recently we got excited about doing TCP splicing with SOCKMAP:

SOCKMAP – TCP splicing of the future

This technique has great potential for improving tail latency across many pieces of our software stack. The current SOCKMAP implementation is not quite ready for prime time yet, but the potential is vast.

Similarly, the new TCP-BPF aka BPF_SOCK_OPS hooks provide a great way of inspecting performance parameters of TCP flows. This functionality is super useful for our performance team.

Some Linux features didn’t age well and we need to work around them. For example, we are hitting limitations of networking metrics. Don’t get me wrong, the networking metrics are awesome, but sadly they are not granular enough. Things like TcpExtListenDrops and TcpExtListenOverflows are reported as global counters, while we need to know them on a per-application basis.

Our solution is to use eBPF probes to extract the numbers directly from the kernel. My colleague Ivan Babrou wrote a Prometheus metrics exporter called “ebpf_exporter” to facilitate this. Read on:

Introducing ebpf_exporter
https://github.com/cloudflare/ebpf_exporter

With “ebpf_exporter” we can generate all manner of detailed metrics. It is very powerful and saved us on many occasions.

In this talk we discussed 6 layers of BPFs running on our edge servers:

Volumetric DoS mitigations are running on XDP eBPF
Iptables xt_bpf cBPF for application-layer attacks
SO_ATTACH_BPF for rate limits on UDP sockets
Load balancer, running on XDP
eBPFs running application helpers like SOCKMAP for TCP socket splicing, and TCP-BPF for TCP measurements
“ebpf_exporter” for granular metrics

And we’re just getting started! Soon we will be doing more with eBPF-based socket dispatch, eBPF running on the Linux TC (Traffic Control) layer and more integration with cgroup eBPF hooks. Our SRE team also maintains an ever-growing list of BCC scripts useful for debugging.

It feels like Linux stopped developing new APIs and all the new features are implemented as eBPF hooks and helpers. This is fine and it has strong advantages. It’s easier and safer to upgrade an eBPF program than to recompile a kernel module. Some things like TCP-BPF, exposing high-volume performance tracing data, would probably be impossible without eBPF.

Some say “software is eating the world”; I would say “BPF is eating the software”.


The Smallest Ryzen Yet: Asrock DeskMini A300 Review

By Steven Walton on March 27, 2019
$150 Barebones Mini PC
The Asrock DeskMini A300 is a tiny PC that takes advantage of Ryzen processors. Almost every custom designed mini PC that we’ve seen to date has used Intel inside and while Intel CPUs are very good, they aren’t the best choice for this kind of system, at least if you want to game or do any kind of 3D work. For that, AMD’s Ryzen APUs are unrivaled.

So after dozens of Intel-based Beeboxes and DeskMini PCs, Asrock has finally developed an AM4 socket system for Raven Ridge (and Bristol Ridge) APUs. Ideally you want to put the Ryzen 3 2200G or Ryzen 5 2400G inside this system, our current top budget CPU picks that also happen to have more than decent integrated graphics capabilities.

With a recent price drop, for $135 the Ryzen 5 2400G is a cracking good processor and only those not in the know would buy the Core i3-8100 for $150 instead. We’ll revisit the 2400G eventually at this price point as it brings SMT support, so twice as many threads as the 2200G for an extra $40. For now, we’ll see how it performs in the new DeskMini A300, as this is the best APU you can pair with this compact 1.9L barebone.

The DeskMini A300 costs a very reasonable $150 and for that investment you get a custom case that measures 155mm wide, 155mm deep and 80mm tall. It’s capable of housing two 2.5” storage devices and up to a 46mm tall CPU cooler. Inside you’ll find Asrock’s A300M-STX motherboard which measures just 140 x 147mm.

As you might expect for such a tiny motherboard it’s not exactly brimming with features but you do get all the essentials. Front I/O includes a USB 3.1 Gen1 Type-C port along with a Type-A and around the back there’s an additional two USB ports, a 3.1 Gen1 Type-A and a 2.0 Type-A. There’s a basic Realtek audio and Gigabit network solution, three M.2 ports (two for SSDs and a third for a Wi-Fi module) and two laptop-style memory DIMMs supporting up to DDR4-2933 with Ryzen APUs.

The display outputs include a single HDMI, DisplayPort and a legacy VGA port, and it’s possible to use all three simultaneously for triple monitor configurations.

There are two optional items: a very basic air cooler, which came with our review unit, and an M.2 Wi-Fi kit, which didn’t come with our sample. There is a listing over on Newegg which doesn’t include the cooler but does come with the Intel AC-3168 Wi-Fi kit for $150, so that’s a great option. After all, you can use the Wraith Stealth cooler that comes with your Ryzen APU, though be aware you’ll have to remove the fan shroud (just the top cover with the AMD logo). This reduces the cooler’s height by a few millimeters and has no impact on cooling performance.

Powering the DeskMini A300 is an included 120W power brick. It’s a 19V, 6.32A version and that’s plenty of headroom for using something like the Ryzen 5 2400G.

That’s the barebone. What you need to bring to the table is a CPU (we recommend either the 2200G or 2400G) and some kind of storage. I recommend something cheap such as the WD Blue 500GB, which costs just $68, or the Plextor S2G 512GB if you can find it at $55; either would work well for this build. You can fill the two 2.5” drive bays with larger mechanical hard drives or additional SSDs, your choice there.

We’ll discuss more hardware configurations towards the end of the review, for now let’s see what kind of gaming performance the DeskMini A300 paired with the Ryzen 5 2400G has to offer. We won’t be delving into application performance as nothing has changed since our review of this CPU. Gaming performance, on the other hand, has seen improvements through updated drivers.

For testing we used G.Skill’s Ripjaws DDR4-2133 CL15 16GB for a simple reason: it’s cheap at just $80. G.Skill’s DDR4-3200 memory costs upwards of $130 and while it will offer a nice performance gain when using the Vega 11 GPU, we can overclock the DDR4-2133 memory, and that’s exactly what I did.

Setting the frequency to 2933 MHz loosened the timings to CL16-21-21-21-49. You could no doubt manually tune those timings for even better performance, but I wanted to test something that was closer to the out of the box experience.

Gaming Performance Impressions
Apex Legends
First up we have Apex Legends and here we were forced down to 720p with the lowest possible quality settings. At times the performance was deceiving as frame rates went above 100 fps, but at others they would dip to half that. Still, overall we were looking at around 60-70 fps and performance was consistent for the most part. Bumping the resolution up to 900p was playable but the frame dips were far more noticeable.

Battlefield V
Moving on we have Battlefield V and again we were forced down to 720p in search of 60 fps, and this kind of frame rate is required if you hope to rack up a few kills in the multiplayer modes. With the low quality preset enabled we typically saw around 60 fps with dips into the 50s, but overall a smooth and enjoyable experience, certainly playable by entry level PC standards.

CS:GO
When testing potato-like gaming performance you’ve always gotta include CS:GO and for this one we were able to test at 1080p using high quality visuals, though that’s not saying much. Frame rates dipped into the 40s but were often up around 60 fps. Needless to say, with a few quality tweaks maintaining over 60 fps won’t be an issue.

Far Cry New Dawn
Using the ‘normal’ quality preset at 720p, frame rates fluctuated between 35 and 45 fps. There were at times quite serious frame stutters but the game was still playable and I guess you’ve sort of got to expect that kind of thing when gaming with integrated graphics.

Fortnite
Next up we have Fortnite and for this one we went with the low quality settings at 900p and that allowed for at least 75 fps though most of the time we were looking more at 90-100 fps. Sadly though the experience was spoiled somewhat by fairly regular frame stuttering.

Rainbow Six Siege
One of the best experiences we had was with Rainbow Six Siege. Using the lowest quality preset, the resolution was set to 900p, though with TAA a 50% render scale is set. The game looked very good and played buttery smooth; I couldn’t blame any missed shot on frame stutters this time. Frame rates stayed above 60 fps at all times and generally hovered around 80 fps. Again, it was a great experience and a perfect example of the fun that can be had with this compact system.

Rocket League
We fired up Rocket League and enabled the ‘performance’ preset at 1080p and this generally saw frame rates hovering between 70 and 80 fps, allowing for smooth, highly playable performance. As we saw with CS:GO it’s possible to play these less demanding, but still highly popular titles, at very respectable quality settings.

Power & Temps
As mentioned earlier the DeskMini A300 comes with a 120W power supply which is fine for use with the Ryzen 5 2400G provided you don’t go crazy with overclocking. Actually, you won’t be able to overclock anyway as you’ll be limited by thermals.

In its stock configuration using the 2400G with a pair of 1TB mechanical HDDs and a 2TB SSD, the DeskMini A300 consumed 82 watts in our Blender stress test and 102 watts when playing Battlefield V multiplayer. So there really isn’t much headroom left for overclocking.
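The headroom figure follows from quick arithmetic on the numbers quoted in the review. The sketch below assumes the measured consumption can be compared directly with the power brick's rated output, which the review does not state; the variable names are illustrative.

```python
# Figures quoted in the review.
BRICK_VOLTS = 19.0
BRICK_AMPS = 6.32
RATED_WATTS = 120.0
blender_watts = 82.0
battlefield_watts = 102.0

# Output implied by the brick's voltage and current rating.
brick_watts = round(BRICK_VOLTS * BRICK_AMPS, 2)

# Remaining margin against the 120W rating in each test.
headroom_blender = RATED_WATTS - blender_watts
headroom_gaming = RATED_WATTS - battlefield_watts
```

Under that assumption the gaming load leaves about 18W of margin, or 15% of the rated 120W.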

Looking at operating temperatures using the optional cooler the Ryzen 5 2400G hit 78 degrees under load in our Blender stress test, but idled at just 33 degrees. This is a reasonable operating temperature and surprisingly this little cooler didn’t make much noise. Replacing the base cooler with the Wraith Stealth only dropped the load temperature by 4 degrees, though the DeskMini A300 was basically silent now. So if you have a Stealth cooler and you’re happy to take off the top cover then I recommend using it in the A300.

Closing Remarks
Asrock’s new DeskMini A300 is everything we hoped it would be and we’re glad they finally released a mini PC that supports Raven Ridge APUs. The gaming performance won’t blow your socks off, but you can at least play in some capacity. This simply wasn’t possible beyond flash-based games on previously seen Intel models.

If you’re after a compact gaming rig and you don’t want to buy a console for gaming, a fully functioning PC like the DeskMini A300 can fit the bill nicely. We do recognize this is a very niche product: you have to be in the market for an extremely small PC that can handle some light 3D work.

If you’re simply after a cost-effective gaming rig then this isn’t it. Even though the barebones is priced modestly, the A300’s inability to support a discrete graphics card makes your only upgrade option here a Zen 2-based APU.

Budget Build Options

Alternatively you could build your own Ryzen 5 2400G system in a larger MicroATX case for a few dollars more. This gives you the ability to snap up something like a Radeon RX 560 or perhaps even one of those insanely cheap RX 570 models, and this $100-150 discrete graphics card upgrade will improve gaming performance tenfold.

Or for a slight price premium it’s also possible to build a Mini-ITX system, again with the ability to support discrete graphics cards. But in the end, if you seek a super compact Mini PC then we feel there is no better option right now than the DeskMini A300.


Sucuri: Free website malware and security scanner

Enter a URL (ex. sucuri.net) and the Sucuri SiteCheck scanner will check the website for known malware, blacklisting status, website errors, and out-of-date software.

Disclaimer: Sucuri SiteCheck is a free website security scanner.
Remote scanners have limited access and results are not guaranteed. For a full scan, contact our team.

Keep your site clean, fast, and protected

Website Monitoring: malware removal and hack repair (response)
Website Firewall: protect and speed up your site
Website Backups: back up your website and associated files

Freeshells.org: FAST, SECURE, AND RELIABLE HOSTING

Account Registration
We require that all accounts be filled with valid information. Improper/invalid account information may lead to delays in order processing, and any account created with blatantly false or incomplete information will be marked as fraud.

All new accounts are manually reviewed upon creation. Accounts with false information will be marked as fraud and the client will be prohibited from signing up for services in the future. There are NO EXCEPTIONS.

First & Last Name

Clients must provide their full legal First and Last (family/surname) name. Registering an account under any name other than your own is prohibited, and the account will be marked as fraud.

Email Address

Clients must provide a valid email address. Addresses made for the specific purpose of signing up for the account will not be accepted.

Address1 / Address2, City, State/Region, Postal Code

All clients must enter a valid residential address, city, state/region, and postal code. If you are not sure how to enter it in English, you may enter it in your native language.

Phone Number

A phone number is optional.

IRC Activation
IRC Activation is required for FREE Accounts only.

To Activate via IRC:

(1) Sign up for a free shell account.
(2) Check your email for a message with the subject ‘order confirmation’.
(3) Read the email and find your ORDER ID (NOT your order number). The ORDER ID is 5 digits long.
(4) Join IRC. We are located at irc.criten.net in #freeshells.org. You can use mIRC, mibbit, xchat, etc. to connect. Do NOT use a web browser to connect to the chat room unless you are using mibbit or kiwiIRC.
(5) When you join our chatroom, type !accept followed by your ORDER ID, e.g. !accept 10400
(6) You will receive a message from super stating that your account was created or that it failed, and you will then receive an email with your username and password.

NOTE: Orders not activated within 2 days will be cancelled and you will need to start over.