My requirements in a router are a bit unique. I run most of my services out of a single rack in a single location. At the same time, some services are spread out around the world on edge nodes. I also strive to maintain a flexible and expandable network topology that allows me to make the most efficient routing decision, with the ability to quickly reroute traffic if needed.
A few years ago, microservices and containerization became very popular, allowing developers to segment their applications along logical boundaries, whether for security, simplicity, or scale.
Network architecture can also follow a similar model.
In a traditional sense we have the separation of the control plane and data plane. It's common to see vendors like Arista and Juniper using dedicated switching ASICs for the data plane paired with a normal CPU running Linux for the control plane. There are some exceptions to this however, with certain models supporting hardware offloading for certain protocols more traditionally thought of as living in the control plane.
This is cool and all, and I always enjoy reading about the huge packet throughput and bandwidth that these boxes have to offer. But there really aren't enough options out there for a router that can handle millions of FIB routes at a line rate of 10G or higher. Admittedly this is a specific market, one popular use case being route reflectors that aggregate many transit providers and customers.
For me, this has more to do with flexibility than anything else. I tinker, upgrade, and rebuild my network frequently, so I like to have a routing platform that supports this type of workflow. At the same time, I can't have the network go down while I'm working on it, since I host things like DNS and email that would cause issues if the servers were down for even a short time.
All in all, my list of router requirements is as follows:
- Ability to handle >3M total routes
- 10GbE with the possibility of 40GbE down the line
- Line rate and non-blocking
- Minimal annoyances; things should work the way one would hope/expect. (Not picky about optic EEPROMs; supports both live and file-based configuration; filters fail closed instead of causing a route leak)
- Easy and integrated automation. (Running commands with an SSH client library and parsing the output with regex doesn't count!)
- No vendor lock-in
- Reasonable pricing
- Open source would be nice, but not strictly required
I'm a developer; I live in software. So logically the first thing I did was look into software routing solutions. I already use Debian and BIRD for a few BGP edge routers and it works really well. I have a router with over 2 million routes in the kernel tables that, even while doing RPKI validation, can route 10G in real-world speed tests with reasonable reconvergence times. Great, right?
But the issues start to appear once you add certain rulesets. BIRD drops invalid routes before they ever reach the kernel forwarding table, so even with millions of routes being validated, the cost is paid once per imported route and not per packet. ACLs, on the other hand, have to be evaluated on every incoming packet. This can put enormous strain on the CPU depending on traffic volume. The control plane and data plane have essentially lost any distinction, because with software routing they are effectively combined into one slow process.
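To make the per-route versus per-packet distinction concrete, here is a minimal Python sketch. It is not how BIRD or the kernel implement anything; the `ACL` contents, `import_filter`, and `evaluate_acl` are hypothetical names made up for illustration, and the prefixes are documentation ranges. The point is only that the import filter runs once for each route, while the ACL scan runs again for every single packet.

```python
import ipaddress

# Hypothetical ACL: ordered (source prefix, action) rules, first match wins.
ACL = [
    (ipaddress.ip_network("192.0.2.0/24"), "drop"),
    (ipaddress.ip_network("198.51.100.0/24"), "drop"),
    (ipaddress.ip_network("0.0.0.0/0"), "accept"),
]


def import_filter(routes, invalid_prefixes):
    """Control-plane style work: runs once per route at import time."""
    return [route for route in routes if route not in invalid_prefixes]


def evaluate_acl(src_ip, acl=ACL):
    """Data-plane style work: runs for every packet.

    Returns (action, rules_checked) so the per-packet cost is visible.
    """
    addr = ipaddress.ip_address(src_ip)
    for checked, (prefix, action) in enumerate(acl, start=1):
        if addr in prefix:
            return action, checked
    # Fail closed: a packet that matches no rule is dropped.
    return "drop", len(acl)
```

With a linear ACL like this, a packet that only matches the last rule pays for every rule before it, and that cost is multiplied by the packet rate rather than by the number of routes.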
One solution to this is XDP, or eXpress Data Path. XDP is a hook for eBPF (extended Berkeley Packet Filter) programs that allows expressive packet processing code to be loaded into the kernel's generic network path, the NIC driver, or, most importantly, the NIC itself. XDP programs are typically written in a restricted subset of C and compiled to BPF bytecode, so they offer the flexibility of a full programming language for filter expression. This also comes with the issue of letting your sysadmins do manual memory management. Don't get me wrong, XDP programs are incredibly powerful and certainly have their place. I've documented some of my experiments here: https://github.com/natesales/racl.
But the really impressive part is a little project called bpfilter. Available in the mainline kernel, it hooks into the kernel-level iptables handler and is designed to replace the kernel's iptables filtering rules with BPF code that can run on the NIC itself. Currently only a very small number of NICs support xdpoffload, but those that do allow for true line-rate routing with a Linux control plane. The end result of bpfilter running on a supported NIC is that iptables filters are evaluated on the NIC, and the packets they drop never even cross the PCIe boundary.
All in all, this leads to a very flexible (and therefore scalable) routing platform that checks all my boxes. This is still a huge work in progress, so stay tuned for future updates on the blog about the policy side of the L2 mesh that these routers are contained in.