VMs, VLANs And Bridges, Oh My! Part 2

Filed in Product & Development by Evan Callicoat | May 3, 2012 1:15 pm

Rackspace Cloud Integration Consultant Evan Callicoat has been busy deploying Rackspace Cloud Private Edition (RCPE) for OpenStack customers, and in this two-part blog series he shares insight into some of the issues he encountered with bridged networks in Linux and how to make routing and bridging play nice in a Linux environment.

In yesterday’s installment we set the table by digging deeper into routing, bridging and VLANs. Today, we’ll go into more detail on how those worlds come together.

Bridging VLANs

If we add the subif into a bridge, so that the other interface(s) in the bridge and/or the bridge itself can communicate on the VLAN attached to the mainif, all works as expected.

# brctl addbr br0
# brctl addif br0 eth0.100
# ip link set dev br0 up
# brctl show
bridge name bridge id         STP enabled interfaces
br0         8000.deadbeefd00d no          eth0.100

Thus far, we have a tagged subif inside of a bridge without IPs on either one. If we spin up a VM using our bridge, we’ll get a vif added to the bridge as well. At this point the VM can communicate with the network attached to eth0 in VLAN 100, with the eth0.100 subif providing the tagging/untagging as traffic leaves/enters the VM.
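For hosts without brctl, the same bridge can be sketched with iproute2 alone. This is only an illustration: it assumes an iproute2 build with bridge support and that the eth0.100 subif from Part 1 already exists. The run and setup_bridge helpers are hypothetical names, and DRY_RUN defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch: build the bridge from this section with iproute2 instead of brctl.
# DRY_RUN defaults to 1, so commands are only printed.
# Set DRY_RUN=0 and run as root to actually apply them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_bridge() {
    bridge=$1
    subif=$2
    # Equivalent of "brctl addbr"
    run ip link add name "$bridge" type bridge
    # Equivalent of "brctl addif": make the tagged subif a bridge port
    run ip link set dev "$subif" master "$bridge"
    run ip link set dev "$bridge" up
}

setup_bridge br0 eth0.100
```

Once applied for real, `bridge link show` should list eth0.100 as a port of br0, much like `brctl show` does.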

This works great if your native (untagged) traffic is never bridged to anything, and if your VM doesn’t do its own tagging. If you have a need for either of these things — at least one of which is very common in an enterprise production environment — then you’ll need to put eth0 into a bridge. The problem is, when you also have tagged subifs on eth0, putting eth0 into a bridge makes the subif(s) no longer untag the traffic correctly. Here’s how this works:

[1]

Looking at how traffic actually enters the networking stack in the bottom left of this diagram, you can see there’s first a “processing decision,” which decides whether to send the traffic to the bridging process or the routing process. This is a fairly straightforward step; if the ingress interface is in a bridge, bridge it. If it’s not, route it.
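To see which side of that decision a given interface falls on, one option is to check sysfs, where bridge ports get a brport directory. The helper below is a rough sketch of that check, not the kernel's actual logic; ingress_path is a made-up name, and the SYSFS variable exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: report which path ingress frames on an interface take first.
# Bridge ports have a "brport" directory under /sys/class/net/<iface>/.
# SYSFS is overridable purely for experimenting without real interfaces.

ingress_path() {
    iface=$1
    sysfs=${SYSFS:-/sys}
    if [ ! -d "$sysfs/class/net/$iface" ]; then
        echo "no such interface: $iface" >&2
        return 1
    fi
    if [ -e "$sysfs/class/net/$iface/brport" ]; then
        # Interface is a bridge port: frames go to the bridging process
        echo "bridge"
    else
        # Not in a bridge: frames go to the routing process
        echo "route"
    fi
}
```

With the setup from the previous section, `ingress_path eth0.100` would be expected to print bridge, and `ingress_path eth0` to print route until eth0 itself is added to a bridge.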

The problem is that VLAN tag handling occurs in the routing process as a part of interface selection. When the bridge receives tagged traffic on its eth0 port, the bridging process never sees the tag, because the code which detects it never gets run. Instead, the bridge only pays attention to the MAC addresses involved and makes a bridging decision based on them. That decision may or may not lead to routing, but never in such a way that the untagging code does its job, leaving us with tagged traffic hitting mainifs and subifs never seeing a single frame.

Fortunately, there’s an invaluable tool we can use to solve these two problems: Ethernet Bridge Tables, or ebtables (see note 3 below)! Ebtables is to iptables as Layer 2 is to Layer 3. It has tables of chains in a similar fashion to iptables, with one in particular we’re interested in that is a bit of an odd duck: the Brouting table.

BROUTING

The Brouting table is the first table processed. It enables a unique behavior wherein frames which follow a -j ACCEPT target continue on to the rest of the bridging tables/chains, but following a -j DROP will actually kick the frame out of the bridging process and over to the routing process, as if it had never entered the bridge.

So let’s say that we need our VMs to communicate over the untagged VLAN, so eth0 is in our bridge with the vifs. In addition, we have eth0.100 unbridged so we can hit VLAN 100 from the host. At this point, outgoing traffic through eth0.100 works, but no traffic is received. By making use of some match criteria specific to our setup, we can write a rule in ebtables that fixes the issue, like so:

# ebtables -t broute -A BROUTING -i eth0 -p 802_1Q -j DROP

This command adds a rule to the BROUTING chain in the broute table so that any frame entering eth0 with a protocol of 802.1Q (VLAN tagging) gets kicked out of bridging and goes straight to routing. The result is that the VLAN code detects the tag and selects the correct subif for the traffic. We don’t have to adjust outgoing traffic because in this setup it will be originating from or through the subif and thus get tagged before the bridge sees it.
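One way to confirm the rule is doing its job is to watch its counters and then look for frames on the subif. The sketch below assumes tcpdump is installed alongside ebtables; run and check_broute are hypothetical helper names, and DRY_RUN again defaults to printing the commands only.

```shell
#!/bin/sh
# Sketch: check that tagged frames are being handed to the host's subif.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 and run as root to execute.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_broute() {
    mainif=$1
    subif=$2
    # List the BROUTING chain with packet/byte counters; the DROP rule's
    # counters should climb while tagged traffic is arriving.
    run ebtables -t broute -L BROUTING --Lc
    # Capture five frames on the subif, printing link-level headers.
    run tcpdump -n -e -c 5 -i "$subif"
    # Capture five tagged frames as they arrive on the mainif, for comparison.
    run tcpdump -n -e -c 5 -i "$mainif" vlan
}

check_broute eth0 eth0.100
```

If the counters stay at zero while the second capture sits empty, the rule's match criteria probably don't fit the traffic.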

Things get a little trickier when you want to have this setup and also handle traffic being tagged inside of VMs. By solving the first problem, we’ve created a new one for ourselves; if eth0 -j DROPs all tagged traffic to kick it to the host’s subifs, VMs will never see it. To get that traffic back through, we need to use more ebtables magic by modifying the rule slightly:

# ebtables -t broute -F
# ebtables -t broute -A BROUTING -i eth0 -p 802_1Q -d de:ad:be:ef:d0:0d -j DROP

What we’ve done here is stipulate that only if the tagged traffic entering eth0 is destined for the mainif (or any of its subifs) MAC address do we want the bridge to kick it back to get untagged on the host. Otherwise, bridging proceeds and the VM receives the tagged traffic as-is. This works because the MAC address of a subif is copied from the mainif it is attached to, allowing us to distinguish between traffic for the host and traffic for VMs.
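Rather than typing the MAC address by hand, it can be read from sysfs when building the rule. The following is a minimal sketch under that assumption; host_mac and broute_rule are invented names, SYSFS is overridable only for experimentation, and the function prints the ebtables command for review instead of running it.

```shell
#!/bin/sh
# Sketch: build the MAC-specific BROUTING rule from the mainif's own address.
# Prints the command rather than running it, so it can be reviewed first.

host_mac() {
    iface=$1
    sysfs=${SYSFS:-/sys}
    # The kernel exposes each interface's MAC address in this file.
    cat "$sysfs/class/net/$iface/address"
}

broute_rule() {
    iface=$1
    mac=$(host_mac "$iface") || return 1
    # Only tagged frames addressed to the host's MAC get kicked to routing;
    # everything else stays in the bridge and reaches the VMs still tagged.
    echo "ebtables -t broute -A BROUTING -i $iface -p 802_1Q -d $mac -j DROP"
}
```

Calling `broute_rule eth0` prints the rule with eth0's real address filled in; once it looks right, the printed line can be run as root.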

Conclusion

This may seem like an overly convoluted approach to networking when there are other possible solutions such as just throwing in more interfaces and eating up port density on your switches and aggrs. However, in a modern cloud environment where you need to solve the problems of density, multi-tenancy, VM/host isolation, integrating fine-grained security with logical network segregation, and so on, being able to flexibly massage the Linux VLAN and bridge code into handling any combination of ports and tags is truly essential.

Hopefully this long-winded post helps anyone struggling with making these two indispensable network building-blocks play nice in Linuxland at least understand how the pieces fit together better. Feel free to leave feedback on your experiences, solutions or harsh language when you’re forced to resort to ebtables. Better the devil you know.


3 In most distros you have to install an ebtables package to obtain the binary, but the kernel code it talks to is built into almost every stock kernel. This is similar to 8021q, except that it isn’t its own module.

Endnotes:
  1. [Image]: http://c3414940.r40.cf0.rackcdn.com/blog/wp-content/uploads/2012/04/Linux-Packet-Flow.png

Source URL: http://www.rackspace.com/blog/vms-vlans-and-bridges-oh-my-part-2/