SR-IPv6 - Linux Kernel implementation

Segment Routing







Filed in: Implementation.Issues · Modified on : Sat, 12 Mar 16

There are a number of issues, inherent either to the encapsulation model or to the specific Linux network setup, that I do not address in the implementation because they are out of scope. These issues are all related to offloading. On this page, I explain them and propose a workaround for each.

TCP packet drop at final host with inline mode

In inline mode, a TCP packet might get dropped when it reaches the last node of the path (i.e., the final destination) while there are still some local segments to process and GRO is enabled.

Indeed, when there are packets available for the network driver to send up the layers, and GRO is enabled, the driver will see if it can aggregate contiguous, similar packets into a larger one to reduce the number of packets to be sent up. Each layer may implement a gro_receive function that will attempt to aggregate several packets into one. When TCP's gro_receive function is called, it verifies the checksum of the packets before merging. If the checksum fails, the packet is not dropped, but its attribute csum_bad is set to 1 and the packet is immediately sent up the layers. When the TCP layer processes a packet with csum_bad = 1, then the packet is immediately dropped and the checksum is not actually recomputed.

This is what happens here. The TCP checksum is computed over the TCP header and data, and the IPv6 pseudo-header. The IPv6 pseudo-header is composed of the source and destination addresses, the upper-layer packet length, and the next-header (protocol) value. When the checksum was computed at the source, the destination address was the final destination. However, when the packet entered the Linux host, the destination address was not the final one because there were still local segments to process. Thus, TCP's gro_receive function computed its own checksum using the wrong destination address. As the checksum did not match, the packet was marked as having an incorrect checksum and was dropped later on by the TCP stack.
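The effect of the destination address on the checksum can be reproduced outside the kernel. The following is a minimal sketch, not the kernel's code: the addresses, ports and payload are hypothetical values chosen for illustration, and the helper names (ones_sum, pseudo_header, tcp_checksum) are my own.

```python
import ipaddress
import struct

TCP = 6  # IPv6 next-header value for TCP


def ones_sum(data: bytes) -> int:
    """Folded 16-bit ones' complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, upper_len: int) -> bytes:
    """IPv6 pseudo-header: addresses, 32-bit length, 3 zero bytes, next header."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", upper_len, TCP))


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum of a TCP segment whose checksum field is currently zero."""
    return ~ones_sum(pseudo_header(src, dst, len(segment)) + segment) & 0xFFFF


# Hypothetical addresses, for illustration only.
SRC = "2001:db8::1"
FINAL_DST = "2001:db8::2"        # address the sender used for the checksum
LOCAL_SEGMENT = "2001:db8::100"  # destination address seen on arrival

# Minimal 20-byte TCP header (checksum field zero) followed by a payload.
segment = struct.pack("!HHIIHHHH", 49152, 80, 1, 0, 0x5018, 65535, 0, 0) + b"hello"

at_sender = tcp_checksum(SRC, FINAL_DST, segment)
at_gro = tcp_checksum(SRC, LOCAL_SEGMENT, segment)
print(f"sender: {at_sender:#06x}, GRO: {at_gro:#06x}, match: {at_sender == at_gro}")
```

The two values differ only because the destination address in the pseudo-header differs, which is the mismatch that gro_receive detects.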

This problem does not happen with the encap mode because the IPv6 pseudo-header is taken from the inner IPv6 packet, which does not change.

Solutions: use encap mode or disable GRO on the last host (ethtool -K iface0 gro off)

Poor performance on SR routers for TCP traffic in inline mode

Directly related to the previous issue, you may get poor performance on intermediate routers in inline mode. This is caused by the exact same mechanism: TCP's gro_receive function computes a checksum that does not match, and the packet is flushed (i.e., sent up the layers without being aggregated). Since we are not on the final host, the packet does not enter the TCP stack and does not get dropped. Nevertheless, all the checksum mismatches hurt performance.

Solutions: use encap mode or disable GRO on all intermediate hosts (ethtool -K iface0 gro off)

ixgbe driver complaining in inline mode

ixgbe 0000:0b:00.1: partial checksum but l4 proto=2b!
ixgbe 0000:0b:00.1: partial checksum but l4 proto=2b!
ixgbe 0000:0b:00.1: partial checksum but l4 proto=2b!

If you get such messages, it means that TX checksum offloading is enabled but the driver does not recognize the layer 4 protocol. The value 0x2b is equal to 43, which is the next-header value of the IPv6 routing header. The problem is that the ixgbe driver does not follow the IPv6 header chain and simply looks at the immediate next header, which is the routing header, and the driver does not know what to do with it.
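The difference between reading the immediate next header and following the header chain can be illustrated as follows. This is a sketch and not the driver's code: the packet is a hypothetical, zero-filled one, and only the generic extension header layout (next header, then header length in 8-byte units minus one) is assumed.

```python
import struct

TCP = 6
ROUTING = 43
# Extension headers sharing the generic (next header, hdr ext len) layout:
# hop-by-hop options, routing, destination options.
EXT_HEADERS = {0, 43, 60}


def immediate_next_header(packet: bytes) -> int:
    """Next-header field of the fixed IPv6 header (byte 6)."""
    return packet[6]


def upper_layer_protocol(packet: bytes) -> int:
    """Walk the extension header chain and return the layer 4 protocol."""
    next_header = packet[6]
    offset = 40  # extension headers start right after the fixed IPv6 header
    while next_header in EXT_HEADERS:
        next_header, hdr_ext_len = packet[offset], packet[offset + 1]
        offset += (hdr_ext_len + 1) * 8
    return next_header


def build_packet() -> bytes:
    """Hypothetical IPv6 packet: routing header with one segment, then TCP."""
    tcp = bytes(20)
    # next header, hdr ext len (24 bytes total), routing type, segments left
    routing = struct.pack("!BBBB4x", TCP, 2, 4, 0) + bytes(16)
    ipv6 = struct.pack("!IHBB", 6 << 28, len(routing) + len(tcp), ROUTING, 64)
    return ipv6 + bytes(32) + routing + tcp


packet = build_packet()
print(f"immediate next header: {immediate_next_header(packet):x}")
print(f"upper-layer protocol:  {upper_layer_protocol(packet):x}")
```

The first function returns 0x2b, the value shown in the driver message, while the second one reaches the TCP header behind the routing header.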

This problem does not happen with the encap mode because the layer 4 protocol is inferred from the inner headers.

Solutions: use encap mode or disable TX offloading on the host (ethtool -K iface0 tx off)

TCP packet drop at final host with Virtual Ethernet Pairs

If you are in a fully virtualized network setup using network namespaces and veth pairs, and you are running a service segment in the middle of the path, then your packets might get dropped by the final destination. The problem is the following. When a TCP packet is generated and TX offloading is enabled, the checksum is only computed over the IPv6 pseudo-header, and the rest of the checksum (i.e., TCP header and data) is left for the NIC. However, with veth pairs, the packet never actually gets to see a physical NIC and the TCP checksum field is never set to the correct value. This is not a problem when the skb is not modified between the source and the destination, because the packet is marked as valid in the skb metadata. However, when the packet goes through a service segment, the skb is recreated from scratch, destroying the metadata in the process. The destination will then see an invalid checksum field and drop the packet.
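The partial checksum and its completion can be sketched as follows. This is an illustration under assumptions, not kernel code: I assume the stack stores the folded, non-inverted pseudo-header sum in the checksum field as a seed, and that whoever finishes the job (the NIC, or a service) sums the TCP header and data including that seed and stores the complement. Addresses, ports, payload and helper names are hypothetical.

```python
import ipaddress
import struct

TCP = 6
CSUM_OFFSET = 16  # offset of the checksum field inside the TCP header


def ones_sum(data: bytes) -> int:
    """Folded 16-bit ones' complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, upper_len: int) -> bytes:
    """IPv6 pseudo-header: addresses, 32-bit length, 3 zero bytes, next header."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", upper_len, TCP))


def set_checksum(segment: bytes, value: int) -> bytes:
    """Return a copy of the segment with its checksum field set to value."""
    return segment[:CSUM_OFFSET] + struct.pack("!H", value) + segment[CSUM_OFFSET + 2:]


def seed_for_offload(src: str, dst: str, segment: bytes) -> bytes:
    """Sender with TX offload (assumed): store only the pseudo-header sum."""
    return set_checksum(segment, ones_sum(pseudo_header(src, dst, len(segment))))


def complete_checksum(segment: bytes) -> bytes:
    """NIC or service: sum header and data, seed included, store the complement."""
    return set_checksum(segment, ~ones_sum(segment) & 0xFFFF)


def is_valid(src: str, dst: str, segment: bytes) -> bool:
    """Receiver check: pseudo-header plus segment must sum to 0xffff."""
    return ones_sum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF


# Hypothetical addresses and segment, for illustration only.
SRC, DST = "2001:db8::1", "2001:db8::2"
segment = struct.pack("!HHIIHHHH", 49152, 80, 1, 0, 0x5018, 65535, 0, 0) + b"hi"

seeded = seed_for_offload(SRC, DST, segment)  # what leaves the source over veth
completed = complete_checksum(seeded)         # what a service could send back

print("seed only:", is_valid(SRC, DST, seeded))
print("completed:", is_valid(SRC, DST, completed))
```

A segment carrying only the seed fails the receiver check once the skb metadata is lost, whereas the completed one passes. A service that rebuilds packets would need a step equivalent to complete_checksum before handing them back to the kernel.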

This problem happens with both encap and inline modes.

Solutions: compute the TCP checksum in the service before sending the packet back to the kernel, or disable TX offloading on the source (ethtool -K iface0 tx off)
