Modify UDP packet using eBPF

涛叔 2022-11-19

eBPF is one of the most important pieces of infrastructure in cloud computing. It is widely used for network observability and packet filtering. In some situations, we also need to modify packets directly. In this article, I will show you how to do that.

An eBPF program is written in C and compiled with clang. Unlike ordinary C code, it does not need a main() function, because each eBPF function is attached to a kernel hook explicitly.

Suppose we already have the following C code:

int modudp(struct __sk_buff *skb)
{
  // ...
}

How can we make the Linux kernel run modudp() for some UDP packets? First, we use the __attribute__ directive to set the name of the section that holds the function. The section name must be a string literal.

__attribute__((section("modudp"), used))
int modudp(struct __sk_buff *skb)
{
  // ...
}

Because this attribute is used so often, it is convenient to define a macro:

#define SEC(NAME) __attribute__((section(NAME), used))

SEC("modudp")
int main_modudp(struct __sk_buff *skb)

Next, we compile the C code with clang:

clang -O2 -target bpf -I/usr/include/x86_64-linux-gnu -c bpf.c -o bpf.o

I assume an x86_64 machine here, hence the include path. clang generates the bpf.o file.

Finally, we need to load bpf.o into the kernel, which is slightly more complicated.

To load bpf.o, we use the tc command and create a new qdisc of type clsact. Many articles on the Internet say you need to create an ingress qdisc. However, as the name indicates, the ingress qdisc can only process ingress packets. If you need to process both ingress and egress packets, create a clsact qdisc instead.

Creating the qdisc is simple:

tc qdisc add dev eth0 clsact

Here eth0 is the NIC you want to attach to. Afterwards, you can ask the kernel to list the qdiscs on it:

tc qdisc show dev eth0
qdisc mq 0: root
qdisc clsact ffff: parent ffff:fff1

Next, we need to attach the eBPF functions to the qdisc:

tc filter add dev eth0 ingress bpf da obj bpf.o sec i_modudp
tc filter add dev eth0  egress bpf da obj bpf.o sec e_modudp

A few key points are worth noting. First, the tc sub-command is filter. When we add a filter, we indicate the direction with the ingress or egress option. The bpf keyword says that we want to load an eBPF object to do the filtering. Finally, and most importantly, sec selects the section, and therefore the function, to run.

You may be wondering what the da option after bpf means. It is another very important option.

In early eBPF usage, we had to define two functions for filtering: a classifier and an action. The kernel ran the classifier first, and only after it returned -1 would the kernel run the corresponding action. This design performed relatively poorly, so the kernel was changed to let the classifier carry out the action itself. That is what the da option, short for direct-action, enables. Nowadays we should always set da.
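In direct-action mode, the value returned by the eBPF function is itself the verdict. As a rough illustration, the helper below maps the return codes used in this article, plus the drop code, to a short description. The name verdict_name is hypothetical and only serves this sketch; the TC_ACT_* constants come from <linux/pkt_cls.h>.

```c
#include <linux/pkt_cls.h>

/* Hypothetical helper for illustration only: describe what a
 * direct-action return code asks the kernel to do with the packet. */
static const char *verdict_name(int verdict)
{
    switch (verdict) {
    case TC_ACT_UNSPEC: return "unspec: use the default action configured in tc";
    case TC_ACT_OK:     return "ok: let the packet proceed";
    case TC_ACT_SHOT:   return "shot: drop the packet";
    default:            return "other";
    }
}
```

The example program in this article only ever returns TC_ACT_OK or TC_ACT_UNSPEC, so it never drops a packet.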

OK, we have learned how to load an eBPF object. Now it is time to show the real code that modifies UDP packets.

In my example, I will change the first byte of the payload of certain UDP packets to the character _. Why UDP? Because it is simple and easy to demonstrate. The same approach works for packets of any protocol.

First, we need some header files:

#include <stddef.h>
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>

These files contain the struct definitions for eBPF and the IP protocols. I’ll explain them in detail later.

Then we need two helper macros:

#define SEC(NAME) __attribute__((section(NAME), used))

#define trace_printk(fmt, ...) do { \
   char _fmt[] = fmt; \
   bpf_trace_printk(_fmt, sizeof(_fmt), ##__VA_ARGS__); \
   } while (0)

SEC has been explained above. trace_printk is a helper similar to printf from stdio; it is used to print debugging information.

Now let’s do some coding. Before running our eBPF function, the kernel parses the packet coming from or going to the NIC and fills the related information into a __sk_buff struct.

Before modifying anything, we need to run some checks to make sure the current packet is one we want to change.

Since we only want to alter certain UDP packets, we skip every packet that is not UDP.

/* We will access all data through pointers to structs */
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;

// first we check that the packet has enough data,
// so we can access the three different headers of ethernet, ip and udp
if (data + sizeof(struct ethhdr) + sizeof(struct iphdr) + sizeof(struct udphdr) > data_end)
  return TC_ACT_UNSPEC;
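The same bounds check can be tried in user space on an ordinary byte buffer. The following is a minimal sketch, not kernel code: the function name headers_fit and the three size constants are made up for this example, and the IPv4 header is assumed to carry no options (20 bytes), as in the rest of this article.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed header sizes assumed by this sketch (IPv4 without options). */
enum { ETH_LEN = 14, IPV4_LEN = 20, UDP_LEN = 8 };

/* Returns 1 when [data, data_end) is long enough to hold the Ethernet,
 * IPv4 and UDP headers, 0 otherwise. Comparing lengths instead of
 * pointers avoids forming a pointer past the end of a short buffer. */
static int headers_fit(const uint8_t *data, const uint8_t *data_end)
{
    return (size_t)(data_end - data) >= ETH_LEN + IPV4_LEN + UDP_LEN;
}
```

With these sizes, a frame needs at least 42 bytes before any header field may be read.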

And then, we can define structs to access packet information:

/* for easy access we re-use the Kernel's struct definitions */
struct ethhdr  *eth  = data;
struct iphdr   *ip   = (data + sizeof(struct ethhdr));
struct udphdr  *udp  = (data + sizeof(struct ethhdr) + sizeof(struct iphdr));

We should ignore all non-UDP packets:

/* Only actual IP packets are allowed */
if (eth->h_proto != __constant_htons(ETH_P_IP))
  return TC_ACT_OK;

/* We handle only UDP traffic */
if (ip->protocol != IPPROTO_UDP)
  return TC_ACT_OK;
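To see which bytes these two checks actually look at, here is a small user-space sketch that works on a raw frame. It assumes an untagged Ethernet frame (no VLAN header); is_ipv4_udp is a hypothetical name used only here.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the frame is Ethernet + IPv4 carrying UDP, else 0.
 * Assumes an untagged Ethernet frame. */
static int is_ipv4_udp(const uint8_t *frame, size_t len)
{
    if (len < 14 + 20)          /* Ethernet header + minimal IPv4 header */
        return 0;

    /* EtherType is stored in bytes 12..13, big endian; 0x0800 is IPv4. */
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

    /* The protocol field is byte 9 of the IPv4 header; 17 is UDP. */
    uint8_t protocol = frame[14 + 9];

    return ethertype == 0x0800 && protocol == 17;
}
```

The constants 0x0800 and 17 are the values behind ETH_P_IP and IPPROTO_UDP.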

Now we know we have a UDP packet. We still need to check the UDP port, because we do not want to break other UDP traffic.

/* Convert the port from host order to network order */
__be16 port = __constant_htons(2048);
if (udp->dest != port && udp->source != port) {
  return TC_ACT_OK;
}

The port occupies two bytes, so we need to convert port 2048 to network byte order, i.e. big endian. We only change packets whose source or destination port is 2048.
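The byte-order conversion is easy to check in user space. Port 2048 is 0x0800, so on the wire it appears as the bytes 08 00, whatever the byte order of the host. The sketch below uses the standard htons() from <arpa/inet.h>; udp_port_matches is a hypothetical helper written for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if either port in the 8-byte UDP header equals `port`
 * (given in host byte order), 0 otherwise. */
static int udp_port_matches(const uint8_t *udp_hdr, uint16_t port)
{
    uint16_t src, dst;
    uint16_t wire = htons(port);    /* host order -> network order */

    memcpy(&src, udp_hdr, 2);       /* bytes 0..1: source port */
    memcpy(&dst, udp_hdr + 2, 2);   /* bytes 2..3: destination port */
    return src == wire || dst == wire;
}
```

Comparing the raw fields against a plain 2048 without htons() would fail on a little-endian machine, where the wire bytes 08 00 read back as 8.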

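Before we actually change the payload, we have to check its length. The len field is stored in network byte order and counts the 8-byte UDP header as well, so we convert it to host order and require at least one payload byte on top of the header.

/* udp->len is big endian and includes the UDP header itself */
if (__constant_ntohs(udp->len) < sizeof(struct udphdr) + 1) {
  return TC_ACT_OK;
}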
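The meaning of the UDP length field can be illustrated in user space. This is only a sketch: udp_payload_len is a hypothetical helper, and it relies on two facts about UDP, namely that the length field sits in bytes 4 and 5 of the header in big-endian order, and that it covers the 8-byte header plus the payload.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns the number of payload bytes announced by an 8-byte UDP header,
 * or -1 if the length field is smaller than the header itself. */
static int udp_payload_len(const uint8_t *udp_hdr)
{
    uint16_t len_be;

    memcpy(&len_be, udp_hdr + 4, 2);    /* bytes 4..5: length, big endian */
    int total = ntohs(len_be);          /* header + payload, host order */
    return total < 8 ? -1 : total - 8;
}
```

A datagram carrying a single byte therefore has a length field of 9, not 1, and values must be converted to host order before they are compared with <.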

If all checks pass, we can modify the payload safely. The payload starts right after the UDP header, at (void *)udp + sizeof(struct udphdr). Rather than writing through that pointer, we use the bpf_skb_store_bytes() helper that eBPF provides for this task.

char c = '_';
int off = sizeof(struct ethhdr) + sizeof(struct iphdr) + sizeof(struct udphdr);
bpf_skb_store_bytes(skb, off, &c, sizeof(c), BPF_F_RECOMPUTE_CSUM);

bpf_skb_store_bytes accepts five arguments:

The first is the skb pointer.
The second is the offset, counted from the start of the whole packet, of the bytes we intend to change. This is why we calculate off by adding the sizes of ethhdr, iphdr and udphdr.
The third is a pointer to the new data.
The fourth is the size of the data to change. In our case, we change only one byte.
The last is a set of additional flags. Because UDP carries a checksum over its content, we ask the kernel to recompute the checksum after the change. Otherwise, the TCP/IP protocol stack will drop the mismatched packet.
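To make the roles of the offset, data pointer and size arguments concrete, here is a simplified user-space model of such a store operation. It is not the kernel implementation: store_bytes is a hypothetical function that works on a plain buffer, takes the packet length explicitly, and ignores the flags and checksum handling entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space model only: copy `len` bytes from `from` into the packet
 * at byte offset `off`. Returns 0 on success, or -1 if the write would
 * run past the end of the packet. */
static int store_bytes(uint8_t *pkt, size_t pkt_len, size_t off,
                       const void *from, size_t len)
{
    if (off > pkt_len || len > pkt_len - off)
        return -1;
    memcpy(pkt + off, from, len);
    return 0;
}
```

With a 42-byte header stack, passing off = 42 and a one-byte buffer holding '_' overwrites exactly the first payload byte.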

Note: if your system has the include/uapi/linux/bpf.h file, you should include it. My system does not have it, so I had to add my own declarations:

static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
        (void *) BPF_FUNC_trace_printk;
static int (*bpf_skb_store_bytes)(void *ctx, int off, void *from, int len, int flags) =
        (void *) BPF_FUNC_skb_store_bytes;

Last of all, we return TC_ACT_UNSPEC from the function and declare the license:

char __license[] SEC("license") = "GPL";

The kernel only lets GPL-licensed programs call GPL-only helpers such as bpf_trace_printk, so we declare the license as GPL.

OK, this is the whole program. You can download it from here.

Let’s test our code.

First, load the program into the kernel:

tc filter add dev eth0 ingress bpf da obj bpf.o sec modudp
tc filter add dev eth0 egress bpf da obj bpf.o sec modudp

Then listen on UDP port 2048 using nc:

nc -u -l -p 2048

Finally, connect to the server from another device with nc:

nc -u host-addr 2048

If we send the following lines from the client side:

abc
123
+++

The server will display:

_bc
_23
_++

Whichever side sends data, the other side receives the same data with its first character changed to _!

If you run into problems, run tc exec bpf dbg to see the debugging trace output.

After testing, we need to delete the eBPF filters and the qdisc:

tc filter del dev eth0 ingress
tc filter del dev eth0 egress
tc qdisc del dev eth0 clsact

That is all the content of this article. Feel free to comment 😄
