Advanced traffic control

The Linux kernel's network stack has network traffic control and shaping features. The iproute2 package installs the tc command to control these via the command line.
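
To quickly check what is currently configured, tc can list the qdiscs attached to each interface. The eth0 device name below is only an example; the -s flag adds packet and byte counters:

# tc qdisc show
# tc -s qdisc show dev eth0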

The goal of this article is to show how to shape traffic by using queueing disciplines. For instance, if you have ever had to forbid downloads or torrents on a network you administer - not because you were against those services, but because users were "abusing" the bandwidth - you could instead use queueing disciplines to allow that kind of traffic while making sure that no single user can slow down the entire network.

This is an advanced article; you are expected to have some prior knowledge of network devices, iptables, etc.

Queuing

Queuing controls how data is sent; receiving data is much more reactive, with fewer network-oriented controls. However, TCP/IP uses slow start - the sender starts slowly and keeps sending faster and faster until packets start getting dropped - so it is possible to control how much traffic is received on a LAN by dropping packets that arrive at a router before they get forwarded. There are more relevant details, but they do not touch directly on queuing logic.

In order to fully control the shape of the traffic, we need to be the slowest link of the chain. That is, if the connection has a maximum download speed of 500k and you do not limit the output to 450k or below, it is the modem that ends up shaping the traffic instead of us.

Each network device has a root where a qdisc can be set. By default, this root has an fq_codel qdisc (more on this below).

There are two kinds of disciplines: classful and classless.

Classful qdiscs allow you to create classes, which work like branches on a tree. You can then set rules to filter packets into each class. Each class can in turn have another classful or classless qdisc assigned to it.

Classless qdiscs do not allow further qdiscs to be attached to them.

Before starting to configure qdiscs, we first need to remove any existing qdisc from the root. This removes any qdisc from the eth0 device:

# tc qdisc del root dev eth0 

Classless Qdiscs

These are queues that do basic management of traffic by reordering, slowing or dropping packets. These qdiscs do not allow the creation of classes.

fifo_fast

This was the default qdisc up until systemd 217. On every network device where no custom qdisc configuration had been applied, fifo_fast was the qdisc set on the root. fifo means First In First Out: the first packet to come in is the first one to be sent. This way, no packet gets special treatment.

Token Bucket Filter (TBF)

This qdisc allows bytes to pass only as long as a certain rate limit is not exceeded.

It works by creating a virtual bucket and then dropping tokens into that bucket at a certain speed. Each packet takes a virtual token from the bucket and uses it to get permission to pass. If too many packets arrive, the bucket runs out of tokens and the remaining packets have to wait a certain time for new ones. If tokens do not arrive fast enough, the packets are dropped. In the opposite case (too few packets to send), the accumulated tokens can be used to allow short bursts (upload spikes) to happen.

That means this qdisc is useful to slow down an interface.

Example:

Uploading can fill a modem's queue and, as a result, interactivity is destroyed while you are uploading a huge file.

# tc qdisc add dev ppp0 root tbf rate 220kbit latency 50ms burst 1540

Note that the above upload speed should be changed to your own upload speed minus a few percent (so that you remain the slowest link of the chain). This configuration sets a TBF on the ppp0 device, limiting the upload speed to 220kbit, with a latency of 50ms (the maximum time a packet may wait for tokens before being dropped) and a burst of 1540 bytes. It works by keeping the queueing on the Linux machine (where it can be shaped) instead of on the modem.
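
If your uplink speed changes, the rate can be adjusted in place instead of deleting and re-adding the qdisc. A minimal sketch, assuming an uplink of roughly 500kbit capped a few percent below its maximum:

# tc qdisc change dev ppp0 root tbf rate 480kbit latency 50ms burst 1540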

Stochastic Fairness Queueing (SFQ)

This is a round-robin qdisc. Each conversation gets its own fifo queue, and on each round, each conversation has the chance to send data. That is why it is called "Fairness". It is also called "Stochastic" because it does not really create a separate queue for each conversation; instead, it uses a hashing algorithm, so multiple sessions may end up in the same bucket. To keep this from becoming noticeable, SFQ changes its hashing algorithm regularly.

Example:

This configuration sets SFQ on the root of the eth0 device, configuring it to perturb (alter) its hashing algorithm every 10 seconds.

# tc qdisc add dev eth0 root sfq perturb 10

CoDel and Fair Queueing CoDel

Since systemd 217, fq_codel is the default qdisc. CoDel (Controlled Delay) is an attempt to limit bufferbloat and minimize latency on saturated network links by distinguishing good queues (that empty quickly) from bad queues that stay saturated and slow. Fair queueing CoDel (fq_codel) uses fair queues to distribute the available bandwidth between CoDel flows more readily. The configuration options are intentionally limited, since the algorithm is designed to work with dynamic networks. There are some corner cases to consider, which are discussed on the bufferbloat wiki concerning CoDel (http://www.bufferbloat.net/projects/codel/wiki), including issues on very large switches and sub-megabit connections.

Additional information is available in man tc-codel and man tc-fq_codel.

Warning: Make sure your ethernet driver supports Byte Queue Limits before using CoDel. A list of drivers supported as of kernel 3.6 is available at http://www.bufferbloat.net/projects/bloat/wiki/BQL_enabled_drivers.
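
Because its options are intentionally limited, fq_codel is usually attached without any parameters. If an interface currently has another qdisc on its root, it can be set (or restored) explicitly; eth0 is again only an example device:

# tc qdisc replace dev eth0 root fq_codel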

Classful Qdiscs

Classful qdiscs are very useful if you have different kinds of traffic which should have differing treatment. A classful qdisc allows you to have branches. The branches are called classes.

Setting up a classful qdisc requires that you name each class. To name a class, the classid parameter is used. The parent parameter, as the name indicates, points to the parent of the class.

All the names should be of the form x:y, where x is the name of the root and y is the name of the class. Normally, the root is called 1: and its children are things like 1:10.

Hierarchical Token Bucket (HTB)

HTB is well suited for setups where you have a fixed amount of bandwidth which you want to divide for different purposes, giving each purpose a guaranteed bandwidth, with the possibility of specifying how much bandwidth can be borrowed. Here is an example with comments explaining what each line does:

# This line sets a HTB qdisc on the root of eth0, and it specifies that the class 1:30 is used by default. It sets the name of the root as 1:, for future reference.
tc qdisc add dev eth0 root handle 1: htb default 30

# This creates a class called 1:1, which is a direct descendant of root (the parent is 1:). This class also gets an HTB qdisc assigned, with a maximum rate of 6mbit and a burst of 15k.
tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k

# The previous class has these branches:

# Class 1:10, which has a rate of 5mbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k

# Class 1:20, which has a rate of 3mbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k

# Class 1:30, which has a rate of 1kbit. This one is the default class.
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k

# Martin Devera, author of HTB, recommends attaching SFQ beneath these classes:
tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10
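
After building a hierarchy like the one above, you can verify the classes and see how much traffic each one is handling; the -s flag adds byte and packet counters per class:

# tc -s class show dev eth0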

Filters

Once a classful qdisc is set on the root (which may contain classes with more classful qdiscs), it is necessary to use filters to indicate which packets should be processed by which class.

In a classless-only environment, filters are not necessary.

You can filter packets by using tc, or a combination of tc + iptables.

Using tc only

Here are two example filters, explained by their comments:

# This command adds a filter to the qdisc 1: of dev eth0, sets the
# priority of the filter to 1, matches packets with a
# destination port of 22, and makes the class 1:10 process the
# packets that match.
tc filter add dev eth0 protocol ip parent 1: prio 1 u32 match ip dport 22 0xffff flowid 1:10

# This filter is attached to the qdisc 1: of dev eth0, has a
# priority of 2, matches packets from the IP address 4.3.2.1
# with a source port of 80, and makes class 1:11 process the
# packets that match.
tc filter add dev eth0 parent 1: protocol ip prio 2 u32 match ip src 4.3.2.1/32 match ip sport 80 0xffff flowid 1:11
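
u32 can also match broader criteria than a single host and port. As a sketch reusing the HTB classes from the previous section (the 192.168.1.0/24 subnet is hypothetical), this sends all traffic destined to that subnet to class 1:20:

tc filter add dev eth0 parent 1: protocol ip prio 3 u32 match ip dst 192.168.1.0/24 flowid 1:20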

Using tc + iptables

iptables has a method called fwmark, which can be used to add a mark to packets; that mark survives routing across interfaces and can be matched by tc filters.

First, this makes packets marked with 6 be processed by the 1:30 class:

# tc filter add dev eth0 protocol ip parent 1: prio 1 handle 6 fw flowid 1:30

This sets that mark of 6, using iptables:

# iptables -A PREROUTING -t mangle -i eth0 -j MARK --set-mark 6

You can then use iptables normally to match packets and then mark them with fwmark.
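
For example, to put web traffic into the class above you could mark it by destination port instead of by incoming interface; the port is only an illustration, reusing mark 6 from the previous commands:

# iptables -t mangle -A PREROUTING -p tcp --dport 80 -j MARK --set-mark 6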

Example of ingress traffic shaping with SNAT

Qdiscs on ingress traffic provide only policing, with no shaping. In order to shape ingress traffic, the IFB (Intermediate Functional Block) device has to be used. However, another problem arises if SNAT or MASQUERADE is in use, as all incoming traffic has the same destination address: the qdisc intercepts the incoming traffic on the external interface before reverse NAT translation, so it can only see the router's IP as the destination of the packets.

The following solution is implemented on OpenWrt and can be applied to Arch Linux: first, the outgoing packets are marked with MARK and the corresponding connections (and related connections) with CONNMARK. On the incoming packets, an ingress u32 filter redirects the traffic to IFB (action mirred) and also retrieves the mark of the packet from conntrack (action connmark), thus providing information as to which IP behind the NAT initiated the traffic.

This functionality has been integrated in the kernel since linux 3.19 and in iproute2 since version 4.1.
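
If you are unsure whether your system meets these requirements, you can check the iproute2 version and look for the connmark action module (act_connmark is the module name used by mainline kernels; treat it as an assumption about your kernel configuration):

# tc -V
# modinfo act_connmark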

The following is a small script with only two HTB classes on ingress to demonstrate this. Traffic defaults to class 3:30. Outgoing traffic from 192.168.1.50 (behind the NAT) to the Internet is marked with "3", so incoming packets from the Internet going to 192.168.1.50 are also marked with "3" and are classified into 3:33.

#!/bin/sh -x

# Maximum allowed downlink. Set to 90% of the achievable downlink in kbits/s
DOWNLINK=1800

# Interface facing the Internet
EXTDEV=enp0s3

# Load IFB; all other modules are loaded automatically
modprobe ifb
ip link set dev ifb0 down

# Clear old queuing disciplines (qdisc) on the interfaces and the MANGLE table
tc qdisc del dev $EXTDEV root    2> /dev/null > /dev/null
tc qdisc del dev $EXTDEV ingress 2> /dev/null > /dev/null
tc qdisc del dev ifb0 root       2> /dev/null > /dev/null
tc qdisc del dev ifb0 ingress    2> /dev/null > /dev/null
iptables -t mangle -F
iptables -t mangle -X QOS

# Passing "stop" (without quotes) as the first argument removes the shaping and stops here.
if [ "$1" = "stop" ]
then
        echo "Shaping removed on $EXTDEV."
        exit
fi

ip link set dev ifb0 up

# HTB classes on IFB with rate limiting
tc qdisc add dev ifb0 root handle 3: htb default 30
tc class add dev ifb0 parent 3: classid 3:3 htb rate ${DOWNLINK}kbit
tc class add dev ifb0 parent 3:3 classid 3:30 htb rate 400kbit ceil ${DOWNLINK}kbit
tc class add dev ifb0 parent 3:3 classid 3:33 htb rate 1400kbit ceil ${DOWNLINK}kbit

# Packets marked with "3" on IFB flow through class 3:33
tc filter add dev ifb0 parent 3:0 protocol ip handle 3 fw flowid 3:33

# Outgoing traffic from 192.168.1.50 is marked with "3"
iptables -t mangle -N QOS
iptables -t mangle -A FORWARD -o $EXTDEV -j QOS
iptables -t mangle -A OUTPUT -o $EXTDEV -j QOS
iptables -t mangle -A QOS -j CONNMARK --restore-mark
iptables -t mangle -A QOS -s 192.168.1.50 -m mark --mark 0 -j MARK --set-mark 3
iptables -t mangle -A QOS -j CONNMARK --save-mark

# Forward all ingress traffic on internet interface to the IFB device
tc qdisc add dev $EXTDEV ingress handle ffff:
tc filter add dev $EXTDEV parent ffff: protocol ip \
        u32 match u32 0 0 \
        action connmark \
        action mirred egress redirect dev ifb0 \
        flowid ffff:1


exit 0
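
Once the script has run, a quick way to confirm that marking and classification work is to generate some download traffic from the host behind the NAT and then inspect the counters; the ifb0 device and the QOS chain names match the script above:

# tc -s class show dev ifb0
# iptables -t mangle -L QOS -n -v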

See also

* Linux Advanced Routing & Traffic Control (recommended follow-up reading): http://lartc.org
* Wikipedia page for the tc command: https://en.wikipedia.org/wiki/Tc_(Linux)