ACLs
It is possible to offload TC flower rules with a limited set of keys and actions to netdevs which represent mlxsw ports.
Before configuring match rules on enp3s0np1, one must first create the queueing discipline (qdisc) to which the flower classifier is attached.
Note: Offloading is not yet supported for soft-netdevs (e.g. bridge, bond, VLAN) or the management port.
Note: For now, offloading is only supported for netdevs which are bridged or have an IPv4 address assigned.
To prepare for the addition of flower rules, add either the ingress qdisc or the clsact qdisc to enp3s0np1:
$ tc qdisc add dev enp3s0np1 ingress
Or:
$ tc qdisc add dev enp3s0np1 clsact
The benefit of clsact qdisc is that it can be used for insertion of not only ingress rules, but also egress rules.
The rest of the examples here use the ingress qdisc. To see more examples using clsact qdisc, please see the More Examples section.
- protocol (ethertype) [4.11]
- src_mac [4.11]
- dst_mac [4.11]
- src_ip (both IPv4 and IPv6) [4.11]
- dst_ip (both IPv4 and IPv6) [4.11]
- ip_proto ("tcp" and "udp") [4.11]
- src_port [4.11]
- dst_port [4.11]
- vlan_prio [4.12]
- vlan_id (ingress direction) [4.12]
- tcp_flags [4.13]
- ip_ttl [4.14]
- ip_tos [4.14]
Note: Packets arriving without 802.1q TCI, or ones which are only priority-tagged, are assigned a bridge PVID by the hardware. Thus, a flower match on a vlan_id of PVID will match untagged packets as well.
- drop [4.11]
- mirred egress redirect (forward) [4.11]
- mirred egress mirror [4.16]
- vlan modify [4.12]
- trap [4.13]
- goto chain [4.14]
- pass [4.15]
Note: Packets which arrive without 802.1q TCI, or which are only priority-tagged, are assigned a bridge PVID by the hardware. Thus, a "vlan modify" to a non-PVID tag appears to push a VLAN tag onto such a packet, and likewise a "vlan modify" to the PVID tag appears to pop it. That is unlike the software pipeline, where "vlan modify" is only meaningful on packets which are already 802.1q-tagged.
$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 2 flower skip_sw src_ip fe01::1 action drop
This adds a rule with priority 2 matching every IPv6 packet with the source address fe01::1. The selected action is drop. Note the parameter skip_sw, which instructs TC to skip the insertion of the rule into the kernel's datapath.
If this keyword is omitted, the rule is inserted in both the kernel and HW.
To see a list of inserted rules, run:
$ tc filter show dev enp3s0np1 root
To observe the statistics that are maintained on a per-rule basis (packets, bytes transmitted, and last time used), add the -s flag:
$ tc -s filter show dev enp3s0np1 root
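When many rules are installed, it can be handy to check how many of them the kernel reports as offloaded. The following is a minimal sketch, assuming that tc prints an in_hw or not_in_hw flag on its own line for each flower filter, as recent iproute2 versions do. The function name count_in_hw is hypothetical and not part of tc.

```shell
# Count filters that the kernel reports as present in hardware.
# Reads `tc filter show` output on stdin.
# Assumption: each flower filter is listed with either an "in_hw" or a
# "not_in_hw" flag on one line of its own.
count_in_hw() {
    # -w matches in_hw only as a whole word, so "not_in_hw" and
    # "in_hw_count" do not count as matches by themselves.
    grep -cw 'in_hw'
}
```

A possible invocation is `tc filter show dev enp3s0np1 root | count_in_hw`. With skip_sw, a rule that cannot be offloaded is rejected at insertion time, so this check is mostly useful for rules added without that keyword.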
$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 2 flower skip_sw src_ip fe01::1 action pass
This adds a rule with priority 2 matching every IPv6 packet with the source address fe01::1. The selected action is pass. The result is that matching packets are accepted and processing of further filters is avoided.
$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 2 flower skip_sw src_ip fe01::1 action trap
This adds a rule with priority 2 matching every IPv6 packet with the source address fe01::1. The selected action is trap.
This rule insertion instructs the hardware to send matched packets to the kernel, which may then perform further analysis on them. They appear as if they come from device enp3s0np1.
TC rules (filters) are put together into chains by order of priority (pref). Each chain can be looked at as a table of rules.
To insert a rule into a specific chain, one has to use the chain parameter:
$ tc filter add dev enp3s0np1 parent ffff: protocol ip chain 100 pref 10 flower skip_sw dst_ip 192.168.101.1 action drop
In this example, we added the rule into chain 100. If the chain parameter is omitted, the default chain 0 is assumed. Chain 0 is also the chain which is always processed first. If we want other chains to be processed, we have to use the action goto chain:
$ tc filter add dev enp3s0np1 parent ffff: protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 action goto chain 100
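The two commands above work as a pair: a rule in chain 0 jumps to chain 100, and a rule in chain 100 decides what happens to the packet. As a rough sketch of how such pairs might be generated for several destination addresses, the helper below only prints the tc commands and does not run them. The name emit_chain_rules and the choice to step pref by 10 are assumptions made for illustration.

```shell
# Print (do not run) the tc commands that steer each destination IP from
# chain 0 into a target chain and drop it there.
# Usage: emit_chain_rules DEV CHAIN DST_IP...
emit_chain_rules() {
    dev=$1
    chain=$2
    shift 2
    pref=10
    for dst in "$@"; do
        # Rule in the target chain first, so the jump has somewhere to land.
        printf 'tc filter add dev %s parent ffff: protocol ip chain %s pref %d flower skip_sw dst_ip %s action drop\n' \
            "$dev" "$chain" "$pref" "$dst"
        # Rule in the default chain 0 that jumps to the target chain.
        printf 'tc filter add dev %s parent ffff: protocol ip pref %d flower skip_sw dst_ip %s action goto chain %s\n' \
            "$dev" "$pref" "$dst" "$chain"
        pref=$((pref + 10))
    done
}
```

For example, `emit_chain_rules enp3s0np1 100 192.168.101.1 192.168.101.2` prints four commands. Once reviewed, the output can be piped to sh by a privileged user.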
If a chain does not exist before a filter is added, it is implicitly created. Similarly, after the last filter is removed, implicitly created chains are destroyed. It is possible to explicitly create and destroy chains.
To create chain 11, run the following command:
$ tc chain add dev enp3s0np1 ingress chain 11
To list existing chains, run:
$ tc chain show dev enp3s0np1 ingress
chain parent ffff: chain 11
To destroy chain 11, run:
$ tc chain del dev enp3s0np1 ingress chain 11
Note: The above command deletes both implicitly and explicitly created chains, along with any filters they contain.
For filter insertions into chains, the mlxsw driver needs a crystal ball. When the first rule is inserted into hardware, the driver has to guess all the fields that are going to be used for matching in the chain. If this guess later proves to be wrong and the user adds a filter that matches on different fields, there is a problem. mlxsw currently resolves it with a couple of predefined patterns, which try to cover as many match fields as possible. This approach is far from optimal, both performance-wise and scale-wise. Also, the insertion of certain filters might fail, depending on the insertion order.
Most of the time, a user who inserts filters into a chain knows in advance what the filters are going to look like: what type they are and which options they use. For example, the user may know that only filters of type flower matching on destination IP are required. In that case, the user can specify a template that covers all the filters which are going to be inserted into the chain.
The template is passed along during the chain creation like this:
$ tc chain add dev enp3s0np1 ingress proto ip chain 11 flower dst_ip 0.0.0.0/16
The template is then shown when listing chains:
$ tc chain show dev enp3s0np1 ingress
chain parent ffff: flower chain 11
eth_type ipv4
dst_ip 0.0.0.0/16
Addition of filters that fit the template will be successful:
$ tc filter add dev enp3s0np1 ingress proto ip chain 11 flower dst_ip 10.0.0.1/8 action drop
Addition of filters that do not fit the template will fail:
$ tc filter add dev enp3s0np1 ingress proto ip chain 11 flower dst_ip 10.0.0.1/24 action drop
Error: cls_flower: Mask does not fit the template.
We have an error talking to the kernel, -1
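In the two examples above, the /8 mask is accepted and the /24 mask is rejected against a /16 template. That is consistent with a rule in which a filter fits when its mask does not set any bit outside the template's mask, which for contiguous prefixes means a prefix length no longer than the template's. The sketch below encodes that reading as a pre-check. The function name mask_fits_template is hypothetical, and the rule itself is an assumption inferred from the examples, not a statement of the kernel's exact logic.

```shell
# Return success when a filter prefix length fits a template prefix length.
# Assumption: a mask fits when it sets no bits outside the template mask,
# i.e. for contiguous prefixes, filter_len <= tmplt_len.
# Usage: mask_fits_template FILTER_LEN TEMPLATE_LEN
mask_fits_template() {
    filter_len=$1
    tmplt_len=$2
    [ "$filter_len" -le "$tmplt_len" ]
}
```

With the template from the text, `mask_fits_template 8 16` succeeds and `mask_fits_template 24 16` fails, matching the two tc results shown above.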
$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 2 flower skip_sw src_ip fe01::1 action mirred egress (mirror|redirect) dev enp3s0np2
This adds a rule with priority 2 matching every IPv6 packet with the source address fe01::1. The selected action is mirred. The notation (mirror|redirect) stands for a choice: specify either mirror or redirect.
This rule insertion instructs the hardware to mirror or redirect matched packets to the specified interface, enp3s0np2 in the example.
By default, each qdisc has its own group of chains, each of which contains filters. This group of chains is called a block. For example, for the ingress qdisc the netdev:qdisc:block mapping is 1:1:1.
But consider a case where you have two netdevices and create an ingress qdisc on both. If you want to add an identical set of filter rules to both, you need to add the rules twice, once for each netdev:qdisc:block. That is of course doable, but when the filters are offloaded to a TCAM with a limited number of entries, the duplication may become a scale issue. Block sharing aims to resolve that.
To ask the kernel to share blocks, one has to indicate so during qdisc creation:
$ tc qdisc add dev enp3s0np1 ingress_block 22 ingress
$ tc qdisc add dev enp3s0np2 ingress_block 22 ingress
These two commands added ingress qdiscs to both netdevices. Note the ingress_block option, which indicates that both qdiscs should share the same block, identified by index 22. It is up to the user to choose the block index.
If you list the existing qdiscs, you see the block sharing info in the output:
$ tc qdisc
qdisc ingress ffff: dev enp3s0np1 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev enp3s0np2 parent ffff:fff1 ingress_block 22
To make it more visual, the situation looks like this:
enp3s0np1 ingress qdisc enp3s0np2 ingress qdisc
| |
| |
+----------> block 22 <----------+
There is no limit on the number of qdiscs that can share the same block.
Once the qdisc block is shared, it is no longer possible to manipulate the filters using the qdisc handle. Instead, one has to use the block index as a handle:
$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
Aside from the ingress qdisc, block sharing is also supported for the clsact qdisc. In that case, the user can share both the ingress block and the egress block:
$ tc qdisc add dev enp3s0np3 ingress_block 23 egress_block 24 clsact
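As a rough sketch of how the ingress block sharing described in this section might be scripted for a larger set of ports, the helper below prints one qdisc command per port, followed by a single filter command for the shared block. It prints the commands without running them. The name emit_shared_block is hypothetical, and the filter is simply the one used earlier in this section.

```shell
# Print (do not run) commands that bind several ports to one shared ingress
# block and add a single filter to that block.
# Usage: emit_shared_block BLOCK DEV...
emit_shared_block() {
    block=$1
    shift
    for dev in "$@"; do
        # One ingress qdisc per port, all pointing at the same block index.
        printf 'tc qdisc add dev %s ingress_block %s ingress\n' "$dev" "$block"
    done
    # The filter is added once, using the block index as the handle.
    printf 'tc filter add block %s protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop\n' "$block"
}
```

For example, `emit_shared_block 22 enp3s0np1 enp3s0np2` prints the two qdisc commands and the one filter command shown in the text. The filter occupies one set of hardware entries no matter how many ports are listed.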
$ tc filter add dev enp3s0np1 parent ffff: protocol ip pref 20 flower skip_sw dst_mac f4:52:14:10:df:92 action mirred egress redirect dev enp3s0np19
$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 10 flower skip_sw dst_ip fe01::3 ip_proto tcp dst_port 3333 action drop
$ tc filter add dev enp3s0np1 parent ffff: protocol 802.1q flower vlan_id 95 skip_sw action drop
$ tc filter add dev enp3s0np1 parent ffff: protocol all flower action vlan modify id 85
Using clsact qdisc:
$ tc filter add dev enp3s0np1 ingress protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 action trap
$ tc filter add dev enp3s0np1 egress protocol ip pref 10 flower skip_sw dst_ip 192.168.101.3 action drop
- man tc
- man tc-flower
- QoS in Linux with TC and Filters by Phil Sutter (part of iproute documentation)