Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable disabling TX checksum offload for Antrea host gateway #6843

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Dec 4, 2024

Fixes: #6806

This commit introduces the ability to disable TX checksum offload for the host gateway interface (default: antrea-gw0) by setting the disableTXChecksumOffload option to true.

Note: If this option is later set to false, Antrea will not restore the original TX checksum state, as it does not retain the original configuration. Users are responsible for manually reconfiguring the setting if needed.

@hongliangl hongliangl requested review from antoninbas and tnqn December 4, 2024 10:13
@hongliangl hongliangl added this to the Antrea v2.3 release milestone Dec 4, 2024
@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from e92e5ab to 1fab9f9 Compare December 5, 2024 10:46
Comment on lines 185 to 187
# If this option is later set to false, for the host gateway interface, Antrea will not restore its
# original TX checksum state, as Antrea does not retain the original configuration. Users are responsible
# for manually reconfiguring the setting if needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the same for container network interfaces? I feel it could say the option affects newly created interfaces only when it's set to true, and does nothing when it's set to false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the same for container network interfaces. I'll optimize the sentences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, it's a little different from container network interface. Currently, we will disable TX checksum on existing antrea-gw0 as well as newly created antrea-gw0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.

pkg/agent/agent_linux.go Outdated Show resolved Hide resolved
@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Dec 9, 2024
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we should probably update docs/antrea-l7-network-policy.md as well

Do you think there is any way to check for the checksum issue described in #6806 in our e2e L7NetworkPolicy tests?

Comment on lines 186 to 187
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.

For Pods, yes. For the gateway interface, I don't know if this is the best advice as it is a bit disruptive and unpractical for users. We could consider an antctl command (maybe in the future) to change the offload configuration on all Nodes easily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. I will add the sub command in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the mean time, I would recommend not including that sentence in the description (To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.) as we do not provide useful guidance on how to do it practically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch 2 times, most recently from 3268120 to 09655e1 Compare December 12, 2024 08:27
@hongliangl
Copy link
Contributor Author

hongliangl commented Dec 12, 2024

LGTM, we should probably update docs/antrea-l7-network-policy.md as well

Updated

Do you think there is any way to check for the checksum issue described in #6806 in our e2e L7NetworkPolicy tests?

I got an idea like this:

Assuming that there are two Nodes, controlplane and worker.

  1. Create a hostNetwork agnhost Pod on Node worker.
  2. Create a non-hostNetwork busybox Pod installed curl on Node controlplane.
  3. Create a L7 NetworkPolicy applied to the above busybox Pod. This policy is used to enforce the connections between the busybox Pod and the agnhost Pod to be processed by Suricata on Node controlplane.
  4. Run command curl http://192.168.77.101/echo?msg=$(head -c 2000 </dev/zero | tr '\0' 'A') on busybox Pod. This command is used to get a reply oversized packets larger than MTU.
  5. If antrea-gw0 TX checksumming is on:
    • The packet larger than MTU will be sent to Suricata via antrea-l7-tap0.
    • The packet larger than MTU will be sent back to OVS via antrea-l7-tap1. Since the size is larger than MTU and TX offloading of antrea-l7-tap1 is off, log like 2024-12-12 08:13:59 Warning: af-packet: antrea-l7-tap1: sending packet failed on socket 22: Message too long will be generated in /var/log/antrea/networkpolicy/l7engine/suricata.log in Node controlplane.

We could check the log to verify the issue described in #6806. If we disable antrea-gw0 TX checksumming, we should receive such log.

One more thing, we cannot verify the issue with the result of command curl http://192.168.77.101/echo?msg=$(head -c 2000 </dev/zero | tr '\0' 'A') because it will not fail due to the retransmission of TCP (the agnhost server will send small packets when retransmitting).

When I tried to verify the idea, I got some troubles in step 4:

  • The data of HTTP GET is larger (2000 bytes) than MTU, and the data will be encapsulated into two packets.
  • The packets can be received by Suricata successfully via antrea-l7-tap`.
  • The connection will get a refuse, and we will get the log like the following in Suricata log eve-2024-12-12.json:
    {"timestamp":"2024-12-12T08:59:10.650212+0000","flow_id":1943321052111352,"in_iface":"antrea-l7-tap0","event_type":"alert","vlan":[2],"src_ip":"10.10.0.8","src_port":47414,"dest_ip":"192.168.77.101","dest_port":80,"proto":"TCP","pkt_src":"wire/pcap","tenant_id":2,"alert":{"action":"blocked","gid":1,"signature_id":1,"rev":0,"signature":"Reject by AntreaNetworkPolicy:default/egress-allow-http","category":"","severity":3,"tenant_id":2},"app_proto":"http","direction":"to_server","flow":{"pkts_toserver":3,"pkts_toclient":1,"bytes_toserver":1616,"bytes_toclient":78,"start":"2024-12-12T08:59:10.649072+0000","src_ip":"10.10.0.8","dest_ip":"192.168.77.101","src_port":47414,"dest_port":80},"packet":"ugwcu18C8njWJ8a6gQAAAggARQAFqq/HQABABm1nCgoACMCoTWW5NgBQewpZO6SagdOAEAH7wSoAAAEBCArkxft3d1ezqkdFVCAvZWNobz9tc2c9QUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQQ==","packet_info":{"linktype":1}}
    

The rejected packet logged above, which is encoded in base64, is part of the HTTP request.

It seems that there is something wrong with Suricata TCP reassembly. I have inspected the related options, which should be qualified to handle the HTTP request with 2000 bytes data. These are the options in /etc/suricata/suricata.yaml:

stream:
  memcap: 64mb
  #memcap-policy: ignore
  checksum-validation: yes      # reject incorrect csums
  #midstream: false
  #midstream-policy: ignore
  inline: auto                  # auto will use inline mode in IPS mode, yes or no set it statically
  reassembly:
    memcap: 256mb
    #memcap-policy: ignore
    depth: 1mb                  # reassemble 1mb into a stream
    toserver-chunk-size: 2560
    toclient-chunk-size: 2560
    randomize-chunk-size: yes
    #randomize-chunk-range: 10
    #raw: yes
    #segment-prealloc: 2048
    #check-overlap-different-data: true

I'm trying to find some options in Suricata to resolve the issue above. I would greatly appreciate any ideas or inspiration you could offer regarding the issue. @antoninbas @tnqn

@hongliangl
Copy link
Contributor Author

Found the root cause of the above issue and added a PR #6857 to fix it.

@antoninbas
Copy link
Contributor

@hongliangl for the test, can we generate a query with a large reply (larger than MTU) but a small request payload (smaller than MTU), so that we don't hit the new issue (#6857)? Maybe the /shell agnhost endpoint will work instead of /echo?
It's also cleaner since you are using a shell command to generate the message anyway:

curl --data-urlencode "cmd=head -c 2000 </dev/zero | tr '\0' 'A'" http://192.168.77.101//shell?cmd

Probably better if we don't assume a specific MTU value (1500 -> 2000) and we can determine the MTU dynamically.

@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from 09655e1 to 11fab3f Compare January 27, 2025 08:27
@hongliangl hongliangl requested a review from antoninbas January 27, 2025 08:29
@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from 11fab3f to 90b5065 Compare January 27, 2025 08:49
@hongliangl
Copy link
Contributor Author

@hongliangl for the test, can we generate a query with a large reply (larger than MTU) but a small request payload (smaller than MTU), so that we don't hit the new issue (#6857)? Maybe the /shell agnhost endpoint will work instead of /echo? It's also cleaner since you are using a shell command to generate the message anyway:

curl --data-urlencode "cmd=head -c 2000 </dev/zero | tr '\0' 'A'" http://192.168.77.101//shell?cmd

Probably better if we don't assume a specific MTU value (1500 -> 2000) and we can determine the MTU dynamically.

@antoninbas Sorry for not replying this late. The command you provided is a feasible way to mock the expected HTTP request! I have added an e2e new test using it.

test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
Comment on lines 186 to 187
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the mean time, I would recommend not including that sentence in the description (To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.) as we do not provide useful guidance on how to do it practically.

pkg/agent/agent_linux.go Outdated Show resolved Hide resolved
// Run the command that makes the test server send an HTTP response with a body larger than the MTU on the
// test client Pod.
testBodySize := mtu * 2
cmd = []string{"sh", "-c", fmt.Sprintf(`curl --data-urlencode "cmd=head -c %d </dev/zero | tr '\0' 'A'" http://%s/shell?cmd`, testBodySize, baseURL)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to wrap this command using a shell invocation. Does the following work?

cmd := []string{"curl", "--data-urlencode", fmt.Sprintf("cmd=head -c %d </dev/zero | tr '\0' 'A'", testBodySize), "http://%s/shell?cmd"}

Note that while the cmd parameter is a shell command, it will be executed on the server, not on the client Pod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have verified this cmd manually and it can work.

@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from 90b5065 to 1b85094 Compare February 4, 2025 08:19
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/l7networkpolicy_test.go Show resolved Hide resolved
@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch 3 times, most recently from 358ffaa to 36084f7 Compare February 6, 2025 05:11
@hongliangl hongliangl requested a review from antoninbas February 7, 2025 02:22
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from 36084f7 to 3214e37 Compare February 8, 2025 06:31
@hongliangl hongliangl requested a review from antoninbas February 8, 2025 06:32
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more nit, otherwise LGTM

}
// Create a test egress L7NetworkPolicy allowing HTTP path "shell*".
policyAllowPathShellName := "test-l7-http-allow-path-shell"
createL7NetworkPolicy(t, data, false, policyAllowPathShellName, 1, nil, clientPodLabels, ProtocolTCP, p8081, l7ProtocolAllowsPathShell)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use "8081" instead of p8081
I think using a constant provides no value here, especially when you consider the fact that you "hardcoded" 8081 at line 309

BTW, is there any reason to use port 8081 here while testL7NetworkPolicyHTTP uses port 8080?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No special reason. Just because that the Pod is an hostNetwork one, and the port 8080 is potentially used by other services in the Node where it resides.

require.NoError(t, err)

for _, ip := range serverIPs {
baseURL := net.JoinHostPort(ip.String(), strconv.Itoa(int(p8081)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, you can use "8081" instead of strconv.Itoa(int(p8081))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@hongliangl hongliangl force-pushed the 20241202-disable-tx-antrea-gw0 branch from 3214e37 to f037c48 Compare February 11, 2025 02:57
antoninbas
antoninbas previously approved these changes Feb 11, 2025
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas
Copy link
Contributor

/test-all

@antoninbas
Copy link
Contributor

@hongliangl I merged a bunch of PRs for 2.3 and looks like there is a conflict now, could you resolve it?

This commit introduces the ability to disable TX checksum offload
for the host gateway interface (default: `antrea-gw0`) by setting the
`disableTXChecksumOffload` option to `true`.

If this option is later set to false, Antrea does nothing to the affected
container network interfaces and the host gateway interface.

Signed-off-by: Hongliang Liu <[email protected]>
@antoninbas
Copy link
Contributor

/test-all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Very low throughput when using L7NetworkPolicies with an external host
3 participants