-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable disabling TX checksum offload for Antrea host gateway #6843
base: main
Are you sure you want to change the base?
Enable disabling TX checksum offload for Antrea host gateway #6843
Conversation
e92e5ab
to
1fab9f9
Compare
# If this option is later set to false, for the host gateway interface, Antrea will not restore its | ||
# original TX checksum state, as Antrea does not retain the original configuration. Users are responsible | ||
# for manually reconfiguring the setting if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it the same for container network interfaces? I feel it could say the option affects newly created interfaces only when it's set to true, and does nothing when it's set to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the same for container network interfaces. I'll optimize the sentences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing, it's a little different from container network interface. Currently, we will disable TX checksum on existing antrea-gw0 as well as newly created antrea-gw0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
# If this option is later set to false, Antrea does nothing to the affected container network interfaces
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces,
# it is recommended to delete them and recreate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, we should probably update docs/antrea-l7-network-policy.md
as well
Do you think there is any way to check for the checksum issue described in #6806 in our e2e L7NetworkPolicy tests?
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces, | ||
# it is recommended to delete them and recreate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.
For Pods, yes. For the gateway interface, I don't know if this is the best advice as it is a bit disruptive and unpractical for users. We could consider an antctl command (maybe in the future) to change the offload configuration on all Nodes easily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. I will add the sub command in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the mean time, I would recommend not including that sentence in the description (To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.
) as we do not provide useful guidance on how to do it practically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
3268120
to
09655e1
Compare
Updated
I got an idea like this: Assuming that there are two Nodes,
We could check the log to verify the issue described in #6806. If we disable antrea-gw0 TX checksumming, we should receive such log. One more thing, we cannot verify the issue with the result of command When I tried to verify the idea, I got some troubles in step 4:
The rejected packet logged above, which is encoded in base64, is part of the HTTP request. It seems that there is something wrong with Suricata TCP reassembly. I have inspected the related options, which should be qualified to handle the HTTP request with 2000 bytes data. These are the options in stream:
memcap: 64mb
#memcap-policy: ignore
checksum-validation: yes # reject incorrect csums
#midstream: false
#midstream-policy: ignore
inline: auto # auto will use inline mode in IPS mode, yes or no set it statically
reassembly:
memcap: 256mb
#memcap-policy: ignore
depth: 1mb # reassemble 1mb into a stream
toserver-chunk-size: 2560
toclient-chunk-size: 2560
randomize-chunk-size: yes
#randomize-chunk-range: 10
#raw: yes
#segment-prealloc: 2048
#check-overlap-different-data: true I'm trying to find some options in Suricata to resolve the issue above. I would greatly appreciate any ideas or inspiration you could offer regarding the issue. @antoninbas @tnqn |
Found the root cause of the above issue and added a PR #6857 to fix it. |
@hongliangl for the test, can we generate a query with a large reply (larger than MTU) but a small request payload (smaller than MTU), so that we don't hit the new issue (#6857)? Maybe the
Probably better if we don't assume a specific MTU value (1500 -> 2000) and we can determine the MTU dynamically. |
09655e1
to
11fab3f
Compare
11fab3f
to
90b5065
Compare
@antoninbas Sorry for not replying this late. The command you provided is a feasible way to mock the expected HTTP request! I have added an e2e new test using it. |
# and the host gateway interface. To restore the default TX checksum state of the affected interfaces, | ||
# it is recommended to delete them and recreate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the mean time, I would recommend not including that sentence in the description (To restore the default TX checksum state of the affected interfaces, it is recommended to delete them and recreate.
) as we do not provide useful guidance on how to do it practically.
test/e2e/l7networkpolicy_test.go
Outdated
// Run the command that makes the test server send an HTTP response with a body larger than the MTU on the | ||
// test client Pod. | ||
testBodySize := mtu * 2 | ||
cmd = []string{"sh", "-c", fmt.Sprintf(`curl --data-urlencode "cmd=head -c %d </dev/zero | tr '\0' 'A'" http://%s/shell?cmd`, testBodySize, baseURL)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to wrap this command using a shell invocation. Does the following work?
cmd := []string{"curl", "--data-urlencode", fmt.Sprintf("cmd=head -c %d </dev/zero | tr '\0' 'A'", testBodySize), "http://%s/shell?cmd"}
Note that while the cmd
parameter is a shell command, it will be executed on the server, not on the client Pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have verified this cmd manually and it can work.
90b5065
to
1b85094
Compare
358ffaa
to
36084f7
Compare
36084f7
to
3214e37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more nit, otherwise LGTM
test/e2e/l7networkpolicy_test.go
Outdated
} | ||
// Create a test egress L7NetworkPolicy allowing HTTP path "shell*". | ||
policyAllowPathShellName := "test-l7-http-allow-path-shell" | ||
createL7NetworkPolicy(t, data, false, policyAllowPathShellName, 1, nil, clientPodLabels, ProtocolTCP, p8081, l7ProtocolAllowsPathShell) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use "8081"
instead of p8081
I think using a constant provides no value here, especially when you consider the fact that you "hardcoded" 8081
at line 309
BTW, is there any reason to use port 8081 here while testL7NetworkPolicyHTTP
uses port 8080?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No special reason. Just because that the Pod is an hostNetwork one, and the port 8080 is potentially used by other services in the Node where it resides.
test/e2e/l7networkpolicy_test.go
Outdated
require.NoError(t, err) | ||
|
||
for _, ip := range serverIPs { | ||
baseURL := net.JoinHostPort(ip.String(), strconv.Itoa(int(p8081))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, you can use "8081"
instead of strconv.Itoa(int(p8081))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
3214e37
to
f037c48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
@hongliangl I merged a bunch of PRs for 2.3 and looks like there is a conflict now, could you resolve it? |
This commit introduces the ability to disable TX checksum offload for the host gateway interface (default: `antrea-gw0`) by setting the `disableTXChecksumOffload` option to `true`. If this option is later set to false, Antrea does nothing to the affected container network interfaces and the host gateway interface. Signed-off-by: Hongliang Liu <[email protected]>
f037c48
to
f9bf7f5
Compare
/test-all |
Fixes: #6806
This commit introduces the ability to disable TX checksum offload for the host gateway interface (default:
antrea-gw0
) by setting thedisableTXChecksumOffload
option totrue
.Note: If this option is later set to false, Antrea will not restore the original TX checksum state, as it does not retain the original configuration. Users are responsible for manually reconfiguring the setting if needed.