E2E: Redundancy tests failing on OCP 4.17+ #3310

vthapar · 2025-02-19T05:05:58Z

What happened:
Redundancy tests that restart a node are failing with a timeout error on OCP 4.17+.

What you expected to happen:
Redundancy tests should pass

How to reproduce it (as minimally and precisely as possible):
Install Submariner on OCP 4.17+ clusters and run subctl verify with the gateway-failover tests enabled.

Anything else we need to know?:
Submariner 0.19 + OCP 4.17+

The issue is in the test code that restarts the node. On OCP 4.17+ it keeps returning a Timeout error even when the node has restarted. The test then fails with a timeout error after exhausting all retries, even though the node restarted and failover occurred correctly.

vthapar · 2025-02-19T14:44:18Z

Surprisingly, I am not seeing the issue with 0.20:

Feb 19 19:17:46.889: Feb 19 19:17:46.889: INFO: ExecWithOptions &{Command:[sh -c echo 1 > /proc/sys/kernel/sysrq && echo b > /proc/sysrq-trigger] Namespace:submariner-operator PodName:submariner-gateway-m7lwk ContainerName:submariner-gateway Stdin:<nil> CaptureStdout:false CaptureStderr:true PreserveWhitespace:false}

Feb 19 19:18:23.704: Feb 19 19:18:23.704: INFO: Retrying due to error  Timeout occurred

Feb 19 19:18:29.676: Feb 19 19:18:29.676: INFO: Retrying due to error  unable to upgrade connection: container not found ("submariner-gateway")

Feb 19 19:19:07.486: Feb 19 19:19:07.486: INFO: Retrying due to error  Timeout occurred

Feb 19 19:19:13.507: Feb 19 19:19:13.507: INFO: Retrying due to error  error dialing backend: dial tcp 10.0.96.138:10250: connect: connection refused

Feb 19 19:19:20.654: Feb 19 19:19:20.654: INFO: Retrying due to error  unable to upgrade connection: container not found ("submariner-gateway")

Feb 19 19:19:20.654: Successfully crashed gateway node "ip-10-0-96-138.us-east-2.compute.internal"

I will try to reproduce it on 0.19 and see if it is still an issue. It could be that some of the dependabot updates fixed this.

vthapar added the bug label Feb 19, 2025

vthapar self-assigned this Feb 19, 2025

vthapar added this to Submariner 0.20 Feb 19, 2025

maayanf24 moved this to Todo in Submariner 0.20 Feb 19, 2025

vthapar changed the title ~~E2E: Redundancy tests failing on OCP 4.16+~~ E2E: Redundancy tests failing on OCP 4.17+ Feb 19, 2025

dfarrell07 added the priority:medium label Feb 25, 2025
