VLAB random setup-vpcs errors when configuring server VMs #292
Comments
Another hit:
So the error happens while this line is executing:
It would be helpful to capture the state (logs, outputs) of the VLAB just after the failure so that the root cause can be analyzed. @Frostman, to progress on this one, what's your preferred approach? Extend the hhfab helpers, or allow hhfab to (also) run "detached" (e.g. behind a Service interface) so that artifacts can be gathered externally?
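To make the "capture state just after the failure" idea concrete, here is a minimal shell sketch. It is not how hhfab works: `run_with_capture` and the output file names are hypothetical, and the wrapped command is whatever step is failing in the VLAB.

```shell
# Sketch only: run a command and keep its output plus exit status in a
# directory, so the state right after a failure can be inspected later.
# run_with_capture and the file names are hypothetical, not part of hhfab.
run_with_capture() {
  local outdir="$1"
  shift
  mkdir -p "$outdir"

  local rc=0
  # Keep stdout and stderr separately; remember the exit status on failure.
  "$@" >"$outdir/stdout.log" 2>"$outdir/stderr.log" || rc=$?

  printf '%s\n' "$rc" >"$outdir/exit_code"
  if [ "$rc" -ne 0 ]; then
    # Record when the failure was observed (UTC).
    date -u +%Y-%m-%dT%H:%M:%SZ >"$outdir/failed_at"
  fi
  return "$rc"
}
```

Calling `run_with_capture ./artifacts some-command arg1` preserves the command's exit status, so a CI job could still fail on it while uploading the directory as an artifact.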
I could reproduce this in env-3
I could see this in the server logs:
Around that time I see an error in the Agent on s5248-05, and it looks like it restarts:
Does this look like a known issue, @Frostman?
I just saw this exact same issue in env-1 while testing the VRF scaling, for what it's worth:
First hit with show-tech captured:
I hit this on env-3:
I notice the link going down and no IP after recovering:
Investigating the upstream switch:
I see a series of logs when
And up:
I see other ports going down in the switch log:
@Frostman, are you aware of any SONiC issue like this, or should we inspect the lab cabling/NICs? A local fault usually indicates loss of signal detected on the receive data path of a local port:
Thanks, @edipascale. I'm getting this consistently in env-3, so I'm trying to improve the hhnet script while working on another PR.
OK. I took a look at the hhnet script and it does a
I refactored hhnet to use networkctl, and my first impression is that it's a lot more stable. I'm currently facing some other issues in env-3 while trying to test this.
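The refactored script is not shown in this thread. As a rough illustration of the kind of hardening that can help when a link flaps and the address is missing afterwards, here is a generic retry helper. Both `retry` and `iface_has_ipv4` are hypothetical names and are not taken from hhnet.sh.

```shell
# Sketch only: retry a command a fixed number of times with a delay.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical check: succeeds once the interface has an IPv4 address.
iface_has_ipv4() {
  ip -4 -o addr show dev "$1" 2>/dev/null | grep -q 'inet '
}
```

For example, `retry 10 2 iface_has_ipv4 enp2s1` (interface name assumed) polls for up to roughly 20 seconds before giving up, instead of failing on the first check.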
There are known issues with the hhnet script: https://github.com/githedgehog/fabricator/blob/master/pkg/hhfab/hhnet.sh
But as it's hitting the CI from time to time, it needs to be addressed to make the CI more reliable and avoid having to retry the job:
https://github.com/githedgehog/fabricator/actions/runs/12530586269/job/34947281228