Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SmartSwitch] Enhance PCIe device check to skip the warning log, if device is in detaching mode #546

Merged
merged 23 commits into from
Feb 6, 2025

Conversation

vvolam
Copy link
Contributor

@vvolam vvolam commented Oct 5, 2024

Description

Add a new function "is_device_in_detaching_mode" to query state_db for devices in detaching mode. Enhance "check_pcie_devices" to invoke this function to skip the warning log, if device is in detaching mode.

Motivation and Context

Changes are being done based on the reboot HLD

How Has This Been Tested?

Confirmed that no warning logs appear for the removed PCI interface after loading the PCIe daemon on the Smart Switch and intentionally removing the DPU PCI interface during a reboot.

Additional Information (Optional)

@vvolam vvolam changed the title Enhance PCIe device check to skip the warning log, if device is in detaching mode [SmartSwitch] Enhance PCIe device check to skip the warning log, if device is in detaching mode Nov 11, 2024
@vvolam vvolam marked this pull request as ready for review November 13, 2024 03:24
sonic-pcied/scripts/pcied Outdated Show resolved Hide resolved
@vvolam
Copy link
Contributor Author

vvolam commented Nov 18, 2024

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@vvolam vvolam requested a review from prgeor November 19, 2024 00:12
sonic-pcied/scripts/pcied Outdated Show resolved Hide resolved
@vvolam
Copy link
Contributor Author

vvolam commented Dec 12, 2024

@prgeor please help review and merge, if no other changes needed.

oleksandrivantsiv and others added 6 commits January 3, 2025 16:43
…c-net#535)

* Added new dynamic field 'last_sync_time' that shows when STORAGE_INFO for disk was last synced to STATE_DB

* Moved 'start' message to actual starting point of the daemon

* Added functions for formatted and epoch time for user friendly time display

* Made changes per prgeor review comments

* Pivot to SysLogger for all logging

* Increased log level so that they are seen in syslogs

* Code coverage improvement
…net#542)

When LC is absent for 30 minutes, the database cleanup kicks in. When LagId is released, it needs to be appended to the SYSTEM_LAG_IDS_FREE_LIST

This PR works with the following 2 PRs:
sonic-net/sonic-swss#3303
sonic-net/sonic-buildimage#20369

Signed-off-by: mlok <[email protected]>
…ATE_DB (sonic-net#560)

Fixed the bug in chassisd due to which incorrect number of ASICs were being pushed to CHASSIS_STATE_DB.
* thermalctld: Add support for fans on non-CPU modules

* Add module fan to unit tests
sonic-chassisd/scripts/chassisd Outdated Show resolved Hide resolved
sonic-xcvrd/xcvrd/sff_mgr.py Outdated Show resolved Hide resolved
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

prgeor
prgeor previously approved these changes Jan 17, 2025
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@prgeor
Copy link
Collaborator

prgeor commented Jan 22, 2025

@vvolam

How Has This Been Tested?
Pending

@vvolam
Copy link
Contributor Author

vvolam commented Jan 22, 2025

@vvolam

How Has This Been Tested? Pending

@prgeor Updated the description. Thank you

@prgeor
Copy link
Collaborator

prgeor commented Jan 27, 2025

@vvolam buid failure

@vvolam
Copy link
Contributor Author

vvolam commented Jan 28, 2025

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@vvolam
Copy link
Contributor Author

vvolam commented Jan 28, 2025

@vvolam buid failure

@prgeor It was due to agents unavailability. requesting build again.

@gpunathilell
Copy link
Contributor

@vvolam Not sure if it is supposed to be handled by the reboot feature implementation, but don't we need the same warning suppression if DPU is powered off as well? As the PCI id is removed even for powered off DPUs

@vvolam
Copy link
Contributor Author

vvolam commented Jan 29, 2025

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@vvolam
Copy link
Contributor Author

vvolam commented Jan 29, 2025

@vvolam Not sure if it is supposed to be handled by the reboot feature implementation, but don't we need the same warning suppression if DPU is powered off as well? As the PCI id is removed even for powered off DPUs

@gpunathilell Noted. As this PR is avoiding the warnings logs based on the PCI entry in STATE_DB. We need to add this STATE_DB entry for the DPUs which are in power-off state. I will raise a separate PR for addressing this. Thank you!

Copy link
Collaborator

@prgeor prgeor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvolam check for admin status in config DB and if it is not present or admin is down, then do not throw the warning error message for missing DPU

@vvolam
Copy link
Contributor Author

vvolam commented Feb 4, 2025

@vvolam check for admin status in config DB and if it is not present or admin is down, then do not throw the warning error message for missing DPU

@prgeor I would like to do as a separate patch to avoid logs in case of DPU offline or admin down case as it requires a bit more careful patch to not overload PCIe daemon. Had a discussion with @gpunathilell yesterday and he too agreed on this.

@gpunathilell Keeping the new case aside, could you review the existing changes for DPU detaching state?

@prgeor prgeor merged commit 7a0813a into sonic-net:master Feb 6, 2025
5 checks passed
@vvolam vvolam deleted the ss-reboot branch February 6, 2025 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.