[SmartSwitch] Enhance PCIe device check to skip the warning log, if device is in detaching mode #546
Conversation
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
@prgeor please help review and merge if no other changes are needed.
…c-net#535)
* Added new dynamic field 'last_sync_time' that shows when STORAGE_INFO for disk was last synced to STATE_DB
* Moved 'start' message to actual starting point of the daemon
* Added functions for formatted and epoch time for user friendly time display
* Made changes per prgeor review comments
* Pivot to SysLogger for all logging
* Increased log level so that they are seen in syslogs
* Code coverage improvement
…net#542) When LC is absent for 30 minutes, the database cleanup kicks in. When LagId is released, it needs to be appended to the SYSTEM_LAG_IDS_FREE_LIST This PR works with the following 2 PRs: sonic-net/sonic-swss#3303 sonic-net/sonic-buildimage#20369 Signed-off-by: mlok <[email protected]>
…ATE_DB (sonic-net#560) Fixed the bug in chassisd due to which incorrect number of ASICs were being pushed to CHASSIS_STATE_DB.
* thermalctld: Add support for fans on non-CPU modules * Add module fan to unit tests
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
How Has This Been Tested?
@vvolam build failure
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
@vvolam Not sure if it is supposed to be handled by the reboot feature implementation, but don't we need the same warning suppression when the DPU is powered off as well? The PCI ID is removed even for powered-off DPUs.
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
@gpunathilell Noted. Since this PR avoids the warning logs based on the PCI entry in STATE_DB, we need to add that STATE_DB entry for DPUs that are in the powered-off state as well. I will raise a separate PR to address this. Thank you!
@vvolam check for the admin status in CONFIG_DB; if the entry is not present, or admin is down, then do not throw the warning message for the missing DPU.
@prgeor I would like to do this as a separate patch. Avoiding the logs for the DPU offline or admin-down case requires a more careful patch so that the PCIe daemon is not overloaded. I had a discussion with @gpunathilell yesterday and he agreed as well. @gpunathilell Keeping the new case aside, could you review the existing changes for the DPU detaching state?
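For illustration, the admin-status check suggested above could look roughly like the sketch below. This is not code from the PR: the CONFIG_DB key layout ("DPU|&lt;name&gt;") and the "admin_status" field name are assumptions made for the example, and the connector is modeled as a plain dict.

```python
# Hypothetical sketch of the suggested CONFIG_DB admin-status check.
# The table name "DPU" and field "admin_status" are placeholders, not
# the actual SONiC schema; config_db is modeled as a dict for clarity.

def should_warn_for_missing_dpu(config_db, dpu_name):
    """Return True only when the DPU is configured and admin-up.

    If the DPU has no CONFIG_DB entry, or its admin status is down,
    the missing-device warning should be suppressed.
    """
    entry = config_db.get("DPU|{}".format(dpu_name))
    if entry is None:
        return False  # not configured: do not warn
    return entry.get("admin_status", "down") == "up"
```

Under this sketch, only an admin-up DPU that is missing from the PCIe bus would still produce a warning.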
Description
Add a new function "is_device_in_detaching_mode" that queries STATE_DB for devices in detaching mode. Enhance "check_pcie_devices" to invoke this function and skip the warning log if the device is in detaching mode.
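The flow described above can be sketched as follows. This is a simplified illustration, not the PR's actual implementation: the STATE_DB key layout ("PCIE_DETACH_INFO|&lt;device&gt;") and the "dpu_state" field are hypothetical, and the database is modeled as a dict instead of a real STATE_DB connector.

```python
# Simplified sketch of the detaching-mode check described in this PR.
# The STATE_DB key "PCIE_DETACH_INFO|<dev>" and field "dpu_state" are
# hypothetical placeholders; state_db is modeled as a dict.

def is_device_in_detaching_mode(state_db, pcie_dev_id):
    """Return True if STATE_DB marks this PCIe device as detaching."""
    entry = state_db.get("PCIE_DETACH_INFO|{}".format(pcie_dev_id), {})
    return entry.get("dpu_state") == "detaching"

def check_pcie_devices(state_db, devices, log_warning):
    """Warn for each missing device unless it is in detaching mode.

    devices maps a PCIe bus ID to a presence flag; log_warning is any
    callable that accepts the warning message string.
    """
    for dev_id, present in devices.items():
        if present:
            continue
        if is_device_in_detaching_mode(state_db, dev_id):
            continue  # removal is expected during reboot; skip the warning
        log_warning("PCIe device {} is missing".format(dev_id))
```

With this structure, a DPU whose PCI entry is marked as detaching in STATE_DB produces no warning, while a genuinely missing device still does.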
Motivation and Context
These changes are based on the reboot HLD.
How Has This Been Tested?
Confirmed that no warning logs appear for the removed PCI interface after loading the PCIe daemon on the Smart Switch and intentionally removing the DPU PCI interface during a reboot.
Additional Information (Optional)