-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HLD for Memory_Statistics #1760
base: master
Are you sure you want to change the base?
Conversation
reviewer is expected and please leave your comments if you want to be a reviewer. Thanks. |
community review recording https://zoom.us/rec/share/DtubGviNDO1Rt8VRnHsqWpnF17tu62g_CSP8aTY-hjIZct50ChlNJVWsjBqLaYIl.eYe4gvza4Lsqjhs3 |
Metric Current High Low D1-D3 D3-D5 D5-D7 D7-D9 D9-D11 D11-D13 D13-D15 | ||
Value Value Value 01Jun24 03Jun24 05Jun24 07Jun24 09Jun24 11Jun24 13Jun24 | ||
-------------- -------- -------- -------- --------- --------- --------- --------- --------- --------- --------- | ||
Total Memory 15.6G 15.6G 15.1G 15.1G 15.2G 15.3G 15.3G 15.4G 15.5G 15.5G | ||
Used Memory 2.3G 2.5G 2.0G 2.1G 2.2G 2.2G 2.3G 2.4G 2.3G 2.2G | ||
Free Memory 11.9G 12.4G 10.6G 11.0G 11.2G 11.4G 11.5G 11.7G 11.8G 11.9G | ||
Available Memory 13.0G 14.3G 12.4G 12.6G 12.7G 12.8G 12.9G 13.0G 13.1G 13.2G | ||
Cached Memory 1.2G 1.5G 1.0G 1.1G 1.2G 1.3G 1.4G 1.3G 1.4G 1.2G | ||
Buffer Memory 0.3G 0.4G 0.2G 0.2G 0.3G 0.3G 0.4G 0.3G 0.3G 0.4G | ||
Shared Memory 0.5G 0.6G 0.4G 0.5G 0.5G 0.5G 0.4G 0.5G 0.5G 0.5G | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Arham-Nasirare these stats collected from the output of free
linux command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data is not directly collected from the output of the free Linux command. Instead, they are gathered using the psutil Python library, which directly accesses system memory information via the same underlying mechanisms that the free command relies on. Specifically, the psutil.virtual_memory() function retrieves memory stats such as total, used, free, buffers, cached, and available memory.This approach allows for more direct access to memory data, avoiding the need to parse command output while providing flexibility for programmatic use.
Analysis Period: From 2024-06-01 09:00:00 to 2024-06-15 09:00:00 | ||
Interval: 2 days | ||
-------------------------------------------------------------------------------- | ||
Metric Current High Low D1-D3 D3-D5 D5-D7 D7-D9 D9-D11 D11-D13 D13-D15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sonic-gnmi (telemetry) already has a comprehensive implementation https://github.com/sonic-net/sonic-gnmi/blob/64ed32b715e1673c3376701ee192d6292f5ec0ec/sonic_data_client/non_db_client.go#L235. Please check and ensure matrix are aligned.
Even better, suggest reuse:
- delegate telemetry container as the memory collector
- reduce this HLD scope to only memory-stats and storage, connecting to telemetry container by gnmi protocol
- ensure this design could run inside and outside sonic switch
- ensure this feature could be completely disabled inside a sonic switch and no performance burden if disabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Thank you for the suggestion. We will review the sonic-gnmi implementation to align our metrics with the existing telemetry framework for system consistency. We will also explore reusing the telemetry container for memory collection and narrow the HLD scope to focus on memory statistics and storage.
- Currently, we use the psutil.virtual_memory() function from the psutil library to gather memory data, which is similar to the getProcMeminfo() function in non_db_client.go, as both methods pull information from /proc/meminfo. We believe that psutil offers a more efficient and Pythonic approach compared to relying solely on the telemetry container for this purpose.
- Can you elaborate more on how the design could run inside and outside the sonic switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In enable/disable case, do you still use SIGHUP which means memstatsd will keep running? or just SIGTERM the memoryStats daemon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hostcfgd will detect the enable/disable flag in ConfigDB. For disable, it sends SIGTERM to stop the daemon, where the daemon gracefully stops all processes, releases resources, and completely shuts down. For enable, it restarts the daemon using systemd, and SIGHUP is used for configuration reloads while the daemon is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, better to mention it explicitly, memorystats is not daemon but process controlled by hostcfgd, what if hostcfgd crashes, will it reload memorystats and recalculate retention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a misunderstanding. Yes, we are introducing memorystatsd as a process and not a separate container. If the hostcfgd crashes, memorystats will restart and when the user enables memory statistics feature, it will use the default retention time unless the user explicitly mentions a custom retention period. Lets say, user enters 5 as retention period, then it will start the retention period from that time and will show the last 5 days data.
Add dynamic config reload and graceful shutdown for memorystatsd
@qiluo-msft @xincunli-sonic @zbud-msft @FengPan-Frank We have linked the code PRs. Please help review the feature. |
Signed-off-by: Arham-Nasir <[email protected]>
@qiluo-msft @xincunli-sonic @zbud-msft @FengPan-Frank We have linked the code PRs. Please help review the feature. |
@qiluo-msft @xincunli-sonic @FengPan-Frank Please help review the feature. |
@FengPan-Frank please help review and approve this feature. Code PRs are already in review. Target is 202411. We are running out of time for HLD review. Please help expedite review. |
This High-Level Design (HLD) document explains the Memory Statistics feature in SONiC. The aim is to improve memory monitoring and management to optimize network performance and reliability.
The Linked PR are as follows: