Duplicate zero buffer profiles in 'temp' view after Warmboot while storm is ongoing. #1030

Open
svsivm opened this issue Apr 7, 2022 · 2 comments

svsivm commented Apr 7, 2022

Description:
The original issue (#899) addressed the problem in the comparison logic.
However, the root cause appears to be in the OA 'temp' view construction logic, which is why this issue is being raised.

Please refer to the original issue where @kcudnik has already done significant analysis.

Two extra zero buffer profiles are created in the 'temp' ASIC view if warmboot is executed while some queues have zero buffer profiles attached to them. The VIDs of these two extra zero buffer profiles in the temp ASIC view match those in the 'current' ASIC view. However, the attribute list in the temp ASIC view is empty for these matching VIDs, so the comparison logic during warmboot reconciliation ends up 'creating' 2 new zero buffer profiles although these profiles already exist on the ASIC. We ran into this issue while running the PFC WD warmboot pytest, specifically the second sub-test (https://github.com/Azure/sonic-mgmt/blob/master/tests/pfcwd/test_pfcwd_warm_reboot.py#L25).

Why are the 2 zero buffer profiles created again after warmboot? Is this by design?
These duplicate creates subsequently cause problems in the test case's storm restoration path.

Steps to reproduce:
Execute the second scenario in the PFC watchdog warmboot test on a platform that uses the 'zero buffer profile' model to handle PFC storms.

To reproduce manually, perform the following steps:
(a) Enable PFC WD on all target ports/queues.
(b) Send a PFC storm to the target port/queue and verify that the storm is detected and the mitigation action is executed.
(c) While the PFC storm is still being sent, perform a warmboot.
(d) Compare the temp view and the current ASIC view for BUFFER_PROFILE keys; there are 2 extra buffer profiles in the temp view.
(e) The NOS 'creates' the 2 zero buffer profiles again.

Mar 30 02:47:42.655739 sonic-wistron3-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_BUFFER_PROFILE on current view 8 is different than on temporary view: 10

Mar 30 02:47:42.766455 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6. <<<<<<< DUPLICATE ZERO BUFFER PROFILE
Mar 30 02:47:42.766455 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE 0
Mar 30 02:47:42.766455 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_POOL_ID oid:0x1800000000050f
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH -8
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001d5
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_QUEUE_ATTR_BUFFER_PROFILE_ID oid:0x190000000005e6 (current: oid:0x19000000000510)
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001e5
Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_QUEUE_ATTR_BUFFER_PROFILE_ID oid:0x190000000005e6 (current: oid:0x19000000000510)
Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8 <<<<<<< DUPLICATE ZERO BUFFER PROFILE
Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE 0
Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_POOL_ID oid:0x18000000000511
Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC
Mar 30 02:47:42.766661 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH -8
Mar 30 02:47:42.766661 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000000ae
Mar 30 02:47:42.766661 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE oid:0x190000000005e8 (current: oid:0x19000000000512)
Mar 30 02:47:42.766661 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000000be
Mar 30 02:47:42.766694 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: - SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE oid:0x190000000005e8 (current: oid:0x19000000000512)
Mar 30 02:47:42.767998 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6
Mar 30 02:47:42.767998 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001d5
Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001e5
Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8
Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000000ae
Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_INGRESS_PRIORITY_GROUP:oid:0x1a0000000000be
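
To make the duplicate creates easier to spot in a long syslog, the create operations can be pulled out with a small script. This is only a sketch: the regular expression is an assumption based on the log lines quoted above, and `created_objects` is a hypothetical helper, not part of syncd or sonic-mgmt.

```python
import re

# Pattern inferred from the syslog lines quoted above (an assumption, not a
# documented syncd log format).
CREATE_RE = re.compile(
    r"executeOperationsOnAsic: create: (SAI_OBJECT_TYPE_\w+):(oid:0x[0-9a-fA-F]+)"
)


def created_objects(lines, object_type):
    """Return the VIDs created for object_type, in first-seen order, without repeats."""
    seen = []
    for line in lines:
        match = CREATE_RE.search(line)
        if match and match.group(1) == object_type and match.group(2) not in seen:
            seen.append(match.group(2))
    return seen


# A few lines copied from the log excerpt above (each create is logged twice).
SAMPLE_LINES = [
    "Mar 30 02:47:42.766455 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6. <<<<<<< DUPLICATE ZERO BUFFER PROFILE",
    "Mar 30 02:47:42.766571 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: set: SAI_OBJECT_TYPE_QUEUE:oid:0x150000000001d5",
    "Mar 30 02:47:42.766623 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8 <<<<<<< DUPLICATE ZERO BUFFER PROFILE",
    "Mar 30 02:47:42.767998 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6",
    "Mar 30 02:47:42.768036 sonic-dut NOTICE syncd#syncd: :- executeOperationsOnAsic: create: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8",
]

print(created_objects(SAMPLE_LINES, "SAI_OBJECT_TYPE_BUFFER_PROFILE"))
# ['oid:0x190000000005e6', 'oid:0x190000000005e8']
```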

BEFORE WARMBOOT:
root@sonic--dut:~# redis-cli -n 1
127.0.0.1:6379[1]> keys *BUFFER_PROFILE*
1) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c2"
2) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c1"
3) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510" >>>>>>>> ZERO BUFFER PROFILE
4) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003bf"
5) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003be"
6) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003bd"
7) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c0"
8) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512" >>>>>>>>> ZERO BUFFER PROFILE
127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510
1) "SAI_BUFFER_PROFILE_ATTR_POOL_ID"
2) "oid:0x1800000000050f"
3) "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE"
4) "SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC"
5) "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE"
6) "0"
7) "SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH"
8) "-8"
127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000051
(empty array)
127.0.0.1:6379[1]> hgetall ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512
1) "SAI_BUFFER_PROFILE_ATTR_POOL_ID"
2) "oid:0x18000000000511"
3) "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE"
4) "SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC"
5) "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE"
6) "0"
7) "SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH"
8) "-8"

AFTER WARMBOOT:
root@sonic--dut:~# redis-cli -n 1
127.0.0.1:6379[1]> keys *BUFFER_PROFILE*
1) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c1"
2) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c0"
3) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003bd"
4) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003c2"
5) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053d"
6) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003bf"
7) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053f"
8) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053c"
9) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510" >>>>>> Matching VID with current view, but empty attr list.
10) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000003be"
11) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8" >>>>>> Extra 'zero buffer profile' with appropriate attribute values.
12) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053e"
13) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510"
14) "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512"
15) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6" >>>>>> Extra 'zero buffer profile' with appropriate attribute values.
16) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053b"
17) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512" >>>>>> Matching VID with current view, but empty attr list.
18) "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000053a"
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000510
1) "NULL"
2) "NULL"
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000512
1) "NULL"
2) "NULL"
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e8
1) "SAI_BUFFER_PROFILE_ATTR_POOL_ID"
2) "oid:0x180000000005e7"
3) "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE"
4) "SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC"
5) "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE"
6) "0"
7) "SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH"
8) "-8"
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000005e6
1) "SAI_BUFFER_PROFILE_ATTR_POOL_ID"
2) "oid:0x180000000005e5"
3) "SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE"
4) "SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC"
5) "SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE"
6) "0"
7) "SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH"
8) "-8"
    pfcwd_warmboot.zip
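
The comparison in step (d) can also be done mechanically from the key names alone. The following is a rough sketch that splits the keys from the AFTER WARMBOOT listing by their prefix and reports the VIDs present in both views. `split_views` is a hypothetical helper; the key layout (`<view>:<object type>:<vid>`) is assumed from the redis-cli output above.

```python
from collections import defaultdict


def split_views(keys):
    """Group buffer profile VIDs by view, using the Redis key prefix."""
    views = defaultdict(set)
    for key in keys:
        # Assumed layout: "<view>:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x..."
        prefix, _, rest = key.partition(":")
        obj_type, _, vid = rest.partition(":")
        if obj_type == "SAI_OBJECT_TYPE_BUFFER_PROFILE":
            views[prefix].add(vid)
    return views


def make_keys(prefix, suffixes):
    """Build full key names from the last three hex digits shown in the listing."""
    base = "SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x19000000000"
    return [f"{prefix}:{base}{suffix}" for suffix in suffixes]


# VIDs taken from the AFTER WARMBOOT key listing above.
keys = make_keys(
    "ASIC_STATE", ["3c1", "3c0", "3bd", "3c2", "3bf", "3be", "510", "512"]
) + make_keys(
    "TEMP_ASIC_STATE",
    ["53d", "53f", "53c", "510", "5e8", "53e", "5e6", "53b", "512", "53a"],
)

views = split_views(keys)
current, temp = views["ASIC_STATE"], views["TEMP_ASIC_STATE"]
shared = sorted(current & temp)

print(len(current), len(temp))  # 8 10
print(shared)  # ['oid:0x19000000000510', 'oid:0x19000000000512']
```

The two shared VIDs are the ones whose hashes in the temp view hold only the "NULL" placeholder.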

svsivm commented Apr 7, 2022

I have attached the syslog and sairedis logs from before and after warmboot in pfcwd_warmboot.zip, which is attached to the previous post.

kcudnik self-assigned this Apr 7, 2022

kcudnik commented Apr 9, 2022

OK, I know where the problem is.
The problem occurs when SAI_QUEUE_ATTR_BUFFER_PROFILE_ID is queried (GET operation) before APPLY_VIEW is issued.
It comes down to the scopes of the current view and the temporary view. We have had this issue before. Imagine this situation:
• On cold boot you create one buffer profile (let's name it A) and set it on a queue via SAI_QUEUE_ATTR_BUFFER_PROFILE_ID.
• Then you do a warm boot and issue init view.
• Now you create one buffer profile (name it B) and set it on SAI_QUEUE_ATTR_BUFFER_PROFILE_ID, but this is built in the temporary view; no ASIC operation is performed yet.
• Now you query SAI_QUEUE_ATTR_BUFFER_PROFILE_ID on the existing queue. This operation returns buffer profile A (even though you assigned buffer profile B), since no apply view was issued.
• Now you issue the apply_view command, and in the temporary view you have 2 buffer profiles (A and B): A because you queried it, so the A OID was brought into the temporary view and OA has knowledge of it (and it cannot be removed by syncd because that would violate OID consistency in OA), and B because you just created it in the temporary view.
• This query happens twice, on SAI_QUEUE_ATTR_BUFFER_PROFILE_ID and SAI_INGRESS_PRIORITY_GROUP_ATTR_BUFFER_PROFILE, so it brings back 2 OIDs, and hence 10 buffer profiles instead of 8.
If you remove those 2 queries before the APPLY_VIEW command, there would be no ASIC operations on buffer profiles and everything would work fine.

It is not recommended to query attributes that you will eventually SET, since it leads to problems like this. It cannot be easily solved; this needs to be addressed in the OA logic.
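
The scenario above can be illustrated with a toy model. This is not the sairedis or syncd implementation: the `ToyViews` class, its method names, and the shortened object and attribute names are all made up for illustration. It only encodes the two rules described in the comment: a GET is answered from the current view, and any OID returned by a GET becomes known to OA and therefore lands in the temporary view with no attributes.

```python
class ToyViews:
    """Toy model of current and temporary views (hypothetical, for illustration)."""

    def __init__(self, current):
        self.current = current  # {object: {attr: value}} as programmed on the ASIC
        self.temp = {}          # built by OA between init view and apply view

    def create(self, obj, attrs):
        self.temp[obj] = dict(attrs)

    def set(self, obj, attr, value):
        self.temp.setdefault(obj, {})[attr] = value

    def get(self, obj, attr):
        # No apply view yet, so the answer comes from the current view.
        value = self.current[obj][attr]
        # The returned OID is now known to OA, so it must exist in the
        # temporary view, but nothing ever set attributes on it there.
        if value in self.current:
            self.temp.setdefault(value, {})
        return value


def buffer_profiles(view):
    return sorted(obj for obj in view if obj.startswith("BUFFER_PROFILE:"))


# Cold boot state: profile A is attached to the queue.
views = ToyViews({
    "BUFFER_PROFILE:A": {"RESERVED_BUFFER_SIZE": "0"},
    "QUEUE:q1": {"BUFFER_PROFILE_ID": "BUFFER_PROFILE:A"},
})

# Warm boot, after init view: OA creates B and assigns it to the queue.
views.create("BUFFER_PROFILE:B", {"RESERVED_BUFFER_SIZE": "0"})
views.set("QUEUE:q1", "BUFFER_PROFILE_ID", "BUFFER_PROFILE:B")

# OA queries the attribute before apply view and gets the old profile back.
returned = views.get("QUEUE:q1", "BUFFER_PROFILE_ID")

print(returned)                     # BUFFER_PROFILE:A
print(buffer_profiles(views.temp))  # ['BUFFER_PROFILE:A', 'BUFFER_PROFILE:B']
print(views.temp["BUFFER_PROFILE:A"])  # {}
```

Dropping the `get` call leaves only B in the temporary view, which corresponds to the suggestion of removing the two queries before APPLY_VIEW.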
