Restart syncd fails in "applyView" with condition current view switches: 0 != temporary view switches #639
applyView: current view switches: 0 != temporary view switches: 1, FATAL
Something is wrong here:

Feb 14 10:23:37.043951 sonic NOTICE syncd#syncd: :- onSwitchCreateInInitViewMode: new switch oid:0x21000000000000 contains hardware info: '' // this indicates that switch is already present in syncd memory, which is ok
Feb 14 10:23:37.450999 sonic NOTICE syncd#syncd: :- dump: getting took 0.000143 sec
Feb 14 10:23:37.451342 sonic NOTICE syncd#syncd: :- getAsicView: get asic view from ASIC_STATE took 0.000990 sec
After reaching this state, could you check "redis-cli -n 1" and run "keys *" to see whether there are any ASIC_STATE keys other than TEMP_ASIC_STATE?
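A minimal sketch of the inspection asked for above. The key names come from this thread; the reachability guard is my addition so the snippet degrades gracefully when run off-switch (on SONiC, ASIC_DB is redis database 1):

```shell
# Inspect ASIC_DB (redis database 1) for current- and temporary-view keys.
# Guarded so it only queries redis when the server is actually reachable.
if command -v redis-cli >/dev/null 2>&1 && redis-cli -n 1 ping >/dev/null 2>&1; then
  redis-cli -n 1 keys 'ASIC_STATE*'       # current-view objects
  redis-cli -n 1 keys 'TEMP_ASIC_STATE*'  # temporary-view objects
else
  echo "ASIC_DB not reachable; run this on the switch"
fi
```

If ASIC_STATE* comes back empty while TEMP_ASIC_STATE* is populated, that matches the "0 != 1" switch-count mismatch in the FATAL log above.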
Steps to reproduce: Using an ET-6448M board with an ARMADA-385 CPU and 48 1G ports.
Attached the redis-cli output captured in the issue state.
In ASIC.txt there are no ASIC_STATE* entries; it seems those entries were removed or the database was cleared. Could we work together on that this week?
Yes. We can work on this.
Since the ASIC.txt dump doesn't contain "HIDDEN" or any ASIC_STATE* keys, the database was flushed. If so, then some script probably issued FLUSHALL on the database, most likely outside syncd/sairedis/OA.
"keys RIDTOVID" has some 116 entries. Additionally, I collected the "redis-cli monitor" logs, and syncd logs with the "DEBUG" level enabled. They show that Lua scripts loaded in the syncd context as part of the docker restart clean up the ASIC state. Hope these logs are helpful. Note that in the above log file syncd crashed; that is because of the 32-bit arch, which we will fix in parallel.
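For anyone reproducing this, a hedged sketch of the capture described above: MONITOR echoes every command the redis server receives, so a FLUSHALL/FLUSHDB issued by an external script shows up together with the client that sent it. The grep pattern and the 10-second window are my assumptions; run it during the syncd restart:

```shell
# Watch redis for flush commands during the restart window.
# MONITOR streams forever, so bound the capture with timeout.
if command -v redis-cli >/dev/null 2>&1 && redis-cli ping >/dev/null 2>&1; then
  timeout 10 redis-cli monitor | grep -iE 'flushall|flushdb' || true
else
  echo "redis not reachable; run this on the switch during the restart"
fi
```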
Hey, thanks for those logs, this explains the issue, take a look here: I will try to investigate this internally. This is related to sonic-buildimage/files/scripts/swss.sh execution; it seems this script is executed when OA and syncd are already running.
Could you try to execute "sudo systemctl restart swss" or "config reload" instead of restarting "syncd"? I think restarting syncd alone is currently not supported.
Thanks for the update. I see this issue only with restarting syncd; all other cases (reboot, config reload, restart swss) are fine.
I don't know why restarting syncd is not supported, maybe @yxieca can help here.
Restarting syncd should be the same as restarting swss, which could trigger some dependent service failures. The issue is not really with swss or syncd; at this moment, config reload is the safest way to recover. With that said, I think we should try to make restarting syncd/swss work.
@kcudnik @lguohan: Please check.
All dockers were fully up when I issued "sudo systemctl restart syncd"; it fails in apply_view, which causes an orchagent crash. Full logs are attached for reference.
sonic-buildimage commit : f31eabb5ee65f7d37d57d0da85dacf39d3b5fad1
sonic-buildimage/src/sonic-sairedis commit 322dd01
syslog_FATAL_error.txt
Feb 14 10:23:37.448931 sonic NOTICE swss#orchagent: :- syncd_apply_view: Notify syncd APPLY_VIEW
Feb 14 10:23:37.448931 sonic NOTICE swss#orchagent: :- notifySyncd: sending syncd: APPLY_VIEW
Feb 14 10:23:37.450194 sonic WARNING syncd#syncd: :- processNotifySyncd: syncd received APPLY VIEW, will translate
Feb 14 10:23:37.450999 sonic NOTICE syncd#syncd: :- dump: getting took 0.000143 sec
Feb 14 10:23:37.451152 sonic NOTICE syncd#syncd: :- getAsicView: ASIC_STATE switch count: 0:
Feb 14 10:23:37.451342 sonic NOTICE syncd#syncd: :- getAsicView: get asic view from ASIC_STATE took 0.000990 sec
Feb 14 10:23:37.454001 sonic NOTICE syncd#syncd: :- dump: getting took 0.002015 sec
Feb 14 10:23:37.454562 sonic NOTICE syncd#syncd: :- getAsicView: TEMP_ASIC_STATE switch count: 1:
Feb 14 10:23:37.454699 sonic NOTICE syncd#syncd: :- getAsicView: oid:0x21000000000000: objects count: 64
Feb 14 10:23:37.455059 sonic NOTICE syncd#syncd: :- getAsicView: get asic view from TEMP_ASIC_STATE took 0.003421 sec
Feb 14 10:23:37.455239 sonic ERR syncd#syncd: :- applyView: current view switches: 0 != temporary view switches: 1, FATAL
Feb 14 10:23:37.475524 sonic NOTICE syncd#syncd: :- applyView: apply took 0.025017 sec
Feb 14 10:23:37.477195 sonic ERR syncd#syncd: :- run: Runtime error: :- applyView: current view switches: 0 != temporary view switches: 1, FATAL
Feb 14 10:23:37.477339 sonic NOTICE syncd#syncd: :- sendShutdownRequest: sending switch_shutdown_request notification to OA for switch: oid:0x21000000000000
Feb 14 10:23:37.485238 sonic NOTICE syncd#syncd: :- sendShutdownRequestAfterException: notification send successfull
Feb 14 10:23:37.505620 sonic INFO swss#supervisord: orchagent terminate called after throwing an instance of 'std::invalid_argument'
Feb 14 10:23:37.506549 sonic INFO swss#supervisord: orchagent what(): parse error - unexpected 'o'
=============================================================================
root@sonic:/home/admin# show version
SONiC Software Version: SONiC.jun9.0-dirty-20200706.194346
Distribution: Debian 10.4
Kernel: 4.19.0-6-armmp
Build commit: f31eabb5
Build date: Mon Jul 6 14:54:32 UTC 2020
Built by: marvell@cpss-build7
Platform: armhf-marvell_et6448m_52x-r0
HwSKU: et6448m
ASIC: marvell
Serial Number: 201908150001
Uptime: 11:38:48 up 1:27, 1 user, load average: 0.51, 0.56, 0.52
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-orchagent jun9.0-dirty-20200706.194346 9af6ab3099fe 352MB
docker-orchagent latest 9af6ab3099fe 352MB
docker-fpm-frr jun9.0-dirty-20200706.194346 1dedd2aa317c 300MB
docker-fpm-frr latest 1dedd2aa317c 300MB
docker-sflow jun9.0-dirty-20200706.194346 b6fb327d6add 286MB
docker-sflow latest b6fb327d6add 286MB
docker-nat jun9.0-dirty-20200706.194346 696e475b4d4c 287MB
docker-nat latest 696e475b4d4c 287MB
docker-teamd jun9.0-dirty-20200706.194346 c203e683b9a5 286MB
docker-teamd latest c203e683b9a5 286MB
docker-syncd-mrvl jun9.0-dirty-20200706.194346 a15ba5251617 531MB
docker-syncd-mrvl latest a15ba5251617 531MB
docker-snmp-sv2 jun9.0-dirty-20200706.194346 1967491bf338 348MB
docker-snmp-sv2 latest 1967491bf338 348MB
docker-lldp jun9.0-dirty-20200706.194346 f49ecd7684d8 328MB
docker-lldp latest f49ecd7684d8 328MB
docker-database jun9.0-dirty-20200706.194346 c85327cc927e 313MB
docker-database latest c85327cc927e 313MB
docker-platform-monitor jun9.0-dirty-20200706.194346 9752ae58bea8 322MB
docker-platform-monitor latest 9752ae58bea8 322MB
docker-router-advertiser jun9.0-dirty-20200706.194346 be949c4f52b2 270MB
docker-router-advertiser latest be949c4f52b2 270MB
docker-dhcp-relay jun9.0-dirty-20200706.194346 a05bf6ab2d92 277MB
docker-dhcp-relay latest a05bf6ab2d92 277MB