zdyang changed the title on May 22, 2024
From: Cannot get "nvlink_flit_crc_error_count_total(409)" and "nvlink_data_crc_error_count_total(419)" in Hopper HGX System
To: Cannot get "nvlink_flit_crc_error_count_total(409)" and "nvlink_data_crc_error_count_total(419)" in H800 System
I used DCGM 3.3.0 on an H100 system with NVLink and tried to collect NVLink-related metrics, such as "nvlink_flit_crc_error_count_total" (field ID 409) and "nvlink_data_crc_error_count_total" (field ID 419). However, DCGM always returned N/A for both fields. nv-hostengine and dcgmi were run with root privileges. The hardware driver version was "535.129".