
[Bug] CUDA error: uncorrectable ECC error encountered #3204

Closed
5 tasks done
HuanzhiMao opened this issue Jan 29, 2025 · 3 comments

@HuanzhiMao (Contributor)

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Max context length: 163840
2025-01-29 08:51:14.393720: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140674.412162  464568 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140674.417882  464568 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
[2025-01-29 08:51:25] server_args=ServerArgs(model_path='deepseek-ai/DeepSeek-V2.5', tokenizer_path='deepseek-ai/DeepSeek-V2.5', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='bfloat16', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='deepseek-ai/DeepSeek-V2.5', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='127.0.0.1', port=1053, mem_fraction_static=0.9, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=646462308, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140689.739817  464959 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140689.745387  464959 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[... the same absl/cuDNN/cuBLAS factory-registration warnings and the SciPy/NumPy version warning repeat once per tensor-parallel worker process; repetitions elided ...]
[2025-01-29 08:51:41 TP6] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP6] Init torch distributed begin.
[2025-01-29 08:51:41 TP4] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP4] Init torch distributed begin.
[2025-01-29 08:51:41 TP5] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP5] Init torch distributed begin.
[2025-01-29 08:51:41 TP1] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP1] Init torch distributed begin.
[2025-01-29 08:51:42 TP3] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:42 TP3] Init torch distributed begin.
[2025-01-29 08:51:43 TP0] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:43 TP0] Init torch distributed begin.
[2025-01-29 08:51:43 TP7] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:43 TP7] Init torch distributed begin.
[2025-01-29 08:51:44 TP2] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:44 TP2] Init torch distributed begin.
[2025-01-29 08:51:44 TP3] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP0] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP7] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP1] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP2] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP5] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP4] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP6] sglang is using nccl==2.21.5
[2025-01-29 08:51:51 TP5] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP1] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP0] Load weight begin. avail mem=77.14 GB
[2025-01-29 08:51:51 TP7] Load weight begin. avail mem=77.14 GB
[2025-01-29 08:51:51 TP3] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP4] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP6] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP2] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:52 TP0] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP5] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP4] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP1] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP6] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP7] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP3] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP2] Using model weights format ['*.safetensors']
Cache shape torch.Size([163840, 64])

Loading safetensors checkpoint shards:   0% Completed | 0/55 [00:00<?, ?it/s]
[... per-shard tqdm progress lines (1/55 through 53/55) elided ...]
Loading safetensors checkpoint shards:  98% Completed | 54/55 [00:47<00:00,  1.17it/s]
[2025-01-29 08:52:39 TP6] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP1] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP4] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP2] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB

Loading safetensors checkpoint shards: 100% Completed | 55/55 [00:48<00:00,  1.19it/s]
Loading safetensors checkpoint shards: 100% Completed | 55/55 [00:48<00:00,  1.14it/s]

[2025-01-29 08:52:40 TP3] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:40 TP0] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.71 GB
[2025-01-29 08:52:40 TP5] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:40 TP7] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.71 GB
[2025-01-29 08:52:41 TP6] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP4] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP3] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP5] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP0] Memory pool end. avail mem=6.69 GB
[2025-01-29 08:52:41 TP1] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP2] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP7] Memory pool end. avail mem=6.69 GB
[2025-01-29 08:52:41 TP0] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP6] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP4] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP3] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP1] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP7] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP0] Capture cuda graph begin. This can take up to several minutes.

  0%|          | 0/23 [00:00<?, ?it/s]
[2025-01-29 08:52:41 TP6] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP4] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP3] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP1] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP5] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP2] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP7] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP5] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP2] Capture cuda graph begin. This can take up to several minutes.
loc("/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/attention/triton_ops/decode_attention.py":310:16): error: operation scheduled before its operands
[... the identical Triton message repeats once per tensor-parallel rank; 7 duplicates elided ...]
[2025-01-29 08:52:43 TP0] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[... the same warning repeats for TP1 through TP7; repetitions elided ...]

  4%|▍         | 1/23 [00:07<02:39,  7.23s/it]
  9%|▊         | 2/23 [00:08<01:20,  3.82s/it]
[rank1]:[E129 08:52:51.685329145 ProcessGroupNCCL.cpp:1595] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:364 'uncorrectable ECC error encountered'
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x79ebd4f166e4 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x79ec210a5a18 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x79eb381c7726 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x79eb381cc3f0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x79eb381d3b5a in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x79eb381d561d in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x79ebd4f166e4 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x79ec210a5a18 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x79eb381c7726 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x79eb381cc3f0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x79eb381d3b5a in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x79eb381d561d in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x79eb37e4271b in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)

Fatal Python error: Aborted

Thread 0x000079e69b400640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000079e803c00640 (most recent call first):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 47 in _recv_msg
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 153 in _read_thread
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000079ec2dd59000 (most recent call first):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_ops.py", line 1116 in __call__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/_custom_ops.py", line 90 in get_graph_buffer_ipc_meta
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/device_communicators/custom_all_reduce.py", line 320 in register_graph_buffers
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/device_communicators/custom_all_reduce.py", line 317 in capture
  File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 325 in graph_capture
  File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 941 in graph_capture
  File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 275 in capture
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 226 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 730 in init_cuda_graphs
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 214 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 68 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 239 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1773 in run_scheduler_process
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, _brotli, charset_normalizer.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, uvloop.loop, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, zmq.backend.cython._zmq, yaml._yaml, markupsafe._speedups, PIL._imaging, PIL._imagingft, google._upb._message, h5py._debian_h5py_serial._errors, h5py._debian_h5py_serial.defs, h5py._debian_h5py_serial._objects, h5py._debian_h5py_serial.h5, h5py._debian_h5py_serial.h5r, h5py._debian_h5py_serial.utils, h5py._debian_h5py_serial.h5s, h5py._debian_h5py_serial.h5ac, h5py._debian_h5py_serial.h5p, h5py._debian_h5py_serial.h5t, h5py._debian_h5py_serial._conv, h5py._debian_h5py_serial.h5z, h5py._debian_h5py_serial._proxy, h5py._debian_h5py_serial.h5a, h5py._debian_h5py_serial.h5d, h5py._debian_h5py_serial.h5ds, h5py._debian_h5py_serial.h5g, h5py._debian_h5py_serial.h5i, h5py._debian_h5py_serial.h5f, h5py._debian_h5py_serial.h5fd, h5py._debian_h5py_serial.h5pl, h5py._debian_h5py_serial.h5o, h5py._debian_h5py_serial.h5l, h5py._debian_h5py_serial._selector, h5py.atexit, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, scipy.sparse._csparsetools, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, jaxlib.cpu_feature_guard, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.__check_build._check_build, sklearn.utils.murmurhash, lz4._version, lz4.frame._frame, scipy.spatial._ckdtree, scipy._lib.messagestream, 
scipy.spatial._qhull, scipy.spatial._voronoi, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg.cython_lapack, scipy.linalg._decomp_update, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap_module, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.special.cython_special, scipy.stats._stats, beta_ufunc, scipy.stats._boost.beta_ufunc, binom_ufunc, scipy.stats._boost.binom_ufunc, nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats._biasedurn, scipy.stats._hypotests_pythran, scipy.stats._statlib, scipy.stats._mvn, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._unuran.unuran_wrapper, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._pairwise_fast, msgspec._core, sentencepiece._sentencepiece, regex._regex, msgpack._cmsgpack, ray._raylet, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, xxhash._xxhash, pyarrow._json, pyarrow._acero, pyarrow._csv, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, cuda_utils, __triton_launcher (total: 262)

Reproduction

deepseek-ai/DeepSeek-V2.5 on 8xA100 (80G), --tp 8
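
For reference, a launch command consistent with the server_args in the log above would be something like the following (reconstructed from the logged arguments, not necessarily the reporter's exact invocation):

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V2.5 --trust-remote-code --tp 8 --dtype bfloat16 --mem-fraction-static 0.9 --port 1053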

Environment

ubuntu@207-211-174-109:~/gorilla/berkeley-function-call-leaderboard$ python3 -m sglang.check_env
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2025-01-29 08:57:57.439587: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738141077.458842  481708 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738141077.464579  481708 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/home/ubuntu/.local/lib/python3.10/site-packages/pydantic/_internal/_config.py:345: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
Python: 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.0
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.127.05
PyTorch: 2.5.1+cu124
sglang: 0.4.2
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.48.1
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.7
hf_transfer: 0.1.9
huggingface_hub: 0.28.0
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.6
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.60.2
anthropic: 0.45.2
decord: 0.6.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU1    NV12     X      NV12    NV12    NV12    NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU2    NV12    NV12     X      NV12    NV12    NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU3    NV12    NV12    NV12     X      NV12    NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU5    NV12    NV12    NV12    NV12    NV12     X      NV12    NV12    PHB     0-239   0-1             N/A
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X      NV12    PHB     0-239   0-1             N/A
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X      PHB     0-239   0-1             N/A
NIC0    PHB     PHB     PHB     PHB     PHB     PHB     PHB     PHB      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0


Hypervisor vendor: KVM
ulimit soft: 1048576
@zhaochenyang20 (Collaborator)

cc @zhyncs

@RonanKMcGovern

Yes, I'm perhaps getting a similar error on 1×H100 with DeepSeek-R1-Distill-Qwen-1.5B:

  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 189, in get_available_gpu_memory
    free_gpu_memory, _ = torch.cuda.mem_get_info(gpu_id)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 712, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-02-03 01:25:50] Received sigquit from a child proces. It usually means the child failed.

running:

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --context-length 32000 --host 0.0.0.0 --port 8000 --trust-remote-code --quantization fp8 --kv-cache-dtype fp8_e5m2

using:

lmsysorg/sglang:latest

However, adding --disable-cuda-graph solves the issue.
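
For reference, the full working command would then be (the same invocation as above with the workaround flag appended):

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --context-length 32000 --host 0.0.0.0 --port 8000 --trust-remote-code --quantization fp8 --kv-cache-dtype fp8_e5m2 --disable-cuda-graph

Note that an uncorrectable ECC error is generally a hardware-level GPU memory fault, so it is also worth checking the ECC counters on the affected node, e.g. with nvidia-smi -q -d ECC (and, on A100/H100 with recent drivers, nvidia-smi -q -d ROW_REMAPPER for row-remapping status). Disabling CUDA graphs may simply avoid the allocation pattern that touches the faulty region rather than fix the underlying fault.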

@zhaochenyang20 (Collaborator)

@simveit Hey Simon, could we add this to our docs for the serving args?
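
A possible wording for such an entry (a sketch only, not final docs text):

--disable-cuda-graph: Disable CUDA graph capture. CUDA graphs reduce per-token launch overhead during decoding but require a capture phase at startup and extra memory; disabling them trades some decoding throughput for robustness and can work around capture-time failures such as the ECC errors reported in this issue.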
