
[Bug] CUDA error: uncorrectable ECC error encountered #3204

Closed
5 tasks done
HuanzhiMao opened this issue Jan 29, 2025 · 3 comments

@HuanzhiMao (Contributor)

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Max context length: 163840
2025-01-29 08:51:14.393720: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140674.412162  464568 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140674.417882  464568 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
[2025-01-29 08:51:25] server_args=ServerArgs(model_path='deepseek-ai/DeepSeek-V2.5', tokenizer_path='deepseek-ai/DeepSeek-V2.5', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='bfloat16', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='deepseek-ai/DeepSeek-V2.5', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='127.0.0.1', port=1053, mem_fraction_static=0.9, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=646462308, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140689.739817  464959 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140689.745387  464959 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[... the same absl/cuDNN/cuBLAS factory-registration warnings and the SciPy/NumPy version warning repeat once per tensor-parallel worker process; repetitions elided ...]
[2025-01-29 08:51:41 TP6] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP6] Init torch distributed begin.
[2025-01-29 08:51:41 TP4] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP4] Init torch distributed begin.
[2025-01-29 08:51:41 TP5] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP5] Init torch distributed begin.
[2025-01-29 08:51:41 TP1] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP1] Init torch distributed begin.
[2025-01-29 08:51:42 TP3] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:42 TP3] Init torch distributed begin.
[2025-01-29 08:51:43 TP0] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:43 TP0] Init torch distributed begin.
[2025-01-29 08:51:43 TP7] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:43 TP7] Init torch distributed begin.
[2025-01-29 08:51:44 TP2] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:44 TP2] Init torch distributed begin.
[2025-01-29 08:51:44 TP3] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP0] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP7] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP1] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP2] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP5] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP4] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP6] sglang is using nccl==2.21.5
[2025-01-29 08:51:51 TP5] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP1] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP0] Load weight begin. avail mem=77.14 GB
[2025-01-29 08:51:51 TP7] Load weight begin. avail mem=77.14 GB
[2025-01-29 08:51:51 TP3] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP4] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP6] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP2] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:52 TP0] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP5] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP4] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP1] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP6] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP7] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP3] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP2] Using model weights format ['*.safetensors']
Cache shape torch.Size([163840, 64])

Loading safetensors checkpoint shards:   0% Completed | 0/55 [00:00<?, ?it/s]
[... per-shard tqdm progress lines (1/55 through 53/55) elided ...]
Loading safetensors checkpoint shards:  98% Completed | 54/55 [00:47<00:00,  1.17it/s]
[2025-01-29 08:52:39 TP6] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP1] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP4] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP2] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB

Loading safetensors checkpoint shards: 100% Completed | 55/55 [00:48<00:00,  1.19it/s]
Loading safetensors checkpoint shards: 100% Completed | 55/55 [00:48<00:00,  1.14it/s]

[2025-01-29 08:52:40 TP3] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:40 TP0] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.71 GB
[2025-01-29 08:52:40 TP5] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:40 TP7] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.71 GB
[2025-01-29 08:52:41 TP6] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP4] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP3] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP5] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP0] Memory pool end. avail mem=6.69 GB
[2025-01-29 08:52:41 TP1] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP2] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP7] Memory pool end. avail mem=6.69 GB
[2025-01-29 08:52:41 TP0] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP6] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP4] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP3] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP1] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP7] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP0] Capture cuda graph begin. This can take up to several minutes.

  0%|          | 0/23 [00:00<?, ?it/s]
[2025-01-29 08:52:41 TP6] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP4] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP3] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP1] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP5] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP2] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP7] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP5] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP2] Capture cuda graph begin. This can take up to several minutes.
loc("/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/attention/triton_ops/decode_attention.py":310:16): error: operation scheduled before its operands
[... the identical Triton message repeats once per tensor-parallel rank; 7 duplicates elided ...]
[2025-01-29 08:52:43 TP0] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[... the same warning repeats for TP1 through TP7; repetitions elided ...]

  4%|▍         | 1/23 [00:07<02:39,  7.23s/it]
  9%|▊         | 2/23 [00:08<01:20,  3.82s/it]
[rank1]:[E129 08:52:51.685329145 ProcessGroupNCCL.cpp:1595] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:364 'uncorrectable ECC error encountered'
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x79ebd4f166e4 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x79ec210a5a18 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x79eb381c7726 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x79eb381cc3f0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x79eb381d3b5a in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x79eb381d561d in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
  what():  [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x79ebd4f166e4 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x79ec210a5a18 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x79eb381c7726 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x79eb381cc3f0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x79eb381d3b5a in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x79eb381d561d in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x79eb37e4271b in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)

Fatal Python error: Aborted

Thread 0x000079e69b400640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000079e803c00640 (most recent call first):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 47 in _recv_msg
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 153 in _read_thread
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000079ec2dd59000 (most recent call first):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_ops.py", line 1116 in __call__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/_custom_ops.py", line 90 in get_graph_buffer_ipc_meta
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/device_communicators/custom_all_reduce.py", line 320 in register_graph_buffers
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/device_communicators/custom_all_reduce.py", line 317 in capture
  File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 325 in graph_capture
  File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 941 in graph_capture
  File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 275 in capture
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 226 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 730 in init_cuda_graphs
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 214 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 68 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 239 in __init__
  File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1773 in run_scheduler_process
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, _brotli, charset_normalizer.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, uvloop.loop, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, zmq.backend.cython._zmq, yaml._yaml, markupsafe._speedups, PIL._imaging, PIL._imagingft, google._upb._message, h5py._debian_h5py_serial._errors, h5py._debian_h5py_serial.defs, h5py._debian_h5py_serial._objects, h5py._debian_h5py_serial.h5, h5py._debian_h5py_serial.h5r, h5py._debian_h5py_serial.utils, h5py._debian_h5py_serial.h5s, h5py._debian_h5py_serial.h5ac, h5py._debian_h5py_serial.h5p, h5py._debian_h5py_serial.h5t, h5py._debian_h5py_serial._conv, h5py._debian_h5py_serial.h5z, h5py._debian_h5py_serial._proxy, h5py._debian_h5py_serial.h5a, h5py._debian_h5py_serial.h5d, h5py._debian_h5py_serial.h5ds, h5py._debian_h5py_serial.h5g, h5py._debian_h5py_serial.h5i, h5py._debian_h5py_serial.h5f, h5py._debian_h5py_serial.h5fd, h5py._debian_h5py_serial.h5pl, h5py._debian_h5py_serial.h5o, h5py._debian_h5py_serial.h5l, h5py._debian_h5py_serial._selector, h5py.atexit, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, scipy.sparse._csparsetools, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, jaxlib.cpu_feature_guard, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.__check_build._check_build, sklearn.utils.murmurhash, lz4._version, lz4.frame._frame, scipy.spatial._ckdtree, scipy._lib.messagestream, 
scipy.spatial._qhull, scipy.spatial._voronoi, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg.cython_lapack, scipy.linalg._decomp_update, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap_module, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.special.cython_special, scipy.stats._stats, beta_ufunc, scipy.stats._boost.beta_ufunc, binom_ufunc, scipy.stats._boost.binom_ufunc, nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats._biasedurn, scipy.stats._hypotests_pythran, scipy.stats._statlib, scipy.stats._mvn, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._unuran.unuran_wrapper, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._pairwise_fast, msgspec._core, sentencepiece._sentencepiece, regex._regex, msgpack._cmsgpack, ray._raylet, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, xxhash._xxhash, pyarrow._json, pyarrow._acero, pyarrow._csv, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, cuda_utils, __triton_launcher (total: 262)

Reproduction

deepseek-ai/DeepSeek-V2.5 on 8xA100 (80G), --tp 8
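
For reference, a launch command consistent with the server_args in the log above would be something like the following (reconstructed from the logged arguments, not necessarily the reporter's exact invocation):

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V2.5 --trust-remote-code --tp 8 --dtype bfloat16 --mem-fraction-static 0.9 --port 1053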

Environment

ubuntu@207-211-174-109:~/gorilla/berkeley-function-call-leaderboard$ python3 -m sglang.check_env
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2025-01-29 08:57:57.439587: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738141077.458842  481708 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738141077.464579  481708 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/home/ubuntu/.local/lib/python3.10/site-packages/pydantic/_internal/_config.py:345: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
Python: 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.0
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.127.05
PyTorch: 2.5.1+cu124
sglang: 0.4.2
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.48.1
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.7
hf_transfer: 0.1.9
huggingface_hub: 0.28.0
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.6
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.60.2
anthropic: 0.45.2
decord: 0.6.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU1    NV12     X      NV12    NV12    NV12    NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU2    NV12    NV12     X      NV12    NV12    NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU3    NV12    NV12    NV12     X      NV12    NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    PHB     0-239   0-1             N/A
GPU5    NV12    NV12    NV12    NV12    NV12     X      NV12    NV12    PHB     0-239   0-1             N/A
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X      NV12    PHB     0-239   0-1             N/A
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X      PHB     0-239   0-1             N/A
NIC0    PHB     PHB     PHB     PHB     PHB     PHB     PHB     PHB      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0


Hypervisor vendor: KVM
ulimit soft: 1048576
@zhaochenyang20 (Collaborator)

cc @zhyncs

@RonanKMcGovern

Yes, I'm perhaps getting a similar error on 1×H100 with DeepSeek-R1-Distill-Qwen-1.5B:

  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 189, in get_available_gpu_memory
    free_gpu_memory, _ = torch.cuda.mem_get_info(gpu_id)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 712, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-02-03 01:25:50] Received sigquit from a child proces. It usually means the child failed.

running:

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --context-length 32000 --host 0.0.0.0 --port 8000 --trust-remote-code --quantization fp8 --kv-cache-dtype fp8_e5m2

using:

lmsysorg/sglang:latest

However, adding --disable-cuda-graph solves the issue.
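
For reference, the full working command would then be (the same invocation as above with the workaround flag appended):

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --context-length 32000 --host 0.0.0.0 --port 8000 --trust-remote-code --quantization fp8 --kv-cache-dtype fp8_e5m2 --disable-cuda-graph

Note that an uncorrectable ECC error is generally a hardware-level GPU memory fault, so it is also worth checking the ECC counters on the affected node, e.g. with nvidia-smi -q -d ECC (and, on A100/H100 with recent drivers, nvidia-smi -q -d ROW_REMAPPER for row-remapping status). Disabling CUDA graphs may simply avoid the allocation pattern that touches the faulty region rather than fix the underlying fault.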

@zhaochenyang20 (Collaborator)

@simveit Hey Simon, could we add this to our docs for the serving args?
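
A possible wording for such an entry (a sketch only, not final docs text):

--disable-cuda-graph: Disable CUDA graph capture. CUDA graphs reduce per-token launch overhead during decoding but require a capture phase at startup and extra memory; disabling them trades some decoding throughput for robustness and can work around capture-time failures such as the ECC errors reported in this issue.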
