1. I have searched related issues but could not find the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if your bug report lacks environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. Please use English; otherwise the issue will be closed.
Describe the bug

When serving deepseek-ai/DeepSeek-V2.5 with tp_size=8 on NVIDIA A100-SXM4-80GB GPUs, the server crashes during CUDA graph capture: the NCCL process-group watchdog on rank 1 terminates with "CUDA error: uncorrectable ECC error encountered" (also raised from custom_all_reduce.cuh:364). Full server log:
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Max context length: 163840
2025-01-29 08:51:14.393720: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140674.412162 464568 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140674.417882 464568 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2025-01-29 08:51:25] server_args=ServerArgs(model_path='deepseek-ai/DeepSeek-V2.5', tokenizer_path='deepseek-ai/DeepSeek-V2.5', tokenizer_mode='auto', load_format='auto', trust_remote_code=True, dtype='bfloat16', kv_cache_dtype='auto', quantization_param_path=None, quantization=None, context_length=None, device='cuda', served_model_name='deepseek-ai/DeepSeek-V2.5', chat_template=None, is_embedding=False, revision=None, skip_tokenizer_init=False, host='127.0.0.1', port=1053, mem_fraction_static=0.9, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, prefill_only_one_req=False, tp_size=8, stream_interval=1, stream_output=False, random_seed=646462308, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='sglang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_draft_model_path=None, speculative_algorithm=None, speculative_num_steps=5, speculative_num_draft_tokens=64, speculative_eagle_topk=8, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, 
torch_compile_max_bs=32, cuda_graph_max_bs=160, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None)
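The original launch command is not included in the report; the following is a hypothetical reconstruction from the ServerArgs dump above (flag names follow sglang's CLI conventions, but verify against your installed version):

```shell
# Reconstructed launch command implied by the ServerArgs line above.
# All flag values are taken from the dump; the exact invocation used
# by the reporter may have differed.
CMD="python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V2.5 \
  --trust-remote-code \
  --dtype bfloat16 \
  --tp 8 \
  --mem-fraction-static 0.9 \
  --port 1053"
echo "$CMD"
```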
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140689.739817 464959 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140689.745387 464959 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140689.763624 464954 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140689.769267 464954 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140689.980293 464955 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140689.986039 464955 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140690.043146 464958 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140690.048954 464958 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140690.118109 464952 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140690.124083 464952 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140690.520316 464956 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140690.525952 464956 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140690.599351 464957 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140690.605063 464957 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140691.215711 464951 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140691.224214 464951 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738140691.243279 464953 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738140691.251734 464953 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
[2025-01-29 08:51:41 TP6] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP6] Init torch distributed begin.
[2025-01-29 08:51:41 TP4] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP4] Init torch distributed begin.
[2025-01-29 08:51:41 TP5] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP5] Init torch distributed begin.
[2025-01-29 08:51:41 TP1] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:41 TP1] Init torch distributed begin.
[2025-01-29 08:51:42 TP3] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:42 TP3] Init torch distributed begin.
[2025-01-29 08:51:43 TP0] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:43 TP0] Init torch distributed begin.
[2025-01-29 08:51:43 TP7] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:43 TP7] Init torch distributed begin.
[2025-01-29 08:51:44 TP2] MLA optimization is turned on. Use triton backend.
[2025-01-29 08:51:44 TP2] Init torch distributed begin.
[2025-01-29 08:51:44 TP3] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP0] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP7] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP1] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP2] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP5] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP4] sglang is using nccl==2.21.5
[2025-01-29 08:51:44 TP6] sglang is using nccl==2.21.5
[2025-01-29 08:51:51 TP5] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP1] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP0] Load weight begin. avail mem=77.14 GB
[2025-01-29 08:51:51 TP7] Load weight begin. avail mem=77.14 GB
[2025-01-29 08:51:51 TP3] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP4] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP6] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:51 TP2] Load weight begin. avail mem=76.86 GB
[2025-01-29 08:51:52 TP0] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP5] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP4] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP1] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP6] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP7] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP3] Using model weights format ['*.safetensors']
[2025-01-29 08:51:52 TP2] Using model weights format ['*.safetensors']
Cache shape torch.Size([163840, 64])
Loading safetensors checkpoint shards: 0% Completed | 0/55 [00:00<?, ?it/s]
[2025-01-29 08:52:39 TP6] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP1] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP4] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:39 TP2] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
Loading safetensors checkpoint shards: 100% Completed | 55/55 [00:48<00:00, 1.19it/s]
[2025-01-29 08:52:40 TP3] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:40 TP0] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.71 GB
[2025-01-29 08:52:40 TP5] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.43 GB
[2025-01-29 08:52:40 TP7] Load weight end. type=DeepseekV2ForCausalLM, dtype=torch.bfloat16, avail mem=20.71 GB
[2025-01-29 08:52:41 TP6] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP4] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP3] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP5] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP0] Memory pool end. avail mem=6.69 GB
[2025-01-29 08:52:41 TP1] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP2] Memory pool end. avail mem=6.41 GB
[2025-01-29 08:52:41 TP7] Memory pool end. avail mem=6.69 GB
[2025-01-29 08:52:41 TP0] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP6] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP4] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP3] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP1] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP7] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP0] Capture cuda graph begin. This can take up to several minutes.
0%| | 0/23 [00:00<?, ?it/s]
[2025-01-29 08:52:41 TP6] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP4] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP3] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP1] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP5] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP2] The following error message 'operation scheduled before its operands' can be ignored.
[2025-01-29 08:52:41 TP7] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP5] Capture cuda graph begin. This can take up to several minutes.
[2025-01-29 08:52:41 TP2] Capture cuda graph begin. This can take up to several minutes.
loc("/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/attention/triton_ops/decode_attention.py":310:16): error: operation scheduled before its operands
[2025-01-29 08:52:43 TP0] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[2025-01-29 08:52:43 TP1] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[2025-01-29 08:52:43 TP6] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[2025-01-29 08:52:43 TP5] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[2025-01-29 08:52:44 TP2] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[2025-01-29 08:52:44 TP4] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[2025-01-29 08:52:44 TP3] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
[2025-01-29 08:52:44 TP7] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/E=160,N=192,device_name=NVIDIA_A100-SXM4-80GB.json
4%|▍ | 1/23 [00:07<02:39, 7.23s/it]
9%|▊ | 2/23 [00:08<01:20, 3.82s/it]
[rank1]:[E129 08:52:51.685329145 ProcessGroupNCCL.cpp:1595] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:364 'uncorrectable ECC error encountered'
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x79ebd4f166e4 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x79ec210a5a18 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x79eb381c7726 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x79eb381cc3f0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x79eb381d3b5a in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x79eb381d561d in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x79ebd4f166e4 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x79ec210a5a18 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x79eb381c7726 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x79eb381cc3f0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x79eb381d3b5a in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x79eb381d561d in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #8: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1601 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x79ebd4f6c446 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe4271b (0x79eb37e4271b in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x145c0 (0x79ec23a1b5c0 in /home/ubuntu/.local/lib/python3.10/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x94ac3 (0x79ec2da94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x79ec2db26850 in /lib/x86_64-linux-gnu/libc.so.6)
Fatal Python error: Aborted
Thread 0x000079e69b400640 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/home/ubuntu/.local/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap
Thread 0x000079e803c00640 (most recent call first):
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 47 in _recv_msg
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 153 in _read_thread
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap
Thread 0x000079ec2dd59000 (most recent call first):
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_ops.py", line 1116 in __call__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/_custom_ops.py", line 90 in get_graph_buffer_ipc_meta
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/device_communicators/custom_all_reduce.py", line 320 in register_graph_buffers
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/device_communicators/custom_all_reduce.py", line 317 in capture
File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 325 in graph_capture
File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/distributed/parallel_state.py", line 941 in graph_capture
File "/usr/lib/python3.10/contextlib.py", line 153 in __exit__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 275 in capture
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/cuda_graph_runner.py", line 226 in __init__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 730 in init_cuda_graphs
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 214 in __init__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 68 in __init__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 63 in __init__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 239 in __init__
File "/home/ubuntu/.local/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1773 in run_scheduler_process
File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
File "<string>", line 1 in <module>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, _brotli, charset_normalizer.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, uvloop.loop, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, zmq.backend.cython._zmq, yaml._yaml, markupsafe._speedups, PIL._imaging, PIL._imagingft, google._upb._message, h5py._debian_h5py_serial._errors, h5py._debian_h5py_serial.defs, h5py._debian_h5py_serial._objects, h5py._debian_h5py_serial.h5, h5py._debian_h5py_serial.h5r, h5py._debian_h5py_serial.utils, h5py._debian_h5py_serial.h5s, h5py._debian_h5py_serial.h5ac, h5py._debian_h5py_serial.h5p, h5py._debian_h5py_serial.h5t, h5py._debian_h5py_serial._conv, h5py._debian_h5py_serial.h5z, h5py._debian_h5py_serial._proxy, h5py._debian_h5py_serial.h5a, h5py._debian_h5py_serial.h5d, h5py._debian_h5py_serial.h5ds, h5py._debian_h5py_serial.h5g, h5py._debian_h5py_serial.h5i, h5py._debian_h5py_serial.h5f, h5py._debian_h5py_serial.h5fd, h5py._debian_h5py_serial.h5pl, h5py._debian_h5py_serial.h5o, h5py._debian_h5py_serial.h5l, h5py._debian_h5py_serial._selector, h5py.atexit, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, 
h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, scipy.sparse._csparsetools, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, jaxlib.cpu_feature_guard, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.__check_build._check_build, sklearn.utils.murmurhash, lz4._version, lz4.frame._frame, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg.cython_lapack, scipy.linalg._decomp_update, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, 
scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap_module, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.special.cython_special, scipy.stats._stats, beta_ufunc, scipy.stats._boost.beta_ufunc, binom_ufunc, scipy.stats._boost.binom_ufunc, nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats._biasedurn, scipy.stats._hypotests_pythran, scipy.stats._statlib, scipy.stats._mvn, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._unuran.unuran_wrapper, sklearn.utils._openmp_helpers, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._pairwise_fast, msgspec._core, sentencepiece._sentencepiece, regex._regex, msgpack._cmsgpack, ray._raylet, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, 
numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, xxhash._xxhash, pyarrow._json, pyarrow._acero, pyarrow._csv, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, cuda_utils, __triton_launcher (total: 262)
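Judging from the stack above, the process dies while registering CUDA-graph buffers in the custom all-reduce path (`custom_all_reduce.py` → `register_graph_buffers`, called from `cuda_graph_runner.py`). As a hedged isolation sketch (not a confirmed fix), the two flags below — both visible in the ServerArgs dump at the top of this report — can help narrow down whether CUDA graph capture or the custom all-reduce kernel is the trigger; the launcher module name is assumed to be the standard `sglang.launch_server`:

```shell
# Run 1: skip CUDA graph capture entirely (no graphs, no buffer registration).
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V2.5 \
  --trust-remote-code --tp 8 \
  --disable-cuda-graph

# Run 2: keep CUDA graphs but fall back to NCCL for all-reduce.
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V2.5 \
  --trust-remote-code --tp 8 \
  --disable-custom-all-reduce
```

If run 2 succeeds while the original command hangs, the problem is specific to the custom all-reduce graph-buffer registration rather than CUDA graphs in general.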
Reproduction
Launch deepseek-ai/DeepSeek-V2.5 on 8xA100 (80 GB) with --tp 8.
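Expanded into a concrete command, the reproduction looks roughly like the sketch below, reconstructed from the ServerArgs dump at the top of this report (flag names assumed from the standard launcher; adjust the port to your setup):

```shell
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V2.5 \
  --trust-remote-code \
  --tp 8 \
  --mem-fraction-static 0.9 \
  --port 1053
```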
Environment
ubuntu@207-211-174-109:~/gorilla/berkeley-function-call-leaderboard$ python3 -m sglang.check_env
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2025-01-29 08:57:57.439587: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738141077.458842 481708 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738141077.464579 481708 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
/home/ubuntu/.local/lib/python3.10/site-packages/pydantic/_internal/_config.py:345: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
warnings.warn(message, UserWarning)
Python: 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.0
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.127.05
PyTorch: 2.5.1+cu124
sglang: 0.4.2
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.48.1
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.7
hf_transfer: 0.1.9
huggingface_hub: 0.28.0
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.6
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.60.2
anthropic: 0.45.2
decord: 0.6.0
NVIDIA Topology:
        GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0     X    NV12  NV12  NV12  NV12  NV12  NV12  NV12  PHB   0-239         0-1            N/A
GPU1    NV12   X    NV12  NV12  NV12  NV12  NV12  NV12  PHB   0-239         0-1            N/A
GPU2    NV12  NV12   X    NV12  NV12  NV12  NV12  NV12  PHB   0-239         0-1            N/A
GPU3    NV12  NV12  NV12   X    NV12  NV12  NV12  NV12  PHB   0-239         0-1            N/A
GPU4    NV12  NV12  NV12  NV12   X    NV12  NV12  NV12  PHB   0-239         0-1            N/A
GPU5    NV12  NV12  NV12  NV12  NV12   X    NV12  NV12  PHB   0-239         0-1            N/A
GPU6    NV12  NV12  NV12  NV12  NV12  NV12   X    NV12  PHB   0-239         0-1            N/A
GPU7    NV12  NV12  NV12  NV12  NV12  NV12  NV12   X    PHB   0-239         0-1            N/A
NIC0    PHB   PHB   PHB   PHB   PHB   PHB   PHB   PHB    X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
Hypervisor vendor: KVM
ulimit soft: 1048576
ubuntu@207-211-174-109:~/gorilla/berkeley-function-call-leaderboard$
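Side note on the repeated SciPy warning in these logs: the Debian-packaged SciPy under /usr/lib/python3/dist-packages pins NumPy to < 1.25.0 while 1.26.4 is installed, so the warning is a version-bound mismatch and likely unrelated to the hang itself. A minimal sketch of the bound check that fires (using the `packaging` library already listed in the environment; the bounds are taken verbatim from the warning text):

```python
# Reproduce the version-bound check behind SciPy's UserWarning.
from packaging.version import Version

np_minversion = Version("1.17.3")
np_maxversion = Version("1.25.0")
detected = Version("1.26.4")

# False -> detected NumPy is outside the supported range, so SciPy warns.
in_range = np_minversion <= detected < np_maxversion
print(in_range)  # False
```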
Yes, I may be hitting a similar error on 1xH100 with a Distill Qwen 1.5B model:
File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 189, in get_available_gpu_memory
free_gpu_memory, _ = torch.cuda.mem_get_info(gpu_id)
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 712, in mem_get_info
return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2025-02-03 01:25:50] Received sigquit from a child process. It usually means the child failed.
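The "uncorrectable ECC error" in the comment above is usually a hardware/driver-level fault rather than an sglang bug; a hedged first diagnostic (assuming the standard NVIDIA tooling is available) is to query the GPU's ECC counters directly:

```shell
# Dump aggregate ECC error counters; non-zero uncorrectable (double-bit)
# counts typically call for a GPU reset or hardware replacement,
# independent of the serving framework.
nvidia-smi -q -d ECC
```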