Problem using the GPU to accelerate Pangu forecasts #67

Open
aiwuyouxi opened this issue Aug 18, 2024 · 0 comments

Hello author,

I have run into a problem using the GPU to accelerate Pangu forecasts. After setting up the base environment with conda, I ran inference_gpu.py (adapted from the example; code below), and it printed Execution Providers: ['CPUExecutionProvider'], so GPU acceleration was not enabled.

I have tried the following, but the problem persists:

  1. Confirmed that CUDA and cuDNN are correctly installed.

  2. Confirmed that CUDAExecutionProvider is available:

    import onnxruntime as ort
    print(ort.get_available_providers())

    Output: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

  3. Confirmed that the CUDA Execution Provider is configured:

    providers = [
        ('CUDAExecutionProvider', {
            'device_id': 0,  
            'arena_extend_strategy': 'kSameAsRequested',
            'gpu_mem_limit': 2 * 1024 * 1024 * 1024,  
            'cudnn_conv_algo_search': 'EXHAUSTIVE',  
            'do_copy_in_default_stream': True,
        }),
        'CPUExecutionProvider',  # Fall back to CPU if CUDA is unavailable
    ]
    
    ort_session_24 = ort.InferenceSession(r'pangu_weather_24.onnx', providers=providers)
    # The other model sessions are initialized the same way
    
    print("Execution Providers:", ort_session_24.get_providers())

    The output is still Execution Providers: ['CPUExecutionProvider'], so the GPU is not being used.

  4. Updated onnx and onnxruntime-gpu to the latest versions:

    pip install --upgrade onnx
    pip install --upgrade onnxruntime-gpu

After rerunning the code, it still runs on the CPU.
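One thing worth checking is whether the installed onnxruntime-gpu wheel was built against the CUDA major version on the machine; when they mismatch, the CUDA EP fails to load and ONNX Runtime silently falls back to the CPU. The sketch below is a hypothetical helper (not part of onnxruntime), and the compatibility table is an assumption drawn from the onnxruntime release notes: wheels up to 1.16 target CUDA 11, and CUDA 12 builds appear from 1.17 onward.

```python
# Hypothetical helper: cross-check the onnxruntime-gpu version against the
# CUDA major version its official wheels were built for. The table below is
# an assumption based on onnxruntime release notes, not an official API.
ORT_GPU_CUDA_MAJOR = {
    (1, 14): 11,
    (1, 15): 11,
    (1, 16): 11,
    (1, 17): 12,  # 1.17+ ships CUDA 12 builds
}

def check_ort_cuda_compat(ort_version: str, cuda_version: str) -> bool:
    """Return True if the installed CUDA major version matches what the
    given onnxruntime-gpu wheel expects, False otherwise."""
    major, minor = (int(p) for p in ort_version.split('.')[:2])
    cuda_major = int(cuda_version.split('.')[0])
    expected = ORT_GPU_CUDA_MAJOR.get((major, minor))
    return expected is not None and expected == cuda_major

# The environment reported above: onnxruntime-gpu 1.14.0 with CUDA 12.3
print(check_ort_cuda_compat('1.14.0', '12.3'))  # → False (version mismatch)
```

If this check fails for your environment, either installing a CUDA 11.x toolkit or upgrading onnxruntime-gpu to a CUDA 12 build would be the usual fix.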

I have run out of ideas and would be very grateful for the author's help. My environment configuration and the Python code I used to try running Pangu on the GPU are attached below:

Environment configuration

  • Environment: conda 24.1.2 - Python 3.10.14
    • onnx==1.12.0
    • onnxruntime-gpu==1.14.0
    • numpy==1.26.4
  • CUDA: CUDA Version: 12.3

Python code used to try running Pangu on the GPU

import os
import numpy as np
import onnx
import onnxruntime as ort

# Input and output data directories (trailing separator kept for the copy commands below)
input_data_dir = 'input_data\\'
output_data_dir = 'output_data\\'

# Load the ONNX models
model_24 = onnx.load('pangu_weather_24.onnx')
model_6 = onnx.load('pangu_weather_6.onnx')
model_3 = onnx.load('pangu_weather_3.onnx')
model_1 = onnx.load('pangu_weather_1.onnx')

# Configure onnxruntime session options
options = ort.SessionOptions()
options.enable_cpu_mem_arena = False
options.enable_mem_pattern = False
options.enable_mem_reuse = False
options.intra_op_num_threads = 1  # Increase this value to speed up inference, at the cost of more memory

# Options for the CUDA execution provider
cuda_provider_options = {'arena_extend_strategy': 'kSameAsRequested'}

# Initialize onnxruntime sessions for the Pangu-Weather models
ort_session_24 = ort.InferenceSession('pangu_weather_24.onnx', sess_options=options, providers=[('CUDAExecutionProvider', cuda_provider_options)])
ort_session_6 = ort.InferenceSession('pangu_weather_6.onnx', sess_options=options, providers=[('CUDAExecutionProvider', cuda_provider_options)])
ort_session_3 = ort.InferenceSession('pangu_weather_3.onnx', sess_options=options, providers=[('CUDAExecutionProvider', cuda_provider_options)])
ort_session_1 = ort.InferenceSession('pangu_weather_1.onnx', sess_options=options, providers=[('CUDAExecutionProvider', cuda_provider_options)])

# Load the upper-air numpy array
input_data = np.load(os.path.join(input_data_dir, 'input_upper.npy')).astype(np.float32)
# Load the surface numpy array
input_surface_data = np.load(os.path.join(input_data_dir, 'input_surface.npy')).astype(np.float32)

# Models and their forecast lead times (hours)
models = {
    'model_24': (ort_session_24, 24),
    'model_6': (ort_session_6, 6),
    'model_3': (ort_session_3, 3),
    'model_1': (ort_session_1, 1)
}

def select_model(remaining_hours):
    """
    选择不超过剩余小时数的最大预报时长的模型。
    """
    selected_model = None
    max_duration = 0
    
    for model_name, (session, duration) in models.items():
        if duration <= remaining_hours and duration > max_duration:
            selected_model = (session, duration)
            max_duration = duration
            
    return selected_model

# Initialize variables
remaining_hours = 10  # Total forecast length (hours)
forecast_hour = 0  # Start from hour 0
idate = '2023073000'

while remaining_hours > 0:
    print(remaining_hours)
    # Check the return value before unpacking: select_model returns None when
    # no model fits, so unpacking it directly would raise a TypeError
    selected = select_model(remaining_hours)
    if selected is None:
        raise ValueError("No suitable model found to complete the remaining forecast.")
    model_session, duration = selected
    
    # Run the inference session
    print('Running inference...')
    output, output_surface = model_session.run(None, {'input': input_data, 'input_surface': input_surface_data})
    
    # Save the results
    forecast_hour += duration
    np.save(os.path.join(output_data_dir, f'output_upper_{forecast_hour:02d}.npy'), output)
    np.save(os.path.join(output_data_dir, f'output_surface_{forecast_hour:02d}.npy'), output_surface)
    os.system(f'copy {output_data_dir}output_surface_{forecast_hour:02d}.npy \\output_{idate}\\output_surface_{forecast_hour:02d}.npy')
    os.system(f'copy {output_data_dir}output_upper_{forecast_hour:02d}.npy \\output_{idate}\\output_upper_{forecast_hour:02d}.npy')
    
    # Use this step's output as input for the next iteration
    input_data, input_surface_data = output, output_surface
    remaining_hours -= duration
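The greedy selection logic in the loop above can be sketched in isolation to verify which lead times it chooses (a minimal sketch; the durations are taken from the script, and `plan_steps` is a hypothetical helper, not part of the script):

```python
# Greedy lead-time selection, mirroring select_model above: at each step,
# pick the largest available lead time that does not overshoot the
# remaining forecast hours.
DURATIONS = [24, 6, 3, 1]  # lead times (hours) of the four Pangu models

def plan_steps(total_hours: int) -> list[int]:
    """Return the sequence of model lead times used to cover total_hours."""
    steps = []
    remaining = total_hours
    while remaining > 0:
        candidates = [d for d in DURATIONS if d <= remaining]
        if not candidates:
            raise ValueError("No suitable model for the remaining hours")
        steps.append(max(candidates))
        remaining -= steps[-1]
    return steps

print(plan_steps(10))  # → [6, 3, 1]
print(plan_steps(30))  # → [24, 6]
```

For the 10-hour forecast in the script, the 24-hour model is skipped and the loop chains the 6-, 3-, and 1-hour models, so three inference calls run in total.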