CPU OOM when running on multiple GPUs in parallel #1844

xcvil · 2025-03-04T17:39:03Z

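Description of the bug

I'm running in parallel across multiple GPUs with Python multiprocessing (mp). GPU memory usage is fine, but CPU memory (the mem requested in SLURM) runs out. Has anyone run into something similar?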
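With one pool per GPU, the job starts args.num_gpus × args.workers_per_gpu worker processes. Each worker is a separate process with its own interpreter, its own loaded models and its own CUDA context, so host memory grows with the number of workers even when every GPU has room to spare. A back-of-the-envelope sketch for sizing the SLURM memory request, assuming you have measured the peak RSS of a single worker (for example from a one-worker run); `estimate_host_mem_gb`, the default `parent_gb` and the `headroom` factor are hypothetical placeholders, not measured magic-pdf figures:

```python
def estimate_host_mem_gb(num_gpus: int, workers_per_gpu: int,
                         per_worker_gb: float, parent_gb: float = 1.0,
                         headroom: float = 1.25) -> float:
    """Rough worst-case host RAM for the one-pool-per-GPU layout.

    per_worker_gb: measured peak RSS of ONE worker process (placeholder input).
    parent_gb: RSS of the parent process that creates the pools (assumption).
    headroom: safety factor for fragmentation and spikes (assumption).
    """
    if num_gpus < 1 or workers_per_gpu < 1:
        raise ValueError("need at least one GPU and one worker per GPU")
    # Every pool worker is an independent process, so their memory adds up.
    total_workers = num_gpus * workers_per_gpu
    return (total_workers * per_worker_gb + parent_gb) * headroom


# 4 GPUs x 2 workers, each worker peaking at 6 GB (illustrative numbers only)
print(estimate_host_mem_gb(4, 2, per_worker_gb=6.0))  # 61.25
```

Counting processes this way is an upper bound: pages shared copy-on-write between processes are only charged once, so the real footprint can be lower.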

import multiprocessing as mp
from functools import partial

# Create pools, one per GPU. Each pool has args.workers_per_gpu workers.
# We use pools to manage the workers for each GPU.
pools = []
for gpu_id in range(args.num_gpus):
    # Create args.workers_per_gpu worker processes bound to this GPU
    pool = mp.Pool(
        processes=args.workers_per_gpu,
        initializer=init_worker,
        initargs=(gpu_id,)
    )
    pools.append(pool)

results = []
# The PDFs assigned to each pool (GPU) are fixed by the for-loop below.
# Create tasks: each task now processes only one or a few files
for i, (file_path, output_dir) in enumerate(zip(pdf_files, output_dir_wrt_pdf_files)):
    gpu_id = i % args.num_gpus  # GPUs are still assigned round-robin
    # Use partial to bind gpu_id as the first argument of worker
    bound_process = partial(worker, gpu_id)
    results.append(pools[gpu_id].apply_async(bound_process, (file_path, output_dir)))
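The loop above submits one apply_async call per file and picks the pool with i % args.num_gpus. If each task should cover "one or a few files", the file list can be chunked first and the chunks assigned round-robin. A pure-Python sketch of that bookkeeping only; `make_tasks` and `files_per_task` are hypothetical names, and the worker would then need to loop over the (pdf, output_dir) pairs in its batch:

```python
from typing import List, Sequence, Tuple

# (gpu_id, [(pdf_path, output_dir), ...])
Task = Tuple[int, List[Tuple[str, str]]]


def make_tasks(pdf_files: Sequence[str], output_dirs: Sequence[str],
               num_gpus: int, files_per_task: int = 1) -> List[Task]:
    """Split files into batches and assign batches to GPUs round-robin."""
    if len(pdf_files) != len(output_dirs):
        raise ValueError("pdf_files and output_dirs must have the same length")
    if num_gpus < 1 or files_per_task < 1:
        raise ValueError("num_gpus and files_per_task must be >= 1")
    pairs = list(zip(pdf_files, output_dirs))
    tasks: List[Task] = []
    for task_idx, start in enumerate(range(0, len(pairs), files_per_task)):
        gpu_id = task_idx % num_gpus  # same round-robin rule as the loop above
        tasks.append((gpu_id, pairs[start:start + files_per_task]))
    return tasks


print(make_tasks(["a.pdf", "b.pdf", "c.pdf"], ["o1", "o2", "o3"],
                 num_gpus=2, files_per_task=2))
# [(0, [('a.pdf', 'o1'), ('b.pdf', 'o2')]), (1, [('c.pdf', 'o3')])]
```

Each (gpu_id, batch) pair would then go to pools[gpu_id].apply_async(...). With files_per_task=1 this reproduces the original one-file-per-task assignment.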

How to reproduce the bug

Calling the Python API:

import os

from magic_pdf.config.enums import SupportedPdfParseMethod
from magic_pdf.data.data_reader_writer import FileBasedDataReader, FileBasedDataWriter
from magic_pdf.data.dataset import PymuDocDataset
from magic_pdf.model.doc_analyze_by_custom_model import doc_analyze

# Initialize writers
image_writer = FileBasedDataWriter(image_dir)
md_writer = FileBasedDataWriter(doc_output_dir)

# Read PDF content
reader = FileBasedDataReader("")
pdf_bytes = reader.read(pdf_path)

# Process PDF (classify once and reuse the result)
ds = PymuDocDataset(pdf_bytes)
use_ocr = ds.classify() == SupportedPdfParseMethod.OCR
infer_result = ds.apply(doc_analyze, ocr=use_ocr)

# Generate output
pipe_result = (infer_result.pipe_ocr_mode(image_writer)
               if use_ocr
               else infer_result.pipe_txt_mode(image_writer))

pipe_result.dump_md(
    md_writer,
    f"{name_without_ext}.md",
    os.path.basename(image_dir)
)
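To narrow down where the host memory goes, one option is to log each worker's peak RSS after every PDF. A minimal sketch using only the standard-library resource module (Unix only; ru_maxrss is reported in KiB on Linux and in bytes on macOS). `peak_rss_mib` and `run_with_rss_log` are hypothetical helpers, not part of magic-pdf:

```python
import os
import resource
import sys
from typing import Any, Callable


def peak_rss_mib() -> float:
    """High-water mark of this process's resident set size, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports KiB, macOS reports bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_with_rss_log(fn: Callable[..., Any], *args: Any, label: str = "") -> Any:
    """Call fn(*args) and print this process's peak RSS before and after."""
    before = peak_rss_mib()
    result = fn(*args)
    after = peak_rss_mib()
    print(f"[pid {os.getpid()}] {label}: peak RSS {before:.0f} -> {after:.0f} MiB",
          flush=True)
    return result


# Stand-in workload; in the worker you would wrap the per-PDF code above instead.
run_with_rss_log(lambda n: len(bytearray(n)), 50_000_000, label="alloc test")
```

Because ru_maxrss is a high-water mark it never decreases. A single jump on the first file suggests a one-off model load; a peak that keeps climbing across files suggests memory accumulating inside the long-lived pool workers.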

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.10.x

Device mode

cuda

xcvil added the "bug" label (Something isn't working) on Mar 4, 2025.
