Official sample demo fails: MemoryError: std::bad_alloc #3338

ocivo · 2025-01-03T10:30:46Z

Written by following the official sample demo directly.

PaddleClas version 2.6.0, paddlepaddle version 2.6.2
Python 3.8
The paddlepaddle CPU build and GPU build report the same error.
python run.py
2025-01-03 18:29:51 INFO: Loading faiss with AVX512 support.
2025-01-03 18:29:51 INFO: Successfully loaded faiss with AVX512 support.
[2025/01/03 18:29:51] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
  File "run.py", line 2, in <module>
    model = paddleclas.PaddleClas(model_name="person_attribute")
  File "venv/lib/python3.8/site-packages/paddleclas/paddleclas.py", line 610, in __init__
    self.predictor = ClsPredictor(self._config)
  File "venv/lib/python3.8/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
    super().__init__(config["Global"])
  File "venv/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
    self.predictor, self.config = self.create_paddle_predictor(
  File "venv/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
    predictor = create_predictor(config)
MemoryError: std::bad_alloc

TingquanGao · 2025-01-03T12:14:02Z

Please provide the full launch command.

cainmagi · 2025-01-15T20:53:24Z

@TingquanGao

Here is a way to reproduce the problem:

Reproduction steps

First, save the following test script, test.py, somewhere, for example /example/test.py:

import paddleclas

model = paddleclas.PaddleClas(model_name="text_image_orientation")

Then run the following command to start a Docker container. The container is a clean Debian Python image.

docker run --gpus all -it --rm --shm-size=1g -v "/example:/example" python:3.10-slim bash

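Inside the container, install the required dependencies and run the test script. The CPU build is installed here because the GPU build of Paddle is currently incompatible with other libraries I use.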

# Make the dependencies of OpenCV complete.
apt-get update
apt-get -y install libgomp1 libgl1-mesa-glx libglib2.0-0
# Install PaddleClas
pip install paddlepaddle paddleclas
# Run the test.
cd /example
python test.py

Error output

This produces the following error:

[2025/01/15 20:08:51] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 402kiB/s]
[2025/01/15 20:09:12] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
  File "/example/test.py", line 3, in <module>
    model = paddleclas.PaddleClas(model_name="text_image_orientation")
  File "/usr/local/lib/python3.10/site-packages/paddleclas/paddleclas.py", line 610, in __init__
    self.predictor = ClsPredictor(self._config)
  File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
    super().__init__(config["Global"])
  File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
    self.predictor, self.config = self.create_paddle_predictor(
  File "/usr/local/lib/python3.10/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
    predictor = create_predictor(config)
MemoryError: std::bad_alloc
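
Because the install commands above take whatever versions pip resolves, it helps to record the installed versions alongside a failure like this one. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of PaddleClas:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The GPU wheel is a separate distribution, so both names are checked.
    names = ["paddlepaddle", "paddlepaddle-gpu", "paddleclas"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in each container before test.py gives a version line to paste next to the traceback.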

Further testing

Is it a shared memory problem?

Could the shared memory be too small? I changed the container launch arguments to:

docker run --gpus all -it --rm --shm-size=16g -v "/example:/example" python:3.10-slim bash

This is already larger than the size used in the tutorial.

The error persists.

Is it caused by GPU passthrough?

I removed the --gpus all argument:

docker run -it --rm --shm-size=1g -v "/example:/example" python:3.10-slim bash

The error persists.

Is the paddlepaddle installation broken, or is the CPU build of paddlepaddle unusable?

NVIDIA happens to publish a paddlepaddle image, so I ran it and repeated the test above:

docker run --gpus all -it --rm --shm-size=1g -v "/example:/example" nvcr.io/nvidia/paddlepaddle:24.10-py3

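Why 24.10 rather than a newer release such as 24.12? Because paddleclas does not support Python 3.12, and image version 24.10 is the last one that uses Ubuntu 22.04 and Python 3.10. Note that its CUDA is still nearly the latest, 12.6, and that it ships with paddlepaddle-gpu preinstalled.

Because the image already contains paddlepaddle-gpu, the installation steps need a small change: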

# Make the dependencies of OpenCV complete.
# Note that this is an Ubuntu image.
apt-get update
apt-get -y install libgomp1 libegl1 libglu1-mesa-dev
# Do not need to install paddlepaddle because paddlepaddle-gpu already exists.
pip install paddleclas
# Run the test.
cd /example
python test.py

This still fails, but with a different error:

[2025/01/15 20:46:49] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 406kiB/s]
Traceback (most recent call last):
  File "/example/test.py", line 3, in <module>
    model = paddleclas.PaddleClas(model_name="text_image_orientation")
  File "/usr/local/lib/python3.10/dist-packages/paddleclas/paddleclas.py", line 610, in __init__
    self.predictor = ClsPredictor(self._config)
  File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
    super().__init__(config["Global"])
  File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
    self.predictor, self.config = self.create_paddle_predictor(
  File "/usr/local/lib/python3.10/dist-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
    predictor = create_predictor(config)
ValueError: basic_string::_M_replace_aux
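
Both failures come out of the same `create_predictor` call but surface as different Python exception types (`MemoryError` in one environment, `ValueError` in the other). A minimal sketch of a wrapper that reports either one together with the interpreter version; `try_init` is a hypothetical helper, and the factory passed in stands for whatever constructs the model:

```python
import sys


def try_init(factory):
    """Call factory(); return (obj, None) on success or (None, message) on failure."""
    try:
        return factory(), None
    except (MemoryError, ValueError) as exc:
        # Both exception types were observed during predictor creation in this thread.
        py = ".".join(map(str, sys.version_info[:3]))
        return None, f"{type(exc).__name__}: {exc} (Python {py})"


# Intended use with the test script from this thread (requires paddleclas):
#   import paddleclas
#   model, err = try_init(
#       lambda: paddleclas.PaddleClas(model_name="text_image_orientation")
#   )
#   if err:
#       print("initialization failed:", err)
```

Catching only these two types is a deliberate choice here: any other exception would be a new symptom and should propagate unchanged.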

Using the official `paddle` image

Following the tutorial strictly, I used the official paddle image.

docker run --gpus all --name ppcls -it --rm -v "/example:/example" --shm-size=8G --network=host paddlepaddle/paddle:2.3.0-gpu-cuda10.2-cudnn7 /bin/bash

Once it is running, install and test inside the container:

# Install dependencies. Do not need to fix OpenCV issues.
pip install paddleclas
# Run the test.
cd /example
python test.py

This ran successfully.

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
2025-01-15 21:16:35 INFO: Loading faiss with AVX2 support.
2025-01-15 21:16:35 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:16:35 INFO: Loading faiss.
2025-01-15 21:16:35 INFO: Successfully loaded faiss.
[2025/01/15 21:16:35] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:18<00:00, 410kiB/s]
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:67: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': Image.NEAREST,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:68: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': Image.BILINEAR,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:69: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': Image.BICUBIC,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:70: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  'box': Image.BOX,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:71: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  'lanczos': Image.LANCZOS,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:72: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  'hamming': Image.HAMMING,
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:73: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'random': (Image.BILINEAR, Image.BICUBIC)
/usr/local/python3.7.0/lib/python3.7/site-packages/paddleclas/deploy/python/preprocess.py:73: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'random': (Image.BILINEAR, Image.BICUBIC)

Although it runs, this image is based on Ubuntu 16.04 with Python 3.7. Using it would require a multi-container setup, which is far too cumbersome.

Using the official `paddleclas` image

docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host paddlecloud/paddleclas:2.4-gpu-cuda11.2-cudnn8-latest /bin/bash

Once it is running, test directly inside the container:

# Run the test.
cd /example
python test.py

This one also runs normally.

Is it necessary to go back to Python 3.7?

I tried going back to a clean Debian image with Python 3.7:

docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.7-slim bash

Then I repeated the earlier installation and test steps. The test passed as well.

2025-01-15 21:33:18 INFO: Loading faiss with AVX2 support.
2025-01-15 21:33:18 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:33:18 INFO: Loading faiss.
2025-01-15 21:33:18 INFO: Successfully loaded faiss.
[2025/01/15 21:33:18] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:19<00:00, 380kiB/s]
[2025/01/15 21:33:39] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.

Then can Python 3.8 be used?

Switching to the Python 3.8 image:

docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.8-slim bash

Then I repeated the earlier installation and test steps. The test fails.

2025-01-15 21:38:40 INFO: Loading faiss with AVX512 support.
2025-01-15 21:38:40 INFO: Successfully loaded faiss with AVX512 support.
[2025/01/15 21:38:40] ppcls INFO: download https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar to /root/.paddleclas/inference_model/PULC/text_image_orientation/text_image_orientation_infer.tar
100%|██████████████████████████████████████████████████████████████████████████████| 7.40M/7.40M [00:19<00:00, 383kiB/s]
[2025/01/15 21:39:02] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
Traceback (most recent call last):
  File "test-ori.py", line 3, in <module>
    model = paddleclas.PaddleClas(model_name="text_image_orientation")
  File "/usr/local/lib/python3.8/site-packages/paddleclas/paddleclas.py", line 610, in __init__
    self.predictor = ClsPredictor(self._config)
  File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/python/predict_cls.py", line 28, in __init__
    super().__init__(config["Global"])
  File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 37, in __init__
    self.predictor, self.config = self.create_paddle_predictor(
  File "/usr/local/lib/python3.8/site-packages/paddleclas/deploy/utils/predictor.py", line 108, in create_paddle_predictor
    predictor = create_predictor(config)
MemoryError: std::bad_alloc

So, is the problem in the paddlepaddle and paddleclas versions?

Still on the Python 3.8 image:

docker run --gpus all -it --rm -v "/example:/example" --shm-size=8G --network=host python:3.8-slim bash

This time, I pinned paddlepaddle and paddleclas to older versions:

# Make the dependencies of OpenCV complete.
apt-get update
apt-get -y install libgomp1 libgl1-mesa-glx libglib2.0-0
# Install PaddleClas
pip install paddlepaddle==2.5.2 paddleclas==2.5.1
# Run the test.
cd /example
python test.py

It runs successfully:

2025-01-15 21:41:02 INFO: Loading faiss with AVX2 support.
2025-01-15 21:41:02 INFO: Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2025-01-15 21:41:02 INFO: Loading faiss.
2025-01-15 21:41:02 INFO: Successfully loaded faiss.
[2025/01/15 21:41:02] ppcls WARNING: The current running environment does not support the use of GPU. CPU has been used instead.
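
The manual trials above (one image and one version pair at a time) could be scripted. A minimal sketch that only builds the `docker run` command lines for a matrix of Python images and pinned versions; it assumes the same `/example` mount and apt packages used in this thread, `build_command` is a hypothetical helper, and the tags and pairs are examples taken from the thread (not every pair is necessarily installable on every Python version). It prints the commands rather than running them:

```python
import itertools
import shlex

APT_PACKAGES = "libgomp1 libgl1-mesa-glx libglib2.0-0"


def build_command(python_tag, paddle_version, clas_version, mount="/example"):
    """Return a docker argv list that installs the pinned versions and runs test.py."""
    inner = (
        f"apt-get update && apt-get -y install {APT_PACKAGES} && "
        f"pip install paddlepaddle=={paddle_version} paddleclas=={clas_version} && "
        f"cd {mount} && python test.py"
    )
    return [
        "docker", "run", "--rm", "--shm-size=8G",
        "-v", f"{mount}:{mount}",
        f"python:{python_tag}-slim",
        "bash", "-c", inner,
    ]


if __name__ == "__main__":
    tags = ["3.8", "3.10"]
    pairs = [("2.5.2", "2.5.1"), ("2.6.2", "2.6.0")]
    for tag, (paddle, clas) in itertools.product(tags, pairs):
        print(shlex.join(build_command(tag, paddle, clas)))
```

Each printed line can be pasted into a shell; the exit status of `python test.py` becomes the exit status of the container.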

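Conclusion

~~It is intolerable that PaddleClas fails to initialize in all of these standard environments. It makes me wonder whether its developers' CPUs are even amd64.~~

It is now confirmed that there are compatibility problems between paddlepaddle and paddleclas versions. A suitable pair must be pinned, neither too new nor too old.

I later ran further checks in a Python 3.8 environment: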

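paddlepaddle 2.5.2 and paddleclas 2.5.1 are compatible.
paddlepaddle 2.5.2 and paddleclas 2.6.0 are compatible.
paddlepaddle 2.6.0~2.6.2 and paddleclas 2.6.0 are incompatible. They fail with MemoryError: std::bad_alloc.
The latest paddlepaddle 3.0.0rc0 and paddleclas 2.6.0 are also incompatible. They fail with a different error.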
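
These findings can be written down as a small lookup so that a script can warn before hitting the crash. A minimal sketch: the table holds only the results reported in this thread for Python 3.8 and is not an official compatibility matrix; the range 2.6.0~2.6.2 is expanded to the three patch releases as an assumption, and `reported_status` is a hypothetical helper:

```python
# (set of paddlepaddle versions, paddleclas version, outcome reported in this thread)
REPORTED = [
    ({"2.5.2"}, "2.5.1", "ok"),
    ({"2.5.2"}, "2.6.0", "ok"),
    ({"2.6.0", "2.6.1", "2.6.2"}, "2.6.0", "MemoryError: std::bad_alloc"),
    ({"3.0.0rc0"}, "2.6.0", "a different error"),
]


def reported_status(paddle_version, clas_version):
    """Return the outcome reported for this pair, or None if it was not tested."""
    for paddle_versions, clas, outcome in REPORTED:
        if paddle_version in paddle_versions and clas_version == clas:
            return outcome
    return None


if __name__ == "__main__":
    for pair in [("2.5.2", "2.6.0"), ("2.6.2", "2.6.0"), ("2.4.0", "2.6.0")]:
        print(pair, "->", reported_status(*pair) or "untested")
```

Returning `None` for untested pairs keeps the lookup from implying anything about combinations nobody in the thread tried.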

My environment is as follows:

CUDA (if used): Cuda compilation tools, release 12.6, V12.6.77
OS (in container python:3.10-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.10, PaddlePaddle is 2.6.2, PaddleClas is 2.6.0)
OS (in container nvidia/paddlepaddle): Ubuntu 22.04.5 LTS (Python is 3.10, PaddlePaddle is 2.6.1, PaddleClas is 2.6.0)
OS (in container paddlepaddle/paddle): Ubuntu 16.04.7 LTS (Python is 3.7, PaddlePaddle is 2.3.0, PaddleClas is 2.5.1)
OS (in container paddlecloud/paddleclas): Ubuntu 18.04.5 LTS (Python is 3.7, PaddlePaddle is 2.3.0.post112, PaddleClas is 0.0.0 (actually it should be 2.4, so this seems to be a dev version))
OS (in container python:3.7-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.7, PaddlePaddle is 2.5.2, PaddleClas is 2.5.1)
OS (in container python:3.8-slim): Debian GNU/Linux 12 (bookworm) (Python is 3.8, PaddlePaddle is 2.6.2, PaddleClas is 2.6.0)
OS (native device): Windows 11 Enterprise 24H2 (10.0.26100 Build 26100)
Docker version: 27.3.1, build ce12230
NVIDIA Driver: 566.03

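What is certain is that, although the tests above repeatedly reported a memory error, my system memory was definitely not exhausted when the script ran.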
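
One way to back up this observation is to log the available memory right before the failing call. A minimal sketch for Linux containers, assuming `/proc/meminfo` is readable; `parse_meminfo` is a hypothetical helper:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as fh:
            info = parse_meminfo(fh.read())
        print(f"MemAvailable: {info.get('MemAvailable', 0) / 1024:.0f} MiB")
    else:
        print("no /proc/meminfo on this platform")
```

If a large `MemAvailable` value is printed immediately before `std::bad_alloc` is raised, that supports reading the error as something other than genuine memory exhaustion.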

TingquanGao · 2025-01-20T03:22:41Z

Thank you for the feedback and the very detailed experiments! We will arrange for this issue to be investigated.

drawyaW · 2025-02-20T01:18:52Z

Great! A very detailed solution. I rolled back the paddlepaddle version and it does run successfully!

wang-kangkang · 2025-02-25T07:24:33Z

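This shows that the paddle libraries have no mechanism for automatically running the sample demos at release time.
In practice this is not hard: collect the demo-level commands in one place and run them once. This is clearly the responsibility of the test team lead.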
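
The idea of collecting demo commands and running them once can be sketched as follows. This is a minimal illustration, not an existing Paddle tool: `run_smoke_tests` is a hypothetical helper, and the command list is a stand-in for real demo invocations:

```python
import subprocess
import sys


def run_smoke_tests(commands, timeout=600):
    """Run each command (an argv list); return a list of (command, passed) pairs."""
    results = []
    for argv in commands:
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            passed = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # A hang or a missing executable counts as a failed demo.
            passed = False
        results.append((argv, passed))
    return results


if __name__ == "__main__":
    # Stand-in command; a real list would invoke each demo script.
    demo_commands = [[sys.executable, "-c", "print('demo ok')"]]
    for argv, passed in run_smoke_tests(demo_commands):
        print("PASS" if passed else "FAIL", " ".join(argv))
```

A crash such as the one in this issue would show up as a non-zero exit code from the demo process and be reported as FAIL.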

ocivo changed the title ~~MemoryError: std::bad_alloc~~ Official sample demo fails: MemoryError: std::bad_alloc Jan 3, 2025

TingquanGao self-assigned this Jan 3, 2025
