This repository has been archived by the owner on Jun 25, 2023. It is now read-only.

fast infer solution #8

Open · wants to merge 3 commits into main

Conversation

khushpatel2002

Hello! In this solution I have converted the models to ONNX, and the app uses ONNX Runtime to get faster inference. As per the challenge requirements, the models are not tinkered with; only the format is changed.
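For reference, a minimal sketch of this loading path with Optimum (the model id, task, and provider below are assumptions for illustration, not copied from the PR diff):

```python
# Sketch: export a Transformers checkpoint to ONNX on the fly and run it with ONNX Runtime.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "cardiffnlp/twitter-xlm-roberta-base-sentiment"  # hypothetical example model

# export=True converts the PyTorch checkpoint to ONNX at load time; the weights are
# untouched, only the serialization format changes.
ort_model = ORTModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,
    provider="CUDAExecutionProvider",  # requires CUDA/cuDNN libraries inside the image
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=ort_model, tokenizer=tokenizer)
print(classifier("ONNX Runtime makes inference fast"))
```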

@rsolovev (Collaborator)

Hi @khushpatel2002, unfortunately this solution did not launch properly. Here are the full logs:

Model Loading Started...

Downloading (…)lve/main/config.json:   0%|          | 0.00/841 [00:00<?, ?B/s]
Downloading (…)lve/main/config.json: 100%|██████████| 841/841 [00:00<00:00, 6.62MB/s]
Framework not specified. Using pt to export to ONNX.

Downloading pytorch_model.bin:   0%|          | 0.00/1.11G [00:00<?, ?B/s]
Downloading pytorch_model.bin: 100%|██████████| 1.11G/1.11G [00:01<00:00, 571MB/s]

Downloading (…)tencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]
Downloading (…)tencepiece.bpe.model: 100%|██████████| 5.07M/5.07M [00:00<00:00, 170MB/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/150 [00:00<?, ?B/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 150/150 [00:00<00:00, 1.17MB/s]
Using framework PyTorch: 2.0.0+cu118
Overriding 1 configuration item(s)
	- use_cache -> False
============= Diagnostic Run torch.onnx.export version 2.0.0+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 410, in main
    run(
  File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 578, in run
    server.run()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 68, in serve
    config.load()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/config.py", line 473, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.10/site-packages/uvicorn/importer.py", line 24, in import_from_string
    raise exc from None
  File "/usr/local/lib/python3.10/site-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/code/app.py", line 16, in <module>
    cardiffnlp_ort_model = ORTModelForSequenceClassification.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 646, in from_pretrained
    return super().from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/optimum/modeling_base.py", line 362, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 592, in _from_transformers
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 495, in _from_pretrained
    model = ORTModel.load_model(
  File "/usr/local/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 357, in load_model
    validate_provider_availability(provider)  # raise error if the provider is not available
  File "/usr/local/lib/python3.10/site-packages/optimum/onnxruntime/utils.py", line 232, in validate_provider_availability
    raise ImportError(
ImportError: `onnxruntime-gpu` package is installed, but CUDA requirements could not be loaded. Make sure to meet the required dependencies: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html

This looks like a missing-dependencies problem; we suggest using pytorch/pytorch:*-cuda*-cudnn*-runtime images as the base, as we have found them to be hassle-free when working with AWS GPU instances.
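As a quick sanity check of a candidate base image, ONNX Runtime can report which execution providers it actually manages to register (a sketch; `model.onnx` is a placeholder for any exported model file in the container):

```python
import onnxruntime as ort

# Providers the installed build ships with; onnxruntime-gpu typically lists
# CUDAExecutionProvider here even when the CUDA/cuDNN shared libraries are missing,
# so the session-level check below is the decisive one.
print(ort.get_available_providers())

# Create a session and see which providers were actually registered.
sess = ort.InferenceSession(
    "model.onnx",  # placeholder path to any exported ONNX model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # CUDAExecutionProvider should come first on a healthy GPU image
```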

@khushpatel2002 (Author) commented May 11, 2023

@rsolovev Please check again.

@rsolovev (Collaborator) left a comment

Hey @khushpatel2002, here are the latest results. I guess the latest PyTorch did not do the trick; maybe the cudnn-runtime images would suit better.
