Getting KeyError: 'input_model' when trying to optimize whisper-tiny.en model #1283

Open
mram0509 opened this issue Aug 6, 2024 · 1 comment

mram0509 commented Aug 6, 2024

Describe the bug
Unable to optimize a model with device cpu and precision int8; the run ends with a KeyError: 'input_model'.

To Reproduce
Start with this example: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/ort-whisper

Readme says:

  1. Go to https://github.com/microsoft/Olive/tree/main/examples/whisper and follow the instructions.

  2. Run the following commands:

python prepare_whisper_configs.py --model_name openai/whisper-tiny.en --no_audio_decoder
python -m olive.workflows.run --config whisper_cpu_int8.json --setup
python -m olive.workflows.run --config whisper_cpu_int8.json

  3. Move the resulting model from models/whisper_cpu_int8_0_model.onnx to the same directory as this code.
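
For what it's worth, the same workflow can also be invoked from Python instead of the CLI (a sketch, assuming whisper_cpu_int8.json sits in the current directory):

# Python-API equivalent of `python -m olive.workflows.run --config ...`
from olive.workflows import run as olive_run

olive_run("whisper_cpu_int8.json")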

When I did the above with a pip install of olive-ai, I got the KeyError: 'config' error.

Then I tried installing from source as mentioned here - https://github.com/microsoft/Olive/blob/main/examples/README.md

git clone https://github.com/microsoft/Olive.git
cd Olive
python -m pip install .

Then I tried to "Run the config to optimize the model" from here - https://github.com/microsoft/Olive/blob/main/examples/whisper/README.md

This script runs and creates \Olive-main\examples\whisper\models\conversion-transformers_optimization-onnx_dynamic_quantization-insert_beam_search-prepost\whisper_cpu_int8_cpu-cpu_model.onnx

(olive_env) \Olive-main\examples\whisper>python test_transcription.py --config \Olive-main\examples\whisper\models\conversion-transformers_optimization-onnx_dynamic_quantization-insert_beam_search-prepost\whisper_cpu_int8_cpu-cpu_model.json
Traceback (most recent call last):
  File "\Olive-main\examples\whisper\test_transcription.py", line 126, in <module>
    output_text = main()
                  ^^^^^^
  File "\Olive-main\examples\whisper\test_transcription.py", line 63, in main
    model_name = config["input_model"]["model_components"][0]["model_path"]
                 ~~~~~~^^^^^^^^^^^^^^^
KeyError: 'input_model'
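
For reference, here is a minimal sketch of what the failing line appears to expect (an assumption read off the traceback, not confirmed behavior of the example): a JSON file with a top-level "input_model" key. The workflow config whisper_cpu_int8.json has that key; the per-model JSON under models\ does not.

import json

# Hedged sketch: test_transcription.py seems to want the workflow config
# (whisper_cpu_int8.json), not the model JSON produced under models\.
with open("whisper_cpu_int8.json") as f:
    config = json.load(f)

# A model JSON has no "input_model" key, which reproduces the KeyError above.
if "input_model" not in config:
    raise SystemExit("Pass the workflow config, not a generated model JSON.")

model_name = config["input_model"]["model_components"][0]["model_path"]
print(model_name)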

I renamed this model to whisper_cpu_int8_0_model.onnx, went back to the sample at https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/ort-whisper, tried to run the model in the browser, and got the following error:

Error: Error: invalid input 'attention_mask'

Expected behavior
I should get a model that runs successfully with onnxruntime-web


Olive logs

(olive_env) \Olive-main\examples\whisper>python prepare_whisper_configs.py --model_name openai/whisper-tiny.en
config.json: 100%|████████████████████████████████████████████████████████████████████████| 1.94k/1.94k [00:00<?, ?B/s]

(olive_env) \Olive-main\examples\whisper>olive run --config whisper_cpu_int8.json --setup
[2024-08-06 15:01:08,786] [INFO] [run.py:90:get_required_packages] The following packages are required in the local environment: ['onnxruntime']
[2024-08-06 15:01:08,786] [INFO] [run.py:101:install_packages] installing packages: ['onnxruntime']
[2024-08-06 15:01:08,869] [INFO] [run.py:356:check_local_ort_installation] onnxruntime is already installed.

(olive_env) \Olive-main\examples\whisper>olive run --config whisper_cpu_int8.json 2> NUL
[2024-08-06 15:01:41,553] [INFO] [run.py:140:run_engine] Running workflow default_workflow
[2024-08-06 15:01:41,560] [INFO] [cache.py:51:init] Using cache directory: \Olive-main\examples\whisper\cache\default_workflow
[2024-08-06 15:01:41,570] [INFO] [engine.py:1020:save_olive_config] Saved Olive config to \Olive-main\examples\whisper\cache\default_workflow\olive_config.json
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass onnxconversion
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass orttransformersoptimization
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass onnxdynamicquantization
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass insertbeamsearch
[2024-08-06 15:01:41,570] [DEBUG] [run.py:179:run_engine] Registering pass appendprepostprocessingops
[2024-08-06 15:01:41,583] [DEBUG] [accelerator_creator.py:130:_fill_accelerators] The accelerator device and execution providers are specified, skipping deduce.
[2024-08-06 15:01:41,583] [DEBUG] [accelerator_creator.py:169:_check_execution_providers] Supported execution providers for device cpu: ['CPUExecutionProvider']
[2024-08-06 15:01:41,586] [DEBUG] [accelerator_creator.py:199:create_accelerators] Initial accelerators and execution providers: {'cpu': ['CPUExecutionProvider']}
[2024-08-06 15:01:41,586] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass onnxconversion already registered
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass orttransformersoptimization already registered
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass onnxdynamicquantization already registered
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass insertbeamsearch already registered
[2024-08-06 15:01:41,586] [DEBUG] [run.py:235:run_engine] Pass appendprepostprocessingops already registered
[2024-08-06 15:01:41,586] [DEBUG] [cache.py:304:set_cache_env] Set OLIVE_CACHE_DIR: \Olive-main\examples\whisper\cache\default_workflow
[2024-08-06 15:01:41,604] [INFO] [engine.py:277:run] Running Olive on accelerator: cpu-cpu
[2024-08-06 15:01:41,604] [INFO] [engine.py:1118:_create_system] Creating target system ...
[2024-08-06 15:01:41,604] [DEBUG] [engine.py:1114:create_system] create native OliveSystem SystemType.Local
[2024-08-06 15:01:41,614] [INFO] [engine.py:1121:_create_system] Target system created in 0.009509 seconds
[2024-08-06 15:01:41,614] [INFO] [engine.py:1130:_create_system] Creating host system ...
[2024-08-06 15:01:41,614] [DEBUG] [engine.py:1114:create_system] create native OliveSystem SystemType.Local
[2024-08-06 15:01:41,614] [INFO] [engine.py:1133:_create_system] Host system created in 0.000000 seconds
[2024-08-06 15:01:41,660] [DEBUG] [engine.py:717:_cache_model] Cached model 9139f706 to \Olive-main\examples\whisper\cache\default_workflow\models\9139f706.json
[2024-08-06 15:01:41,662] [DEBUG] [engine.py:352:run_accelerator] Running Olive in no-search mode ...
[2024-08-06 15:01:41,662] [DEBUG] [engine.py:444:run_no_search] Running ['conversion', 'transformers_optimization', 'onnx_dynamic_quantization', 'insert_beam_search', 'prepost'] with no search ...
[2024-08-06 15:01:41,662] [INFO] [engine.py:886:_run_pass] Running pass conversion:OnnxConversion
[2024-08-06 15:01:48,789] [DEBUG] [pytorch.py:194:get_dummy_inputs] Using dummy_inputs_func to get dummy inputs
[2024-08-06 15:01:51,423] [DEBUG] [conversion.py:196:_export_pytorch_model] Converting model on device cpu with dtype None.
[2024-08-06 15:01:56,203] [DEBUG] [pytorch.py:194:get_dummy_inputs] Using dummy_inputs_func to get dummy inputs
[2024-08-06 15:01:56,558] [DEBUG] [conversion.py:196:_export_pytorch_model] Converting model on device cpu with dtype None.
[2024-08-06 15:01:59,113] [INFO] [engine.py:988:_run_pass] Pass conversion:OnnxConversion finished in 17.451246 seconds
[2024-08-06 15:01:59,117] [DEBUG] [engine.py:717:_cache_model] Cached model 0_OnnxConversion-9139f706-5fa0d4af to \Olive-main\examples\whisper\cache\default_workflow\models\0_OnnxConversion-9139f706-5fa0d4af.json
[2024-08-06 15:01:59,120] [DEBUG] [engine.py:769:_cache_run] Cached run for 9139f706->0_OnnxConversion-9139f706-5fa0d4af into \Olive-main\examples\whisper\cache\default_workflow\runs\OnnxConversion-9139f706-5fa0d4af.json
[2024-08-06 15:01:59,122] [INFO] [engine.py:886:_run_pass] Running pass transformers_optimization:OrtTransformersOptimization
[2024-08-06 15:01:59,232] [DEBUG] [transformer_optimization.py:248:_run_for_config] model_type is set to bart from model attributes
[2024-08-06 15:01:59,233] [DEBUG] [transformer_optimization.py:254:_run_for_config] num_heads is set to 6 from model attributes
[2024-08-06 15:01:59,234] [DEBUG] [transformer_optimization.py:260:_run_for_config] hidden_size is set to 384 from model attributes
[2024-08-06 15:02:04,419] [DEBUG] [transformer_optimization.py:248:_run_for_config] model_type is set to bart from model attributes
[2024-08-06 15:02:04,419] [DEBUG] [transformer_optimization.py:254:_run_for_config] num_heads is set to 6 from model attributes
[2024-08-06 15:02:04,419] [DEBUG] [transformer_optimization.py:260:_run_for_config] hidden_size is set to 384 from model attributes
[2024-08-06 15:02:07,900] [INFO] [engine.py:988:_run_pass] Pass transformers_optimization:OrtTransformersOptimization finished in 8.773139 seconds
[2024-08-06 15:02:07,905] [DEBUG] [engine.py:717:_cache_model] Cached model 1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu to \Olive-main\examples\whisper\cache\default_workflow\models\1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu.json
[2024-08-06 15:02:07,905] [DEBUG] [engine.py:769:_cache_run] Cached run for 0_OnnxConversion-9139f706-5fa0d4af->1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu into \Olive-main\examples\whisper\cache\default_workflow\runs\OrtTransformersOptimization-0-5c93fa9e-cpu-cpu.json
[2024-08-06 15:02:07,905] [INFO] [engine.py:886:_run_pass] Running pass onnx_dynamic_quantization:OnnxDynamicQuantization
[2024-08-06 15:02:07,986] [INFO] [quantization.py:391:_run_for_config] Preprocessing model for quantization
[2024-08-06 15:02:11,336] [INFO] [quantization.py:391:_run_for_config] Preprocessing model for quantization
[2024-08-06 15:02:13,823] [INFO] [engine.py:988:_run_pass] Pass onnx_dynamic_quantization:OnnxDynamicQuantization finished in 5.917982 seconds
[2024-08-06 15:02:13,823] [DEBUG] [engine.py:717:_cache_model] Cached model 2_OnnxDynamicQuantization-1-a1261e22 to \Olive-main\examples\whisper\cache\default_workflow\models\2_OnnxDynamicQuantization-1-a1261e22.json
[2024-08-06 15:02:13,823] [DEBUG] [engine.py:769:_cache_run] Cached run for 1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu->2_OnnxDynamicQuantization-1-a1261e22 into \Olive-main\examples\whisper\cache\default_workflow\runs\OnnxDynamicQuantization-1-a1261e22.json
[2024-08-06 15:02:13,823] [INFO] [engine.py:886:_run_pass] Running pass insert_beam_search:InsertBeamSearch
Removed 67 initializers with duplicated value
Removed 33 initializers with duplicated value
[2024-08-06 15:02:16,653] [DEBUG] [insert_beam_search.py:302:chain_model] Using IR version 8 for chained model
[2024-08-06 15:02:17,329] [INFO] [engine.py:988:_run_pass] Pass insert_beam_search:InsertBeamSearch finished in 3.505282 seconds
[2024-08-06 15:02:17,329] [DEBUG] [engine.py:717:_cache_model] Cached model 3_InsertBeamSearch-2-82bf64f8 to \Olive-main\examples\whisper\cache\default_workflow\models\3_InsertBeamSearch-2-82bf64f8.json
[2024-08-06 15:02:17,329] [DEBUG] [engine.py:769:_cache_run] Cached run for 2_OnnxDynamicQuantization-1-a1261e22->3_InsertBeamSearch-2-82bf64f8 into \Olive-main\examples\whisper\cache\default_workflow\runs\InsertBeamSearch-2-82bf64f8.json
[2024-08-06 15:02:17,336] [INFO] [engine.py:886:_run_pass] Running pass prepost:AppendPrePostProcessingOps
[2024-08-06 15:02:18,924] [INFO] [engine.py:988:_run_pass] Pass prepost:AppendPrePostProcessingOps finished in 1.587309 seconds
[2024-08-06 15:02:18,936] [DEBUG] [engine.py:717:_cache_model] Cached model 4_AppendPrePostProcessingOps-3-9e247843 to \Olive-main\examples\whisper\cache\default_workflow\models\4_AppendPrePostProcessingOps-3-9e247843.json
[2024-08-06 15:02:18,939] [DEBUG] [engine.py:769:_cache_run] Cached run for 3_InsertBeamSearch-2-82bf64f8->4_AppendPrePostProcessingOps-3-9e247843 into \Olive-main\examples\whisper\cache\default_workflow\runs\AppendPrePostProcessingOps-3-9e247843.json
[2024-08-06 15:02:18,939] [INFO] [engine.py:862:_run_passes] Run model evaluation for the final model...
[2024-08-06 15:02:18,939] [DEBUG] [engine.py:1059:_evaluate_model] Evaluating model ...
[2024-08-06 15:02:20,189] [DEBUG] [ort_inference.py:72:get_ort_inference_session] inference_settings: {'execution_provider': ['CPUExecutionProvider'], 'provider_options': None}
[2024-08-06 15:02:20,189] [DEBUG] [ort_inference.py:111:get_ort_inference_session] Normalized providers: ['CPUExecutionProvider'], provider_options: [{}]
[2024-08-06 15:03:18,633] [DEBUG] [footprint.py:234:_resolve_metrics] There is no goal set for metric: latency-avg.
[2024-08-06 15:03:18,636] [DEBUG] [engine.py:864:_run_passes] Signal: {
"latency-avg": 1824.62912
}
[2024-08-06 15:03:19,964] [INFO] [engine.py:378:run_accelerator] Save footprint to models\whisper_cpu_int8_cpu-cpu_footprints.json.
[2024-08-06 15:03:19,970] [DEBUG] [engine.py:380:run_accelerator] run_accelerator done
[2024-08-06 15:03:19,970] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-08-06 15:03:21,520] [INFO] [engine.py:591:dump_run_history] run history:
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+==================================================+==================================================+=============================+================+=============================+
| 9139f706 | | | | |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 0_OnnxConversion-9139f706-5fa0d4af | 9139f706 | OnnxConversion | 17.4512 | |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu | 0_OnnxConversion-9139f706-5fa0d4af | OrtTransformersOptimization | 8.77314 | |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 2_OnnxDynamicQuantization-1-a1261e22 | 1_OrtTransformersOptimization-0-5c93fa9e-cpu-cpu | OnnxDynamicQuantization | 5.91798 | |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 3_InsertBeamSearch-2-82bf64f8 | 2_OnnxDynamicQuantization-1-a1261e22 | InsertBeamSearch | 3.50528 | |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
| 4_AppendPrePostProcessingOps-3-9e247843 | 3_InsertBeamSearch-2-82bf64f8 | AppendPrePostProcessingOps | 1.58731 | { |
| | | | | "latency-avg": 1824.62912 |
| | | | | } |
+--------------------------------------------------+--------------------------------------------------+-----------------------------+----------------+-----------------------------+
[2024-08-06 15:03:21,770] [INFO] [engine.py:309:run] No packaging config provided, skip packaging artifacts

Other information

  • OS: Windows
  • Olive version: main
  • ONNXRuntime package and version: onnxruntime

Additional context
Trying to run this sample - https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/ort-whisper

jambayk (Contributor) commented Aug 7, 2024

Hi, attention_mask was removed from the whisper beam search inputs in ORT 1.16.0, so the inference example is outdated. Can you try again after removing it from https://github.com/microsoft/onnxruntime-inference-examples/blob/0de2e66e03981714e5308c457b72d785e98d0fe2/js/ort-whisper/main.js#L144?

Please refer to https://github.com/microsoft/Olive/blob/main/examples/whisper/code/whisper_dataset.py#L50 for more details on the model inputs.
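
As a quick sanity check, here is a sketch (assuming a local onnxruntime install and the model path produced above) that prints the model's actual input names, so the browser feed can be matched against them; attention_mask should not appear on ORT 1.16 or later:

import onnxruntime as ort

# Inspect the beam-search model's inputs; on ORT >= 1.16 there should be
# no "attention_mask", so the feed in main.js must not include it either.
sess = ort.InferenceSession(
    r"models\conversion-transformers_optimization-onnx_dynamic_quantization-insert_beam_search-prepost\whisper_cpu_int8_cpu-cpu_model.onnx",
    providers=["CPUExecutionProvider"],
)
for inp in sess.get_inputs():
    print(inp.name, inp.type, inp.shape)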
