HuggingFace warnings / errors on experiment runs - e.g., "... model 'OptimizedModule' is not supported ..." #670

Open
mmartin9684-sil opened this issue Mar 3, 2025 · 7 comments

@mmartin9684-sil
Collaborator

HuggingFace is reporting an error at the start of the test step during an experiment run:

[ERROR|base.py:1149] 2025-02-28 12:05:52,437 >> The model 'OptimizedModule' is not supported for . Supported models are ['BartForConditionalGeneration', 'BigBirdPegasusForConditionalGeneration', 'BlenderbotForConditionalGeneration', 'BlenderbotSmallForConditionalGeneration', 'EncoderDecoderModel', 'FSMTForConditionalGeneration', 'GPTSanJapaneseForConditionalGeneration', 'LEDForConditionalGeneration', 'LongT5ForConditionalGeneration', 'M2M100ForConditionalGeneration', 'MarianMTModel', 'MBartForConditionalGeneration', 'MT5ForConditionalGeneration', 'MvpForConditionalGeneration', 'NllbMoeForConditionalGeneration', 'PegasusForConditionalGeneration', 'PegasusXForConditionalGeneration', 'PLBartForConditionalGeneration', 'ProphetNetForConditionalGeneration', 'Qwen2AudioForConditionalGeneration', 'SeamlessM4TForTextToText', 'SeamlessM4Tv2ForTextToText', 'SwitchTransformersForConditionalGeneration', 'T5ForConditionalGeneration', 'UMT5ForConditionalGeneration', 'XLMProphetNetForConditionalGeneration'].

However, the test step appears to work successfully for this experiment despite the error. The model is set to 'facebook/nllb-200-distilled-1.3B'.
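
For reference, this message comes from the pipeline's class-name check, which only recognizes the concrete *ForConditionalGeneration classes. A minimal sketch of the likely cause, assuming the model has been wrapped by torch.compile (e.g. via the torch_compile training option), so the check sees the wrapper's class instead of the underlying NLLB model, which would also explain why translation still works:

import torch
from transformers import AutoModelForSeq2SeqLM

# Compiling an NLLB model wraps it in torch._dynamo's OptimizedModule.
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
compiled = torch.compile(model)

print(type(model).__name__)     # M2M100ForConditionalGeneration (in the supported list)
print(type(compiled).__name__)  # OptimizedModule (not in the supported list)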

@mmartin9684-sil mmartin9684-sil added the bug Something isn't working label Mar 3, 2025
@mmartin9684-sil
Collaborator Author

Another potential compatibility issue with the recent HuggingFace updates - this warning is reported at the start of training:

1740758254327 aqua-gpu-dallas:gpu2 DEBUG Encoding train dataset: 100% 9365/9365 [00:00<00:00, 13408.85 examples/s]
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:449: FutureWarning:

`torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
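
For reference, the deprecated and replacement spellings differ only in where the device is given; the call here is made inside accelerate's accelerator.py rather than in silnlp, so the warning is likely cosmetic until accelerate is updated. A minimal sketch, assuming a recent torch (2.4+):

import torch

scaler_old = torch.cuda.amp.GradScaler()   # deprecated spelling, emits the FutureWarning
scaler_new = torch.amp.GradScaler("cuda")  # replacement spelling recommended by the warning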

@mmartin9684-sil
Collaborator Author

An additional warning is being reported by HF for recent experiments. It occurs at the end of preprocessing / the start of training.

/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1568: FutureWarning:

`evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead

2025-02-28 15:57:25,900 - silnlp.common.environment - INFO - Uploading MT/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/effective-config-96ede8fa89.yml
=== Training (Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical) ===
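
The fix the warning asks for is a one-word rename in the training arguments (presumably built in silnlp's hugging_face_config). A minimal sketch, assuming transformers 4.41+ where the new keyword is accepted:

from transformers import Seq2SeqTrainingArguments

# args = Seq2SeqTrainingArguments(output_dir="out", evaluation_strategy="steps")  # deprecated keyword, emits the warning
args = Seq2SeqTrainingArguments(output_dir="out", eval_strategy="steps")           # renamed keyword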

@mmartin9684-sil
Collaborator Author

ClearML warning at the start of the training step:

[INFO|integration_utils.py:1774] 2025-02-28 15:57:57,855 >> Automatic ClearML logging enabled.
[INFO|integration_utils.py:1802] 2025-02-28 15:57:57,856 >> ClearML Task has been initialized.
2025-02-28 15:57:57,856 - clearml.Task - WARNING - Parameters must be of builtin type (Transformers/accelerator_config[AcceleratorConfig])
  0% 0/5000 [00:00<?, ?it/s]

1740758286198 aqua-gpu-dallas:gpu2 DEBUG [INFO|trainer.py:2314] 2025-02-28 15:58:01,402 >> ***** Running training *****
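
The ClearML integration logs every TrainingArguments field as a task hyperparameter, and accelerator_config is a dataclass instance rather than a builtin type, which is all this warning is about. A quick way to see the offending value (a sketch, assuming a recent transformers with accelerate installed, where the field is normalized to an AcceleratorConfig object):

from transformers import TrainingArguments

args = TrainingArguments(output_dir="out")
print(type(args.accelerator_config))  # AcceleratorConfig, the non-builtin value clearml.Task warns about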

@mmartin9684-sil
Collaborator Author

Torch warning at the start of training:

1740758286198 aqua-gpu-dallas:gpu2 DEBUG [INFO|trainer.py:2314] 2025-02-28 15:58:01,402 >> ***** Running training *****
[INFO|trainer.py:2315] 2025-02-28 15:58:01,402 >>   Num examples = 9,365
[INFO|trainer.py:2316] 2025-02-28 15:58:01,402 >>   Num Epochs = 35
[INFO|trainer.py:2317] 2025-02-28 15:58:01,402 >>   Instantaneous batch size per device = 64
[INFO|trainer.py:2319] 2025-02-28 15:58:01,402 >>   Training with DataParallel so batch size has been adjusted to: 32
[INFO|trainer.py:2320] 2025-02-28 15:58:01,402 >>   Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:2321] 2025-02-28 15:58:01,402 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2322] 2025-02-28 15:58:01,402 >>   Total optimization steps = 5,000
[INFO|trainer.py:2323] 2025-02-28 15:58:01,404 >>   Number of trainable parameters = 1,370,638,336

  0% 0/5000 [00:01<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:295: FutureWarning:

`torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
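
As with the GradScaler message above, the deprecated call is made inside torch itself (torch/utils/checkpoint.py, used for gradient checkpointing), not in silnlp, so this is likely cosmetic. The spelling change the warning asks for, as a minimal sketch:

import torch

with torch.cpu.amp.autocast():   # deprecated spelling, emits the FutureWarning
    pass
with torch.amp.autocast("cpu"):  # replacement spelling recommended by the warning
    pass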

@mmartin9684-sil
Collaborator Author

Warning at the end of training when the model is being saved:

1740763687635 aqua-gpu-dallas:gpu2 DEBUG {'loss': 3.0325, 'grad_norm': 0.11882693320512772, 'learning_rate': 0.0, 'epoch': 34.19}
100% 5000/5000 [1:29:58<00:00,  1.08it/s]2025-02-28 17:28:02,669 - silnlp.nmt.hugging_face_config - INFO - Saving model checkpoint to /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000 using custom _save function
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:2817: UserWarning:

Moving the following attributes in the config to the generation config: {'max_length': 200}. You are seeing this warning because you've set generation parameters in the model config, as opposed to in the generation config.

[INFO|configuration_utils.py:414] 2025-02-28 17:28:02,671 >> Configuration saved in /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000/config.json
[INFO|configuration_utils.py:865] 2025-02-28 17:28:02,671 >> Configuration saved in /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000/generation_config.json
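
This warning is triggered when a generation parameter (here max_length=200) is stored on model.config rather than model.generation_config at save time; save_pretrained migrates it and warns. A minimal sketch of the distinction, assuming the value is set somewhere in the experiment's config handling:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")

model.config.max_length = 200             # old location; save_pretrained warns and moves it
model.generation_config.max_length = 200  # preferred location; no warning at save time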

@bhartmoore
Collaborator

I am seeing this warning during mid-training evals:

[WARNING|trainer.py:761] 2025-03-03 18:15:47,707 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
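
Recent transformers (4.46+) exposes the tokenizer as processing_class, both as a Trainer constructor keyword and as an attribute; reading trainer.tokenizer (whether in silnlp or inside transformers' own evaluation code) is what emits this warning. A minimal sketch of the newer spelling, with the NLLB checkpoint used only for illustration:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Seq2SeqTrainer, Seq2SeqTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
args = Seq2SeqTrainingArguments(output_dir="out")

trainer = Seq2SeqTrainer(model=model, args=args, processing_class=tokenizer)  # instead of tokenizer=...
tok = trainer.processing_class                                                # instead of trainer.tokenizer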

@bhartmoore
Collaborator

Warning at the start of training, just before the torch.cuda.amp.GradScaler(args...) warning listed above:

2025-03-03 13:08:00  [WARNING|logging.py:328] 2025-03-03 18:07:56,218 >> The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
2025-03-03 13:09:19  Encoding train dataset: 100% 6400/6400 [00:00<00:00, 18518.27 examples/s]
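
This notice comes from resize_token_embeddings: transformers 4.46+ initializes rows for newly added tokens from the mean and covariance of the existing embeddings and logs the change, presumably because new tokens were added to the tokenizer for this experiment. It is informational and can be silenced with the flag it mentions. A minimal sketch, with a hypothetical added token for illustration:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")

tokenizer.add_tokens(["yam_Latn"])  # hypothetical new token, for illustration only

# Either call resizes the embedding matrix for the added token; the first uses the
# new mean/covariance initialization and logs the notice above, the second keeps
# the model's default initialization and stays quiet.
model.resize_token_embeddings(len(tokenizer))
# model.resize_token_embeddings(len(tokenizer), mean_resizing=False)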
