HuggingFace warnings / errors on experiment runs - e.g., "... model 'OptimizedModule' is not supported ..." #670

Open
mmartin9684-sil opened this issue Mar 3, 2025 · 7 comments

@mmartin9684-sil
Collaborator

HuggingFace is reporting an error at the start of the test step during an experiment run:

[ERROR|base.py:1149] 2025-02-28 12:05:52,437 >> The model 'OptimizedModule' is not supported for . Supported models are ['BartForConditionalGeneration', 'BigBirdPegasusForConditionalGeneration', 'BlenderbotForConditionalGeneration', 'BlenderbotSmallForConditionalGeneration', 'EncoderDecoderModel', 'FSMTForConditionalGeneration', 'GPTSanJapaneseForConditionalGeneration', 'LEDForConditionalGeneration', 'LongT5ForConditionalGeneration', 'M2M100ForConditionalGeneration', 'MarianMTModel', 'MBartForConditionalGeneration', 'MT5ForConditionalGeneration', 'MvpForConditionalGeneration', 'NllbMoeForConditionalGeneration', 'PegasusForConditionalGeneration', 'PegasusXForConditionalGeneration', 'PLBartForConditionalGeneration', 'ProphetNetForConditionalGeneration', 'Qwen2AudioForConditionalGeneration', 'SeamlessM4TForTextToText', 'SeamlessM4Tv2ForTextToText', 'SwitchTransformersForConditionalGeneration', 'T5ForConditionalGeneration', 'UMT5ForConditionalGeneration', 'XLMProphetNetForConditionalGeneration'].

However, the test step appears to work successfully for this experiment despite the error. The model is set to 'facebook/nllb-200-distilled-1.3B'.
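
For reference, this message comes from the pipeline's class-name check, which only recognizes the concrete *ForConditionalGeneration classes. A minimal sketch of the likely cause, assuming the model has been wrapped by torch.compile (e.g. via the torch_compile training option), so the check sees the wrapper's class instead of the underlying NLLB model, which would also explain why translation still works:

import torch
from transformers import AutoModelForSeq2SeqLM

# Compiling an NLLB model wraps it in torch._dynamo's OptimizedModule.
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
compiled = torch.compile(model)

print(type(model).__name__)     # M2M100ForConditionalGeneration (in the supported list)
print(type(compiled).__name__)  # OptimizedModule (not in the supported list)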

@mmartin9684-sil mmartin9684-sil added the bug Something isn't working label Mar 3, 2025
@mmartin9684-sil
Collaborator Author

Another potential compatibility issue with the recent HuggingFace updates - this warning is reported at the start of training:

1740758254327 aqua-gpu-dallas:gpu2 DEBUG Encoding train dataset: 100% 9365/9365 [00:00<00:00, 13408.85 examples/s]
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:449: FutureWarning:

`torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
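
For reference, the deprecated and replacement spellings differ only in where the device is given; the call here is made inside accelerate's accelerator.py rather than in silnlp, so the warning is likely cosmetic until accelerate is updated. A minimal sketch, assuming a recent torch (2.4+):

import torch

scaler_old = torch.cuda.amp.GradScaler()   # deprecated spelling, emits the FutureWarning
scaler_new = torch.amp.GradScaler("cuda")  # replacement spelling recommended by the warning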

@mmartin9684-sil
Collaborator Author

An additional warning is being reported by HF for recent experiments. It occurs at the end of preprocessing / the start of training.

/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1568: FutureWarning:

`evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead

2025-02-28 15:57:25,900 - silnlp.common.environment - INFO - Uploading MT/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/effective-config-96ede8fa89.yml
=== Training (Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical) ===
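
The fix the warning asks for is a one-word rename in the training arguments (presumably built in silnlp's hugging_face_config). A minimal sketch, assuming transformers 4.41+ where the new keyword is accepted:

from transformers import Seq2SeqTrainingArguments

# args = Seq2SeqTrainingArguments(output_dir="out", evaluation_strategy="steps")  # deprecated keyword, emits the warning
args = Seq2SeqTrainingArguments(output_dir="out", eval_strategy="steps")           # renamed keyword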

@mmartin9684-sil
Collaborator Author

ClearML warning at the start of the training step:

[INFO|integration_utils.py:1774] 2025-02-28 15:57:57,855 >> Automatic ClearML logging enabled.
[INFO|integration_utils.py:1802] 2025-02-28 15:57:57,856 >> ClearML Task has been initialized.
2025-02-28 15:57:57,856 - clearml.Task - WARNING - Parameters must be of builtin type (Transformers/accelerator_config[AcceleratorConfig])
  0% 0/5000 [00:00<?, ?it/s]

1740758286198 aqua-gpu-dallas:gpu2 DEBUG [INFO|trainer.py:2314] 2025-02-28 15:58:01,402 >> ***** Running training *****
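
The ClearML integration logs every TrainingArguments field as a task hyperparameter, and accelerator_config is a dataclass instance rather than a builtin type, which is all this warning is about. A quick way to see the offending value (a sketch, assuming a recent transformers with accelerate installed, where the field is normalized to an AcceleratorConfig object):

from transformers import TrainingArguments

args = TrainingArguments(output_dir="out")
print(type(args.accelerator_config))  # AcceleratorConfig, the non-builtin value clearml.Task warns about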

@mmartin9684-sil
Collaborator Author

Torch warning at the start of training:

1740758286198 aqua-gpu-dallas:gpu2 DEBUG [INFO|trainer.py:2314] 2025-02-28 15:58:01,402 >> ***** Running training *****
[INFO|trainer.py:2315] 2025-02-28 15:58:01,402 >>   Num examples = 9,365
[INFO|trainer.py:2316] 2025-02-28 15:58:01,402 >>   Num Epochs = 35
[INFO|trainer.py:2317] 2025-02-28 15:58:01,402 >>   Instantaneous batch size per device = 64
[INFO|trainer.py:2319] 2025-02-28 15:58:01,402 >>   Training with DataParallel so batch size has been adjusted to: 32
[INFO|trainer.py:2320] 2025-02-28 15:58:01,402 >>   Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:2321] 2025-02-28 15:58:01,402 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2322] 2025-02-28 15:58:01,402 >>   Total optimization steps = 5,000
[INFO|trainer.py:2323] 2025-02-28 15:58:01,404 >>   Number of trainable parameters = 1,370,638,336

  0% 0/5000 [00:01<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:295: FutureWarning:

`torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
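
As with the GradScaler message above, the deprecated call is made inside torch itself (torch/utils/checkpoint.py, used for gradient checkpointing), not in silnlp, so this is likely cosmetic. The spelling change the warning asks for, as a minimal sketch:

import torch

with torch.cpu.amp.autocast():   # deprecated spelling, emits the FutureWarning
    pass
with torch.amp.autocast("cpu"):  # replacement spelling recommended by the warning
    pass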

@mmartin9684-sil
Collaborator Author

Warning at the end of training when the model is being saved:

1740763687635 aqua-gpu-dallas:gpu2 DEBUG {'loss': 3.0325, 'grad_norm': 0.11882693320512772, 'learning_rate': 0.0, 'epoch': 34.19}
100% 5000/5000 [1:29:58<00:00,  1.08it/s]2025-02-28 17:28:02,669 - silnlp.nmt.hugging_face_config - INFO - Saving model checkpoint to /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000 using custom _save function
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:2817: UserWarning:

Moving the following attributes in the config to the generation config: {'max_length': 200}. You are seeing this warning because you've set generation parameters in the model config, as opposed to in the generation config.

[INFO|configuration_utils.py:414] 2025-02-28 17:28:02,671 >> Configuration saved in /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000/config.json
[INFO|configuration_utils.py:865] 2025-02-28 17:28:02,671 >> Configuration saved in /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000/generation_config.json
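
This warning is triggered when a generation parameter (here max_length=200) is stored on model.config rather than model.generation_config at save time; save_pretrained migrates it and warns. A minimal sketch of the distinction, assuming the value is set somewhere in the experiment's config handling:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")

model.config.max_length = 200             # old location; save_pretrained warns and moves it
model.generation_config.max_length = 200  # preferred location; no warning at save time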

@bhartmoore
Collaborator

I am seeing this warning during mid-training evals:

[WARNING|trainer.py:761] 2025-03-03 18:15:47,707 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
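
Recent transformers (4.46+) exposes the tokenizer as processing_class, both as a Trainer constructor keyword and as an attribute; reading trainer.tokenizer (whether in silnlp or inside transformers' own evaluation code) is what emits this warning. A minimal sketch of the newer spelling, with the NLLB checkpoint used only for illustration:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Seq2SeqTrainer, Seq2SeqTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
args = Seq2SeqTrainingArguments(output_dir="out")

trainer = Seq2SeqTrainer(model=model, args=args, processing_class=tokenizer)  # instead of tokenizer=...
tok = trainer.processing_class                                                # instead of trainer.tokenizer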

@bhartmoore
Collaborator

Warning at the start of training, just before the torch.cuda.amp.GradScaler(args...) warning listed above:

2025-03-03 13:08:00  [WARNING|logging.py:328] 2025-03-03 18:07:56,218 >> The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
2025-03-03 13:09:19  Encoding train dataset: 100% 6400/6400 [00:00<00:00, 18518.27 examples/s]
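
This notice comes from resize_token_embeddings: transformers 4.46+ initializes rows for newly added tokens from the mean and covariance of the existing embeddings and logs the change, presumably because new tokens were added to the tokenizer for this experiment. It is informational and can be silenced with the flag it mentions. A minimal sketch, with a hypothetical added token for illustration:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")

tokenizer.add_tokens(["yam_Latn"])  # hypothetical new token, for illustration only

# Either call resizes the embedding matrix for the added token; the first uses the
# new mean/covariance initialization and logs the notice above, the second keeps
# the model's default initialization and stays quiet.
model.resize_token_embeddings(len(tokenizer))
# model.resize_token_embeddings(len(tokenizer), mean_resizing=False)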
