How to continue training with finetune? What are the steps to continue training? #843

Open
HimanshuRepozitory opened this issue Jan 22, 2025 · 3 comments

Comments

@HimanshuRepozitory

Discussed in #577

Originally posted by WithAngleOrDemon September 21, 2024
After interrupting a finetune run, I am unable to continue the previous training. What is the correct way to resume training?
I tried using the newly generated LoRA model, but it still reported an error:

[2024-09-21 05:55:53,084][fish_speech.models.text2semantic.llama][INFO] - [rank: 0] Loaded weights with error: _IncompatibleKeys(missing_keys=['embeddings.lora_A', 'embeddings.lora_B', 'codebook_embeddings.lora_A', 'codebook_embeddings.lora_B', 'layers.0.attention.wqkv.lora_A', 'layers.0.attention.wqkv.lora_B', 'layers.0.attention.wo.lora_A', 'layers.0.attention.wo.lora_B', 'layers.0.feed_forward.w1.lora_A', 'layers.0.feed_forward.w1.lora_B', 'layers.0.feed_forward.w3.lora_A', 'layers.0.feed_forward.w3.lora_B', 'layers.0.feed_forward.w2.lora_A', 'layers.0.feed_forward.w2.lora_B', 'layers.1.attention.wqkv.lora_A', 'layers.1.attention.wqkv.lora_B', 'layers.1.attention.wo.lora_A', 'layers.1.attention.wo.lora_B', 'layers.1.feed_forward.w1.lora_A', 'layers.1.feed_forward.w1.lora_B', 'layers.1.feed_forward.w3.lora_A', 'layers.1.feed_forward.w3.lora_B', 'layers.1.feed_forward.w2.lora_A', 'layers.1.feed_forward.w2.lora_B', 'layers.2.attention.wqkv.lora_A', 'layers.2.attention.wqkv.lora_B', 'layers.2.attention.wo.lora_A', 'layers.2.attention.wo.lora_B', 'layers.2.feed_forward.w1.lora_A', 'layers.2.feed_forward.w1.lora_B', 'layers.2.feed_forward.w3.lora_A', 'layers.2.feed_forward.w3.lora_B', 'layers.2.feed_forward.w2.lora_A', 'layers.2.feed_forward.w2.lora_B', 'layers.3.attention.wqkv.lora_A', 'layers.3.attention.wqkv.lora_B', 'layers.3.attention.wo.lora_A', 'layers.3.attention.wo.lora_B', 'layers.3.feed_forward.w1.lora_A', 'layers.3.feed_forward.w1.lora_B', 'layers.3.feed_forward.w3.lora_A', 'layers.3.feed_forward.w3.lora_B', 'layers.3.feed_forward.w2.lora_A', 'layers.3.feed_forward.w2.lora_B', 'layers.4.attention.wqkv.lora_A', 'layers.4.attention.wqkv.lora_B', 'layers.4.attention.wo.lora_A', 'layers.4.attention.wo.lora_B', 'layers.4.feed_forward.w1.lora_A', 'layers.4.feed_forward.w1.lora_B', 'layers.4.feed_forward.w3.lora_A', 'layers.4.feed_forward.w3.lora_B', 'layers.4.feed_forward.w2.lora_A', 'layers.4.feed_forward.w2.lora_B', 'layers.5.attention.wqkv.lora_A', 'layers.5.attention.wqkv.lora_B', 'layers.5.attention.wo.lora_A', 'layers.5.attention.wo.lora_B', 'layers.5.feed_forward.w1.lora_A', 'layers.5.feed_forward.w1.lora_B', 'layers.5.feed_forward.w3.lora_A', 'layers.5.feed_forward.w3.lora_B', 'layers.5.feed_forward.w2.lora_A', 'layers.5.feed_forward.w2.lora_B', 'layers.6.attention.wqkv.lora_A', 'layers.6.attention.wqkv.lora_B', 'layers.6.attention.wo.lora_A', 'layers.6.attention.wo.lora_B', 'layers.6.feed_forward.w1.lora_A', 'layers.6.feed_forward.w1.lora_B', 'layers.6.feed_forward.w3.lora_A', 'layers.6.feed_forward.w3.lora_B', 'layers.6.feed_forward.w2.lora_A', 'layers.6.feed_forward.w2.lora_B', 'layers.7.attention.wqkv.lora_A', 'layers.7.attention.wqkv.lora_B', 'layers.7.attention.wo.lora_A', 'layers.7.attention.wo.lora_B', 'layers.7.feed_forward.w1.lora_A', 'layers.7.feed_forward.w1.lora_B', 'layers.7.feed_forward.w3.lora_A', 'layers.7.feed_forward.w3.lora_B', 'layers.7.feed_forward.w2.lora_A', 'layers.7.feed_forward.w2.lora_B', 'layers.8.attention.wqkv.lora_A', 'layers.8.attention.wqkv.lora_B', 'layers.8.attention.wo.lora_A', 'layers.8.attention.wo.lora_B', 'layers.8.feed_forward.w1.lora_A', 'layers.8.feed_forward.w1.lora_B', 'layers.8.feed_forward.w3.lora_A', 'layers.8.feed_forward.w3.lora_B', 'layers.8.feed_forward.w2.lora_A', 'layers.8.feed_forward.w2.lora_B', 'layers.9.attention.wqkv.lora_A', 'layers.9.attention.wqkv.lora_B', 'layers.9.attention.wo.lora_A', 'layers.9.attention.wo.lora_B', 'layers.9.feed_forward.w1.lora_A', 'layers.9.feed_forward.w1.lora_B', 
'layers.9.feed_forward.w3.lora_A', 'layers.9.feed_forward.w3.lora_B', 'layers.9.feed_forward.w2.lora_A', 'layers.9.feed_forward.w2.lora_B', 'layers.10.attention.wqkv.lora_A', 'layers.10.attention.wqkv.lora_B', 'layers.10.attention.wo.lora_A', 'layers.10.attention.wo.lora_B', 'layers.10.feed_forward.w1.lora_A', 'layers.10.feed_forward.w1.lora_B', 'layers.10.feed_forward.w3.lora_A', 'layers.10.feed_forward.w3.lora_B', 'layers.10.feed_forward.w2.lora_A', 'layers.10.feed_forward.w2.lora_B', 'layers.11.attention.wqkv.lora_A', 'layers.11.attention.wqkv.lora_B', 'layers.11.attention.wo.lora_A', 'layers.11.attention.wo.lora_B', 'layers.11.feed_forward.w1.lora_A', 'layers.11.feed_forward.w1.lora_B', 'layers.11.feed_forward.w3.lora_A', 'layers.11.feed_forward.w3.lora_B', 'layers.11.feed_forward.w2.lora_A', 'layers.11.feed_forward.w2.lora_B', 'layers.12.attention.wqkv.lora_A', 'layers.12.attention.wqkv.lora_B', 'layers.12.attention.wo.lora_A', 'layers.12.attention.wo.lora_B', 'layers.12.feed_forward.w1.lora_A', 'layers.12.feed_forward.w1.lora_B', 'layers.12.feed_forward.w3.lora_A', 'layers.12.feed_forward.w3.lora_B', 'layers.12.feed_forward.w2.lora_A', 'layers.12.feed_forward.w2.lora_B', 'layers.13.attention.wqkv.lora_A', 'layers.13.attention.wqkv.lora_B', 'layers.13.attention.wo.lora_A', 'layers.13.attention.wo.lora_B', 'layers.13.feed_forward.w1.lora_A', 'layers.13.feed_forward.w1.lora_B', 'layers.13.feed_forward.w3.lora_A', 'layers.13.feed_forward.w3.lora_B', 'layers.13.feed_forward.w2.lora_A', 'layers.13.feed_forward.w2.lora_B', 'layers.14.attention.wqkv.lora_A', 'layers.14.attention.wqkv.lora_B', 'layers.14.attention.wo.lora_A', 'layers.14.attention.wo.lora_B', 'layers.14.feed_forward.w1.lora_A', 'layers.14.feed_forward.w1.lora_B', 'layers.14.feed_forward.w3.lora_A', 'layers.14.feed_forward.w3.lora_B', 'layers.14.feed_forward.w2.lora_A', 'layers.14.feed_forward.w2.lora_B', 'layers.15.attention.wqkv.lora_A', 'layers.15.attention.wqkv.lora_B', 'layers.15.attention.wo.lora_A', 'layers.15.attention.wo.lora_B', 'layers.15.feed_forward.w1.lora_A', 'layers.15.feed_forward.w1.lora_B', 'layers.15.feed_forward.w3.lora_A', 'layers.15.feed_forward.w3.lora_B', 'layers.15.feed_forward.w2.lora_A', 'layers.15.feed_forward.w2.lora_B', 'layers.16.attention.wqkv.lora_A', 'layers.16.attention.wqkv.lora_B', 'layers.16.attention.wo.lora_A', 'layers.16.attention.wo.lora_B', 'layers.16.feed_forward.w1.lora_A', 'layers.16.feed_forward.w1.lora_B', 'layers.16.feed_forward.w3.lora_A', 'layers.16.feed_forward.w3.lora_B', 'layers.16.feed_forward.w2.lora_A', 'layers.16.feed_forward.w2.lora_B', 'layers.17.attention.wqkv.lora_A', 'layers.17.attention.wqkv.lora_B', 'layers.17.attention.wo.lora_A', 'layers.17.attention.wo.lora_B', 'layers.17.feed_forward.w1.lora_A', 'layers.17.feed_forward.w1.lora_B', 'layers.17.feed_forward.w3.lora_A', 'layers.17.feed_forward.w3.lora_B', 'layers.17.feed_forward.w2.lora_A', 'layers.17.feed_forward.w2.lora_B', 'layers.18.attention.wqkv.lora_A', 'layers.18.attention.wqkv.lora_B', 'layers.18.attention.wo.lora_A', 'layers.18.attention.wo.lora_B', 'layers.18.feed_forward.w1.lora_A', 'layers.18.feed_forward.w1.lora_B', 'layers.18.feed_forward.w3.lora_A', 'layers.18.feed_forward.w3.lora_B', 'layers.18.feed_forward.w2.lora_A', 'layers.18.feed_forward.w2.lora_B', 'layers.19.attention.wqkv.lora_A', 'layers.19.attention.wqkv.lora_B', 'layers.19.attention.wo.lora_A', 'layers.19.attention.wo.lora_B', 'layers.19.feed_forward.w1.lora_A', 'layers.19.feed_forward.w1.lora_B', 'layers.19.feed_forward.w3.lora_A', 
'layers.19.feed_forward.w3.lora_B', 'layers.19.feed_forward.w2.lora_A', 'layers.19.feed_forward.w2.lora_B', 'layers.20.attention.wqkv.lora_A', 'layers.20.attention.wqkv.lora_B', 'layers.20.attention.wo.lora_A', 'layers.20.attention.wo.lora_B', 'layers.20.feed_forward.w1.lora_A', 'layers.20.feed_forward.w1.lora_B', 'layers.20.feed_forward.w3.lora_A', 'layers.20.feed_forward.w3.lora_B', 'layers.20.feed_forward.w2.lora_A', 'layers.20.feed_forward.w2.lora_B', 'layers.21.attention.wqkv.lora_A', 'layers.21.attention.wqkv.lora_B', 'layers.21.attention.wo.lora_A', 'layers.21.attention.wo.lora_B', 'layers.21.feed_forward.w1.lora_A', 'layers.21.feed_forward.w1.lora_B', 'layers.21.feed_forward.w3.lora_A', 'layers.21.feed_forward.w3.lora_B', 'layers.21.feed_forward.w2.lora_A', 'layers.21.feed_forward.w2.lora_B', 'layers.22.attention.wqkv.lora_A', 'layers.22.attention.wqkv.lora_B', 'layers.22.attention.wo.lora_A', 'layers.22.attention.wo.lora_B', 'layers.22.feed_forward.w1.lora_A', 'layers.22.feed_forward.w1.lora_B', 'layers.22.feed_forward.w3.lora_A', 'layers.22.feed_forward.w3.lora_B', 'layers.22.feed_forward.w2.lora_A', 'layers.22.feed_forward.w2.lora_B', 'layers.23.attention.wqkv.lora_A', 'layers.23.attention.wqkv.lora_B', 'layers.23.attention.wo.lora_A', 'layers.23.attention.wo.lora_B', 'layers.23.feed_forward.w1.lora_A', 'layers.23.feed_forward.w1.lora_B', 'layers.23.feed_forward.w3.lora_A', 'layers.23.feed_forward.w3.lora_B', 'layers.23.feed_forward.w2.lora_A', 'layers.23.feed_forward.w2.lora_B', 'output.lora_A', 'output.lora_B', 'fast_embeddings.lora_A', 'fast_embeddings.lora_B', 'fast_layers.0.attention.wqkv.lora_A', 'fast_layers.0.attention.wqkv.lora_B', 'fast_layers.0.attention.wo.lora_A', 'fast_layers.0.attention.wo.lora_B', 'fast_layers.0.feed_forward.w1.lora_A', 'fast_layers.0.feed_forward.w1.lora_B', 'fast_layers.0.feed_forward.w3.lora_A', 'fast_layers.0.feed_forward.w3.lora_B', 'fast_layers.0.feed_forward.w2.lora_A', 'fast_layers.0.feed_forward.w2.lora_B', 'fast_layers.1.attention.wqkv.lora_A', 'fast_layers.1.attention.wqkv.lora_B', 'fast_layers.1.attention.wo.lora_A', 'fast_layers.1.attention.wo.lora_B', 'fast_layers.1.feed_forward.w1.lora_A', 'fast_layers.1.feed_forward.w1.lora_B', 'fast_layers.1.feed_forward.w3.lora_A', 'fast_layers.1.feed_forward.w3.lora_B', 'fast_layers.1.feed_forward.w2.lora_A', 'fast_layers.1.feed_forward.w2.lora_B', 'fast_layers.2.attention.wqkv.lora_A', 'fast_layers.2.attention.wqkv.lora_B', 'fast_layers.2.attention.wo.lora_A', 'fast_layers.2.attention.wo.lora_B', 'fast_layers.2.feed_forward.w1.lora_A', 'fast_layers.2.feed_forward.w1.lora_B', 'fast_layers.2.feed_forward.w3.lora_A', 'fast_layers.2.feed_forward.w3.lora_B', 'fast_layers.2.feed_forward.w2.lora_A', 'fast_layers.2.feed_forward.w2.lora_B', 'fast_layers.3.attention.wqkv.lora_A', 'fast_layers.3.attention.wqkv.lora_B', 'fast_layers.3.attention.wo.lora_A', 'fast_layers.3.attention.wo.lora_B', 'fast_layers.3.feed_forward.w1.lora_A', 'fast_layers.3.feed_forward.w1.lora_B', 'fast_layers.3.feed_forward.w3.lora_A', 'fast_layers.3.feed_forward.w3.lora_B', 'fast_layers.3.feed_forward.w2.lora_A', 'fast_layers.3.feed_forward.w2.lora_B', 'fast_output.lora_A', 'fast_output.lora_B'], unexpected_keys=[])
[2024-09-21 05:55:53,093][__main__][INFO] - [rank: 0] Instantiating callbacks...
[2024-09-21 05:55:53,093][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating callback <lightning.pytorch.callbacks.ModelCheckpoint>
[2024-09-21 05:55:53,098][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating callback <lightning.pytorch.callbacks.ModelSummary>
[2024-09-21 05:55:53,099][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating callback <lightning.pytorch.callbacks.LearningRateMonitor>
[2024-09-21 05:55:53,099][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating callback <fish_speech.callbacks.GradNormMonitor>
[2024-09-21 05:55:53,125][__main__][INFO] - [rank: 0] Instantiating loggers...
[2024-09-21 05:55:53,125][fish_speech.utils.instantiators][INFO] - [rank: 0] Instantiating logger <lightning.pytorch.loggers.tensorboard.TensorBoardLogger>
[2024-09-21 05:55:53,129][__main__][INFO] - [rank: 0] Instantiating trainer <lightning.pytorch.trainer.Trainer>
Trainer already configured with model summary callbacks: [<class 'lightning.pytorch.callbacks.model_summary.ModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[2024-09-21 05:55:53,188][__main__][INFO] - [rank: 0] Logging hyperparameters!
2024-09-21 05:55:53.504280: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-21 05:55:53.520652: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-21 05:55:53.525473: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-21 05:55:53.537436: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-21 05:55:54.685740: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2024-09-21 05:55:55,703][__main__][INFO] - [rank: 0] Starting training!
[2024-09-21 05:55:55,709][__main__][INFO] - [rank: 0] Resuming from checkpoint: results/checkpoints/last.ckpt
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

/usr/local/lib/python3.10/dist-packages/lightning/pytorch/callbacks/model_checkpoint.py:654: Checkpoint directory /content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/results/checkpoints exists and is not empty.
Restoring states from the checkpoint path at results/checkpoints/last.ckpt
[2024-09-21 05:55:56,694][fish_speech.utils.utils][ERROR] - [rank: 0] 
Traceback (most recent call last):
  File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/utils/utils.py", line 66, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/train.py", line 110, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 538, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 950, in _run
    self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 398, in _restore_modules_and_callbacks
    self.restore_model()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 275, in restore_model
    self.trainer.strategy.load_model_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 371, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TextToSemantic:
	Missing key(s) in state_dict: "model.embeddings.weight", "model.codebook_embeddings.weight", "model.layers.0.attention.wqkv.weight", "model.layers.0.attention.wo.weight", "model.layers.0.feed_forward.w1.weight", "model.layers.0.feed_forward.w3.weight", "model.layers.0.feed_forward.w2.weight", "model.layers.0.ffn_norm.weight", "model.layers.0.attention_norm.weight", "model.layers.1.attention.wqkv.weight", "model.layers.1.attention.wo.weight", "model.layers.1.feed_forward.w1.weight", "model.layers.1.feed_forward.w3.weight", "model.layers.1.feed_forward.w2.weight", "model.layers.1.ffn_norm.weight", "model.layers.1.attention_norm.weight", "model.layers.2.attention.wqkv.weight", "model.layers.2.attention.wo.weight", "model.layers.2.feed_forward.w1.weight", "model.layers.2.feed_forward.w3.weight", "model.layers.2.feed_forward.w2.weight", "model.layers.2.ffn_norm.weight", "model.layers.2.attention_norm.weight", "model.layers.3.attention.wqkv.weight", "model.layers.3.attention.wo.weight", "model.layers.3.feed_forward.w1.weight", "model.layers.3.feed_forward.w3.weight", "model.layers.3.feed_forward.w2.weight", "model.layers.3.ffn_norm.weight", "model.layers.3.attention_norm.weight", "model.layers.4.attention.wqkv.weight", "model.layers.4.attention.wo.weight", "model.layers.4.feed_forward.w1.weight", "model.layers.4.feed_forward.w3.weight", "model.layers.4.feed_forward.w2.weight", "model.layers.4.ffn_norm.weight", "model.layers.4.attention_norm.weight", "model.layers.5.attention.wqkv.weight", "model.layers.5.attention.wo.weight", "model.layers.5.feed_forward.w1.weight", "model.layers.5.feed_forward.w3.weight", "model.layers.5.feed_forward.w2.weight", "model.layers.5.ffn_norm.weight", "model.layers.5.attention_norm.weight", "model.layers.6.attention.wqkv.weight", "model.layers.6.attention.wo.weight", "model.layers.6.feed_forward.w1.weight", "model.layers.6.feed_forward.w3.weight", "model.layers.6.feed_forward.w2.weight", "model.layers.6.ffn_norm.weight", "model.layers.6.attention_norm.weight", "model.layers.7.attention.wqkv.weight", "model.layers.7.attention.wo.weight", "model.layers.7.feed_forward.w1.weight", "model.layers.7.feed_forward.w3.weight", "model.layers.7.feed_forward.w2.weight", "model.layers.7.ffn_norm.weight", "model.layers.7.attention_norm.weight", "model.layers.8.attention.wqkv.weight", "model.layers.8.attention.wo.weight", "model.layers.8.feed_forward.w1.weight", "model.layers.8.feed_forward.w3.weight", "model.layers.8.feed_forward.w2.weight", "model.layers.8.ffn_norm.weight", "model.layers.8.attention_norm.weight", "model.layers.9.attention.wqkv.weight", "model.layers.9.attention.wo.weight", "model.layers.9.feed_forward.w1.weight", "model.layers.9.feed_forward.w3.weight", "model.layers.9.feed_forward.w2.weight", "model.layers.9.ffn_norm.weight", "model.layers.9.attention_norm.weight", "model.layers.10.attention.wqkv.weight", "model.layers.10.attention.wo.weight", "model.layers.10.feed_forward.w1.weight", "model.layers.10.feed_forward.w3.weight", "model.layers.10.feed_forward.w2.weight", "model.layers.10.ffn_norm.weight", "model.layers.10.attention_norm.weight", "model.layers.11.attention.wqkv.weight", "model.layers.11.attention.wo.weight", "model.layers.11.feed_forward.w1.weight", "model.layers.11.feed_forward.w3.weight", "model.layers.11.feed_forward.w2.weight", "model.layers.11.ffn_norm.weight", "model.layers.11.attention_norm.weight", "model.layers.12.attention.wqkv.weight", "model.layers.12.attention.wo.weight", "model.layers.12.feed_forward.w1.weight", 
"model.layers.12.feed_forward.w3.weight", "model.layers.12.feed_forward.w2.weight", "model.layers.12.ffn_norm.weight", "model.layers.12.attention_norm.weight", "model.layers.13.attention.wqkv.weight", "model.layers.13.attention.wo.weight", "model.layers.13.feed_forward.w1.weight", "model.layers.13.feed_forward.w3.weight", "model.layers.13.feed_forward.w2.weight", "model.layers.13.ffn_norm.weight", "model.layers.13.attention_norm.weight", "model.layers.14.attention.wqkv.weight", "model.layers.14.attention.wo.weight", "model.layers.14.feed_forward.w1.weight", "model.layers.14.feed_forward.w3.weight", "model.layers.14.feed_forward.w2.weight", "model.layers.14.ffn_norm.weight", "model.layers.14.attention_norm.weight", "model.layers.15.attention.wqkv.weight", "model.layers.15.attention.wo.weight", "model.layers.15.feed_forward.w1.weight", "model.layers.15.feed_forward.w3.weight", "model.layers.15.feed_forward.w2.weight", "model.layers.15.ffn_norm.weight", "model.layers.15.attention_norm.weight", "model.layers.16.attention.wqkv.weight", "model.layers.16.attention.wo.weight", "model.layers.16.feed_forward.w1.weight", "model.layers.16.feed_forward.w3.weight", "model.layers.16.feed_forward.w2.weight", "model.layers.16.ffn_norm.weight", "model.layers.16.attention_norm.weight", "model.layers.17.attention.wqkv.weight", "model.layers.17.attention.wo.weight", "model.layers.17.feed_forward.w1.weight", "model.layers.17.feed_forward.w3.weight", "model.layers.17.feed_forward.w2.weight", "model.layers.17.ffn_norm.weight", "model.layers.17.attention_norm.weight", "model.layers.18.attention.wqkv.weight", "model.layers.18.attention.wo.weight", "model.layers.18.feed_forward.w1.weight", "model.layers.18.feed_forward.w3.weight", "model.layers.18.feed_forward.w2.weight", "model.layers.18.ffn_norm.weight", "model.layers.18.attention_norm.weight", "model.layers.19.attention.wqkv.weight", "model.layers.19.attention.wo.weight", "model.layers.19.feed_forward.w1.weight", "model.layers.19.feed_forward.w3.weight", "model.layers.19.feed_forward.w2.weight", "model.layers.19.ffn_norm.weight", "model.layers.19.attention_norm.weight", "model.layers.20.attention.wqkv.weight", "model.layers.20.attention.wo.weight", "model.layers.20.feed_forward.w1.weight", "model.layers.20.feed_forward.w3.weight", "model.layers.20.feed_forward.w2.weight", "model.layers.20.ffn_norm.weight", "model.layers.20.attention_norm.weight", "model.layers.21.attention.wqkv.weight", "model.layers.21.attention.wo.weight", "model.layers.21.feed_forward.w1.weight", "model.layers.21.feed_forward.w3.weight", "model.layers.21.feed_forward.w2.weight", "model.layers.21.ffn_norm.weight", "model.layers.21.attention_norm.weight", "model.layers.22.attention.wqkv.weight", "model.layers.22.attention.wo.weight", "model.layers.22.feed_forward.w1.weight", "model.layers.22.feed_forward.w3.weight", "model.layers.22.feed_forward.w2.weight", "model.layers.22.ffn_norm.weight", "model.layers.22.attention_norm.weight", "model.layers.23.attention.wqkv.weight", "model.layers.23.attention.wo.weight", "model.layers.23.feed_forward.w1.weight", "model.layers.23.feed_forward.w3.weight", "model.layers.23.feed_forward.w2.weight", "model.layers.23.ffn_norm.weight", "model.layers.23.attention_norm.weight", "model.norm.weight", "model.output.weight", "model.fast_embeddings.weight", "model.fast_layers.0.attention.wqkv.weight", "model.fast_layers.0.attention.wo.weight", "model.fast_layers.0.feed_forward.w1.weight", "model.fast_layers.0.feed_forward.w3.weight", 
"model.fast_layers.0.feed_forward.w2.weight", "model.fast_layers.0.ffn_norm.weight", "model.fast_layers.0.attention_norm.weight", "model.fast_layers.1.attention.wqkv.weight", "model.fast_layers.1.attention.wo.weight", "model.fast_layers.1.feed_forward.w1.weight", "model.fast_layers.1.feed_forward.w3.weight", "model.fast_layers.1.feed_forward.w2.weight", "model.fast_layers.1.ffn_norm.weight", "model.fast_layers.1.attention_norm.weight", "model.fast_layers.2.attention.wqkv.weight", "model.fast_layers.2.attention.wo.weight", "model.fast_layers.2.feed_forward.w1.weight", "model.fast_layers.2.feed_forward.w3.weight", "model.fast_layers.2.feed_forward.w2.weight", "model.fast_layers.2.ffn_norm.weight", "model.fast_layers.2.attention_norm.weight", "model.fast_layers.3.attention.wqkv.weight", "model.fast_layers.3.attention.wo.weight", "model.fast_layers.3.feed_forward.w1.weight", "model.fast_layers.3.feed_forward.w3.weight", "model.fast_layers.3.feed_forward.w2.weight", "model.fast_layers.3.ffn_norm.weight", "model.fast_layers.3.attention_norm.weight", "model.fast_norm.weight", "model.fast_output.weight". 
[2024-09-21 05:55:56,701][fish_speech.utils.utils][INFO] - [rank: 0] Output dir: results/
Error executing job with overrides: ['project=', '[email protected]_config=r_8_alpha_16']
Traceback (most recent call last):
  File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/train.py", line 137, in main
    train(cfg)
  File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/utils/utils.py", line 77, in wrap
    raise ex
  File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/utils/utils.py", line 66, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "/content/drive/.shortcut-targets-by-id/1gv6Ipb1SoE2CUkCBANgPmnyXOiKV_Q2L/fish-speech-1.4.1/fish_speech/train.py", line 110, in train
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 538, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/trainer.py", line 950, in _run
    self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 398, in _restore_modules_and_callbacks
    self.restore_model()
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 275, in restore_model
    self.trainer.strategy.load_model_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/strategies/strategy.py", line 371, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TextToSemantic:
	Missing key(s) in state_dict: "model.embeddings.weight", "model.codebook_embeddings.weight", "model.layers.0.attention.wqkv.weight", "model.layers.0.attention.wo.weight", "model.layers.0.feed_forward.w1.weight", "model.layers.0.feed_forward.w3.weight", "model.layers.0.feed_forward.w2.weight", "model.layers.0.ffn_norm.weight", "model.layers.0.attention_norm.weight", "model.layers.1.attention.wqkv.weight", "model.layers.1.attention.wo.weight", "model.layers.1.feed_forward.w1.weight", "model.layers.1.feed_forward.w3.weight", "model.layers.1.feed_forward.w2.weight", "model.layers.1.ffn_norm.weight", "model.layers.1.attention_norm.weight", "model.layers.2.attention.wqkv.weight", "model.layers.2.attention.wo.weight", "model.layers.2.feed_forward.w1.weight", "model.layers.2.feed_forward.w3.weight", "model.layers.2.feed_forward.w2.weight", "model.layers.2.ffn_norm.weight", "model.layers.2.attention_norm.weight", "model.layers.3.attention.wqkv.weight", "model.layers.3.attention.wo.weight", "model.layers.3.feed_forward.w1.weight", "model.layers.3.feed_forward.w3.weight", "model.layers.3.feed_forward.w2.weight", "model.layers.3.ffn_norm.weight", "model.layers.3.attention_norm.weight", "model.layers.4.attention.wqkv.weight", "model.layers.4.attention.wo.weight", "model.layers.4.feed_forward.w1.weight", "model.layers.4.feed_forward.w3.weight", "model.layers.4.feed_forward.w2.weight", "model.layers.4.ffn_norm.weight", "model.layers.4.attention_norm.weight", "model.layers.5.attention.wqkv.weight", "model.layers.5.attention.wo.weight", "model.layers.5.feed_forward.w1.weight", "model.layers.5.feed_forward.w3.weight", "model.layers.5.feed_forward.w2.weight", "model.layers.5.ffn_norm.weight", "model.layers.5.attention_norm.weight", "model.layers.6.attention.wqkv.weight", "model.layers.6.attention.wo.weight", "model.layers.6.feed_forward.w1.weight", "model.layers.6.feed_forward.w3.weight", "model.layers.6.feed_forward.w2.weight", "model.layers.6.ffn_norm.weight", "model.layers.6.attention_norm.weight", "model.layers.7.attention.wqkv.weight", "model.layers.7.attention.wo.weight", "model.layers.7.feed_forward.w1.weight", "model.layers.7.feed_forward.w3.weight", "model.layers.7.feed_forward.w2.weight", "model.layers.7.ffn_norm.weight", "model.layers.7.attention_norm.weight", "model.layers.8.attention.wqkv.weight", "model.layers.8.attention.wo.weight", "model.layers.8.feed_forward.w1.weight", "model.layers.8.feed_forward.w3.weight", "model.layers.8.feed_forward.w2.weight", "model.layers.8.ffn_norm.weight", "model.layers.8.attention_norm.weight", "model.layers.9.attention.wqkv.weight", "model.layers.9.attention.wo.weight", "model.layers.9.feed_forward.w1.weight", "model.layers.9.feed_forward.w3.weight", "model.layers.9.feed_forward.w2.weight", "model.layers.9.ffn_norm.weight", "model.layers.9.attention_norm.weight", "model.layers.10.attention.wqkv.weight", "model.layers.10.attention.wo.weight", "model.layers.10.feed_forward.w1.weight", "model.layers.10.feed_forward.w3.weight", "model.layers.10.feed_forward.w2.weight", "model.layers.10.ffn_norm.weight", "model.layers.10.attention_norm.weight", "model.layers.11.attention.wqkv.weight", "model.layers.11.attention.wo.weight", "model.layers.11.feed_forward.w1.weight", "model.layers.11.feed_forward.w3.weight", "model.layers.11.feed_forward.w2.weight", "model.layers.11.ffn_norm.weight", "model.layers.11.attention_norm.weight", "model.layers.12.attention.wqkv.weight", "model.layers.12.attention.wo.weight", "model.layers.12.feed_forward.w1.weight", 
"model.layers.12.feed_forward.w3.weight", "model.layers.12.feed_forward.w2.weight", "model.layers.12.ffn_norm.weight", "model.layers.12.attention_norm.weight", "model.layers.13.attention.wqkv.weight", "model.layers.13.attention.wo.weight", "model.layers.13.feed_forward.w1.weight", "model.layers.13.feed_forward.w3.weight", "model.layers.13.feed_forward.w2.weight", "model.layers.13.ffn_norm.weight", "model.layers.13.attention_norm.weight", "model.layers.14.attention.wqkv.weight", "model.layers.14.attention.wo.weight", "model.layers.14.feed_forward.w1.weight", "model.layers.14.feed_forward.w3.weight", "model.layers.14.feed_forward.w2.weight", "model.layers.14.ffn_norm.weight", "model.layers.14.attention_norm.weight", "model.layers.15.attention.wqkv.weight", "model.layers.15.attention.wo.weight", "model.layers.15.feed_forward.w1.weight", "model.layers.15.feed_forward.w3.weight", "model.layers.15.feed_forward.w2.weight", "model.layers.15.ffn_norm.weight", "model.layers.15.attention_norm.weight", "model.layers.16.attention.wqkv.weight", "model.layers.16.attention.wo.weight", "model.layers.16.feed_forward.w1.weight", "model.layers.16.feed_forward.w3.weight", "model.layers.16.feed_forward.w2.weight", "model.layers.16.ffn_norm.weight", "model.layers.16.attention_norm.weight", "model.layers.17.attention.wqkv.weight", "model.layers.17.attention.wo.weight", "model.layers.17.feed_forward.w1.weight", "model.layers.17.feed_forward.w3.weight", "model.layers.17.feed_forward.w2.weight", "model.layers.17.ffn_norm.weight", "model.layers.17.attention_norm.weight", "model.layers.18.attention.wqkv.weight", "model.layers.18.attention.wo.weight", "model.layers.18.feed_forward.w1.weight", "model.layers.18.feed_forward.w3.weight", "model.layers.18.feed_forward.w2.weight", "model.layers.18.ffn_norm.weight", "model.layers.18.attention_norm.weight", "model.layers.19.attention.wqkv.weight", "model.layers.19.attention.wo.weight", "model.layers.19.feed_forward.w1.weight", "model.layers.19.feed_forward.w3.weight", "model.layers.19.feed_forward.w2.weight", "model.layers.19.ffn_norm.weight", "model.layers.19.attention_norm.weight", "model.layers.20.attention.wqkv.weight", "model.layers.20.attention.wo.weight", "model.layers.20.feed_forward.w1.weight", "model.layers.20.feed_forward.w3.weight", "model.layers.20.feed_forward.w2.weight", "model.layers.20.ffn_norm.weight", "model.layers.20.attention_norm.weight", "model.layers.21.attention.wqkv.weight", "model.layers.21.attention.wo.weight", "model.layers.21.feed_forward.w1.weight", "model.layers.21.feed_forward.w3.weight", "model.layers.21.feed_forward.w2.weight", "model.layers.21.ffn_norm.weight", "model.layers.21.attention_norm.weight", "model.layers.22.attention.wqkv.weight", "model.layers.22.attention.wo.weight", "model.layers.22.feed_forward.w1.weight", "model.layers.22.feed_forward.w3.weight", "model.layers.22.feed_forward.w2.weight", "model.layers.22.ffn_norm.weight", "model.layers.22.attention_norm.weight", "model.layers.23.attention.wqkv.weight", "model.layers.23.attention.wo.weight", "model.layers.23.feed_forward.w1.weight", "model.layers.23.feed_forward.w3.weight", "model.layers.23.feed_forward.w2.weight", "model.layers.23.ffn_norm.weight", "model.layers.23.attention_norm.weight", "model.norm.weight", "model.output.weight", "model.fast_embeddings.weight", "model.fast_layers.0.attention.wqkv.weight", "model.fast_layers.0.attention.wo.weight", "model.fast_layers.0.feed_forward.w1.weight", "model.fast_layers.0.feed_forward.w3.weight", 
"model.fast_layers.0.feed_forward.w2.weight", "model.fast_layers.0.ffn_norm.weight", "model.fast_layers.0.attention_norm.weight", "model.fast_layers.1.attention.wqkv.weight", "model.fast_layers.1.attention.wo.weight", "model.fast_layers.1.feed_forward.w1.weight", "model.fast_layers.1.feed_forward.w3.weight", "model.fast_layers.1.feed_forward.w2.weight", "model.fast_layers.1.ffn_norm.weight", "model.fast_layers.1.attention_norm.weight", "model.fast_layers.2.attention.wqkv.weight", "model.fast_layers.2.attention.wo.weight", "model.fast_layers.2.feed_forward.w1.weight", "model.fast_layers.2.feed_forward.w3.weight", "model.fast_layers.2.feed_forward.w2.weight", "model.fast_layers.2.ffn_norm.weight", "model.fast_layers.2.attention_norm.weight", "model.fast_layers.3.attention.wqkv.weight", "model.fast_layers.3.attention.wo.weight", "model.fast_layers.3.feed_forward.w1.weight", "model.fast_layers.3.feed_forward.w3.weight", "model.fast_layers.3.feed_forward.w2.weight", "model.fast_layers.3.ffn_norm.weight", "model.fast_layers.3.attention_norm.weight", "model.fast_norm.weight", "model.fast_output.weight". 

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
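
A quick way to see where the mismatch comes from is to compare the keys stored in the checkpoint against the ones the error says the model expects. Below is a minimal diagnostic sketch, not fish-speech code; it uses the paths from the log above (they may differ on your setup) and only assumes a PyTorch install and that the .ckpt file is a dict with a top-level "state_dict", as in the traceback.

```python
# Minimal diagnostic sketch (not part of fish-speech): compare what the
# checkpoint provides with what the error message says the model expects.
# Paths are taken from the log above and may differ on your setup.
import torch

ckpt = torch.load("results/checkpoints/last.ckpt", map_location="cpu")

# A Lightning checkpoint wraps the weights under "state_dict"; a plain
# weights file (e.g. merged LoRA output) is usually just a flat tensor dict.
state = ckpt.get("state_dict", ckpt)

keys = sorted(state.keys())
print(f"{len(keys)} keys, e.g. {keys[:3]}")

# The RuntimeError above complains about missing "model.*.weight" keys, while
# the earlier non-strict load reports missing "*.lora_A"/"*.lora_B" entries.
print("has 'model.' prefix:", any(k.startswith("model.") for k in keys))
print("has LoRA adapters:  ", any(".lora_A" in k or ".lora_B" in k for k in keys))
```
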
@Stardust-minus
Member

Add `ckpt_path` in `config.yml`.
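
For context on where that value ends up: per the traceback, fish_speech/train.py calls trainer.fit(model=model, datamodule=datamodule, ckpt_path=ckpt_path), i.e. Lightning's standard resume path. Here is a self-contained sketch of that mechanism with a toy model (generic Lightning usage, not fish-speech code; the checkpoint directory name is just an example):

```python
# Generic Lightning resume sketch (toy model, not fish-speech code).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.callbacks import ModelCheckpoint


class TinyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)


data = DataLoader(TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=8)

# First run: ModelCheckpoint(save_last=True) writes example_ckpts/last.ckpt.
Trainer(max_epochs=1, callbacks=[ModelCheckpoint(dirpath="example_ckpts", save_last=True)]).fit(
    TinyModel(), data
)

# Resume: ckpt_path restores model, optimizer and trainer state. The freshly
# instantiated model must have the same architecture as the one that wrote the
# checkpoint; if it differs (e.g. LoRA layers enabled in one run but not the
# other), load_state_dict fails with exactly the "Missing key(s)" error above.
Trainer(max_epochs=2, callbacks=[ModelCheckpoint(dirpath="example_ckpts", save_last=True)]).fit(
    TinyModel(), data, ckpt_path="example_ckpts/last.ckpt"
)
```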

@med1844
Contributor

med1844 commented Jan 28, 2025

`tree | grep config.yml` yields no result, i.e. there's no such file.

@abhisirka2001

I am facing the same issue. Can somebody help?
I have passed the merged model directory in config.yaml as `pretrained_ckpt_path: /workspace/FISHSPEECH/lora_weights`.
@Stardust-minus @HimanshuRepozitory
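
One thing that may be worth verifying in that setup (a hedged suggestion; the exact filename under that directory is an assumption): merged LoRA weights no longer contain separate lora_A/lora_B tensors, while resuming a LoRA finetune expects them, which may be related to the key mismatch in the log above. A small sketch to check what a given weights file actually holds:

```python
# Hedged sketch: check whether a weights file still carries LoRA adapter
# tensors. The filename below is hypothetical; adjust it to whatever file
# actually sits in the lora_weights directory.
import torch

state = torch.load("/workspace/FISHSPEECH/lora_weights/model.pth", map_location="cpu")
state = state.get("state_dict", state)  # unwrap if it is a Lightning checkpoint

lora_keys = [k for k in state if ".lora_A" in k or ".lora_B" in k]
print(f"{len(lora_keys)} LoRA adapter tensors found")
# 0 here means the weights were already merged, so they cannot be used to
# resume a LoRA run that expects separate lora_A/lora_B parameters.
```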
