
[Feature]: Add audio out of dataset to audio section in TensorBoard #878

Open · BornSaint opened this issue Nov 19, 2024 · 16 comments
Labels: enhancement (New feature or request), feature

Comments

@BornSaint (Contributor)

Description

When training, the script chooses one audio clip from the dataset to show in TensorBoard each epoch, but evaluating on audio with the same features the model was trained on makes it hard to judge whether training is going well. I can still watch the loss graph for signs of overfitting, but hearing the audio would help me decide when to stop training, especially when I can't train for long and the quality is already acceptable.

Problem

Already covered in the description.

Proposed Solution

Add an option to the CLI script to pick an audio file, something like --tensorboard-audio "/path/to/audio/file"; the GUI could simply add a Gradio element for picking the audio.
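The proposed flag could be wired up roughly like this. This is a sketch, not Applio code: the --tensorboard-audio flag and the log_reference_audio helper are hypothetical names, while SummaryWriter.add_audio is the standard torch.utils.tensorboard API.

```python
# Sketch of the proposed option. The flag name and the log_reference_audio
# helper are hypothetical; SummaryWriter.add_audio is the real
# torch.utils.tensorboard API.
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--tensorboard-audio",
        default=None,
        help="External .wav file to log as a fixed TensorBoard reference.",
    )
    return parser.parse_args(argv)


def log_reference_audio(writer, path, step):
    # writer: torch.utils.tensorboard.SummaryWriter
    # add_audio expects a float tensor in [-1, 1], shape (L,) or (1, L)
    import soundfile as sf
    import torch

    audio, sr = sf.read(path)
    writer.add_audio("eval/reference", torch.tensor(audio), step, sample_rate=sr)


args = parse_args(["--tensorboard-audio", "/path/to/audio/file"])
```

At each evaluation step the trainer would then call log_reference_audio(writer, args.tensorboard_audio, epoch) instead of picking a sample from the dataset.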

Alternatives Considered

Not exactly an alternative, but it would be awesome to have an auto-stop for training when the metrics stop improving, e.g. --auto-stop 10 would stop training if the model doesn't get better over the next 10 epochs, and reset the count whenever it does improve.
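What is being asked for here is a standard early-stopping patience counter. A minimal sketch follows; the AutoStop name is made up, and real code would feed it the actual validation loss each epoch.

```python
# Sketch of the proposed --auto-stop behavior: a patience counter that
# signals a stop after N epochs without improvement. Names are hypothetical.
class AutoStop:
    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best:
            self.best = loss  # model improved: reset the count
            self.stale = 0
        else:
            self.stale += 1   # no improvement this epoch
        return self.stale >= self.patience


stopper = AutoStop(patience=10)
```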

@BornSaint added the enhancement and feature labels on Nov 19, 2024
@BornSaint (Contributor, Author)

(Edit: my alternative is actually already implemented.)

@BornSaint (Contributor, Author) commented Nov 19, 2024

I guess this commit changes the random TensorBoard audio to the first audio from the dataset for evaluation, but it still compromises the reference, as I said in my comment on that commit page:

> Is the first sample not used in training? Using the same audio for training and eval could compromise the reference for people training the model, e.g. me. Wouldn't it be better to add an option to select an external audio for TensorBoard instead of picking from the dataset?

A better alternative is to exclude the first sample from the training loader and set it aside exclusively for evaluation.

@BornSaint (Contributor, Author)

I found these comments in rvc/train/train.py:

441 # get the first sample as reference for tensorboard evaluation
442 # custom reference temporarily disabled

Would I have any issue enabling it in Applio 3.2.7?

@AznamirWoW (Contributor)

> I found these comments in rvc/train/train.py:
>
> 441 # get the first sample as reference for tensorboard evaluation
> 442 # custom reference temporarily disabled
>
> Would I have any issue enabling it in Applio 3.2.7?

How to create your own reference:

  1. Prepare a .wav file, no longer than 5 seconds.
  2. Use the training tab to create a new model at the desired sampling rate, let's say 32000:
  • in preprocess, uncheck audio cutting and process audio
  • run preprocess, then run feature extraction
  3. Move the files to the reference folder and rename them as listed:
  • the .wav file from sliced_audios, rename to ref32000.wav
  • the .wav.npy file from the f0 folder, rename to ref32000_f0c.npy
  • the .wav.npy file from the f0_voiced folder, rename to ref32000_f0f.npy
  • the .npy file from the v2_extracted folder, rename to ref32000_feats.npy
    These files should replace what was provided in /logs/reference with the 3.2.7 release.
  4. Remove `True == False and` from the train.py code.
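The renaming scheme in the steps above can be encoded in a small helper. This helper is hypothetical (not part of Applio); it just lists the four expected filenames for a sampling rate and reports which ones are missing from a folder.

```python
# Hypothetical helper (not part of Applio) encoding the reference-file
# naming scheme described above.
import os


def expected_reference_files(sample_rate):
    prefix = f"ref{sample_rate}"
    return [
        f"{prefix}.wav",       # sliced audio
        f"{prefix}_f0c.npy",   # coarse f0, from the f0 folder
        f"{prefix}_f0f.npy",   # fine f0, from the f0_voiced folder
        f"{prefix}_feats.npy", # features, from the v2_extracted folder
    ]


def missing_reference_files(folder, sample_rate):
    """Return the expected files that are not present in `folder`."""
    return [
        name
        for name in expected_reference_files(sample_rate)
        if not os.path.isfile(os.path.join(folder, name))
    ]
```

Running missing_reference_files("logs/reference", 32000) before training would catch a misnamed or missing file early.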

@BornSaint
Copy link
Contributor Author

Many thanks, love it! You can close it if you wish.

@AirJCovers34 commented Nov 21, 2024

> How to create your own reference: [steps quoted from the comment above]

That's exactly what I was trying to do.
But when starting the training, I get this error:

Running on local URL:  http://127.0.0.1:6927

To create a public link, set `share=True` in `launch()`.
Starting preprocess with 8 processes...
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.60s/it]
Preprocess completed in 5.61 seconds on 00:00:04 seconds of audio.
Starting pitch extraction with 8 cores on cuda:0 using rmvpe...
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.38s/it]
Pitch extraction completed in 7.17 seconds.
Starting embedding extraction with 8 cores on cuda:0...
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.81it/s]
Embedding extraction completed in 6.87 seconds.
Starting preprocess with 8 processes...
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:34<00:00, 34.56s/it]
Preprocess completed in 34.56 seconds on 00:34:48 seconds of audio.
Starting pitch extraction with 8 cores on cuda:0 using rmvpe...
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]An error occurred extracting file C:\ApplioV327\logs\Test_BensonBoone\sliced_audios_16k\0_0_0.wav on cuda:0: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:16<00:00, 16.20s/it]
Pitch extraction completed in 21.78 seconds.
Starting embedding extraction with 8 cores on cuda:0...
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:35<00:00, 35.82s/it]
Embedding extraction completed in 41.39 seconds.
Starting training...
Loaded pretrained (G) 'rvc\models\pretraineds\pretraineds_custom\G-f048k-TITAN-Medium.pth'
Loaded pretrained (D) 'rvc\models\pretraineds\pretraineds_custom\D-f048k-TITAN-Medium.pth'
Process Process-1:
Traceback (most recent call last):
  File "C:\ApplioV327\env\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\ApplioV327\env\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ApplioV327\rvc\train\train.py", line 482, in run
    train_and_evaluate(
  File "C:\ApplioV327\rvc\train\train.py", line 680, in train_and_evaluate
    if loss_mel > 75:
UnboundLocalError: local variable 'loss_mel' referenced before assignment
Saved index file 'C:\ApplioV327\logs\Test_BensonBoone\added_Test_BensonBoone_v2.index'

Any idea what I might be doing wrong? 🤔
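For context, the UnboundLocalError in the traceback is a generic Python pattern rather than anything CUDA-related: loss_mel is presumably only assigned inside the batch loop, so if the loop body never runs (for example, because the training set's only sample was taken out), the later `if loss_mel > 75` check hits an unassigned local. A minimal reproduction with made-up names:

```python
# Minimal reproduction of the UnboundLocalError above: loss_mel is only
# assigned inside the batch loop, so an empty loader leaves it undefined
# when the post-loop check runs. Names here are illustrative only.
def train_one_epoch(loader):
    for batch in loader:
        loss_mel = batch * 0.5  # stand-in for the real mel loss
    if loss_mel > 75:           # raises UnboundLocalError if loader was empty
        raise RuntimeError("mel loss exploded")
    return loss_mel


try:
    train_one_epoch([])  # empty loader -> the error from the traceback
except UnboundLocalError as e:
    print("reproduced:", e)
```

The usual guard is to initialize loss_mel (or check that the loader is non-empty) before the loop; the right fix in train.py depends on why the loader came up empty.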

@AznamirWoW (Contributor)

> Any idea what I might be doing wrong? 🤔

Don't train on those small references. Use the wav, the two f0 files, and the feature file as references instead.

@AirJCovers34

> Don't train on those small references. Use the wav, the two f0 files, and the feature file as references instead.

Could you elaborate, please?

@AznamirWoW (Contributor)

> Don't train on those small references. Use the wav, the two f0 files, and the feature file as references instead.
>
> Could you elaborate, please?

To make the reference files, you just need to run preprocess and feature extraction, then use the generated files to replace the references in the logs/reference folder.

@AirJCovers34 commented Nov 21, 2024

> To make the reference files, you just need to run preprocess and feature extraction, then use the generated files to replace the references in the logs/reference folder.

That's exactly what I did. But it seems the error now lies at another level... 😥

[screenshot of the new error]

@AznamirWoW (Contributor)

Hmm... okay, I kinda expected that. There's some alignment between the pitch and phoneme tensors that needs to be made, and it is quite annoying for random sample sizes.

@AirJCovers34

> Hmm... okay, I kinda expected that. There's some alignment between the pitch and phoneme tensors that needs to be made, and it is quite annoying for random sample sizes.

Is it possible to fix this issue? Or should I accept that training won't be possible with version 3.2.7?

@AznamirWoW (Contributor)

You can disable the custom reference and fall back to the original 3.2.6 method of picking a random sample from the training set. Or you can try making a different size of reference audio.

What I had included with 3.2.7 was this:

G:\ApplioV3.2.7\logs\reference>python
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import soundfile as sf
>>> import librosa
>>> import numpy as np
>>> audio, sr = librosa.load(r"G:\ApplioV3.2.7\logs\reference\ref48000.wav", sr=48000)
>>> print(audio.shape)
(147122,)
>>> f0c = np.load(r"G:\ApplioV3.2.7\logs\reference\ref48000_f0c.npy")
>>> f0f = np.load(r"G:\ApplioV3.2.7\logs\reference\ref48000_f0f.npy")
>>> feats = np.load(r"G:\ApplioV3.2.7\logs\reference\ref48000_feats.npy")
>>> print(f0c.shape)
(307,)
>>> print(f0f.shape)
(307,)
>>> print(feats.shape)
(153, 768)

The feature gets expanded 2x (153 -> 306) and the pitch gets its last dimension trimmed (307 -> 306), so they match each other in size.
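In NumPy terms, the alignment just described looks roughly like this (a sketch using the shapes above, not the actual Applio code):

```python
# NumPy sketch (assumed, not the actual Applio implementation) of the
# alignment: features repeated 2x along the time axis, pitch trimmed
# to the same length.
import numpy as np

feats = np.zeros((153, 768))  # shape of ref48000_feats.npy above
f0c = np.zeros(307)           # shape of ref48000_f0c.npy above

feats_aligned = np.repeat(feats, 2, axis=0)  # (153, 768) -> (306, 768)
f0c_aligned = f0c[: feats_aligned.shape[0]]  # (307,) -> (306,)
```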

@AirJCovers34

> You can disable the custom reference and fall back to the original 3.2.6 method of picking a random sample from the training set. Or you can try making a different size of reference audio.
>
> [REPL session quoted from the comment above: feature expanded 2x (153 -> 306), pitch trimmed (307 -> 306)]

On my side, I get this:

C:\ApplioV327\logs\reference>python
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import soundfile as sf
>>> import librosa
>>> import numpy as np
>>> audio, sr = librosa.load(r"C:\ApplioV327\logs\reference\ref48000.wav", sr=48000)
>>> print(audio.shape)
(100258259,)
>>> f0c = np.load(r"C:\ApplioV327\logs\reference\ref48000_f0c.npy")
>>> f0f = np.load(r"C:\ApplioV327\logs\reference\ref48000_f0f.npy")
>>> feats = np.load(r"C:\ApplioV327\logs\reference\ref48000_feats.npy")
>>> print(f0c.shape)
(401,)
>>> print(f0f.shape)
(401,)
>>> print(feats.shape)
(199, 768)

@AznamirWoW (Contributor)

Why is your reference wav so big? (100258259,) - that's 30+ minutes.

I said to use a 5-10 sec sample at most.
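A quick sanity check of that figure, since the array shape is just a sample count at the load's 48 kHz rate:

```python
# 100258259 samples at 48 kHz, converted to minutes.
samples, sr = 100_258_259, 48_000
minutes = samples / sr / 60
print(round(minutes, 1))  # prints 34.8
```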

@AirJCovers34

> Why is your reference wav so big? (100258259,) - that's 30+ minutes.
>
> I said to use a 5-10 sec sample at most.

File error when replacing.. 😉😂
It's better now.

C:\ApplioV327\logs\reference>python
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import soundfile as sf
>>> import librosa
>>> import numpy as np
>>> audio, sr = librosa.load(r"C:\ApplioV327\logs\reference\ref48000.wav", sr=48000)
>>> print(audio.shape)
(192001,)
>>> f0c = np.load(r"C:\ApplioV327\logs\reference\ref48000_f0c.npy")
>>> f0f = np.load(r"C:\ApplioV327\logs\reference\ref48000_f0f.npy")
>>> feats = np.load(r"C:\ApplioV327\logs\reference\ref48000_feats.npy")
>>> print(f0c.shape)
(401,)
>>> print(f0f.shape)
(401,)
>>> print(feats.shape)
(199, 768)
