
Experiment with augmenting a higher percentage of the dataset #13

Open
pokey opened this issue Oct 9, 2022 · 3 comments

@pokey (Contributor) commented Oct 9, 2022

Today, it appears that augmentation only occurs 10% of the time:

    if self.training and random.uniform(0, 1) >= 0.9:
        if self.augmented_samples[idx] is None:
            self.augmented_samples[idx] = [
                self.samples[idx][0],
                self.samples[idx][1],
                torch.tensor(self.feature_engineering_augmented(self.samples[idx][0])).float(),
            ]
        return self.augmented_samples[idx][2], self.augmented_samples[idx][1]

@ym-han (Contributor) commented Oct 10, 2022

I have a suggestion in a related vein. I think we can simplify the structure of this code and do away with the random.uniform(0, 1) >= 0.9 check, because the thing that actually does the augmentation --- what self.feature_engineering_augmented ends up calling --- is itself a probabilistic augmenter transform. That is, part of the code for augmented_feature_engineering (which is what self.feature_engineering_augmented calls) looks like this:

import numpy as np
import scipy.io.wavfile
from audiomentations import AddGaussianNoise, Compose, Shift, TimeStretch

def augmented_feature_engineering( wavFile, settings ):
    fs, rawWav = scipy.io.wavfile.read( wavFile )
    wavData = rawWav
    # <some stuff that I haven't included>

    augmenter = Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
    ])
    wavData = augmenter(samples=np.array(wavData, dtype="float32"), sample_rate=fs)

The formal parameter p for transforms like AddGaussianNoise, TimeStretch, and Shift is the probability that the transform gets applied (see, e.g., https://iver56.github.io/audiomentations/waveform_transforms/add_gaussian_noise/ and iver56/audiomentations#168). So what is currently happening is that:

  • the probability of no augmentation at all for an arbitrary training sample is $0.9 + 0.1 \cdot (0.5)^3 = 0.9125$;
  • equivalently, the probability of at least one augmentation for an arbitrary training sample is $1 - 0.9125 = 0.0875$.

(I hope I haven't got the math wrong --- please correct me if I did.)
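
As a quick numeric sanity check of those figures, here's a throwaway simulation (not code from the repo; it just models a 10% gate in front of three independent p = 0.5 transforms):

    import random

    # A sample is augmented only if it passes the 10% gate AND at least
    # one of the three independent p=0.5 transforms fires.
    trials = 1_000_000
    augmented = sum(
        1
        for _ in range(trials)
        if random.uniform(0, 1) >= 0.9
        and any(random.random() < 0.5 for _ in range(3))
    )
    print(augmented / trials)  # ~0.0875, i.e. 1 - (0.9 + 0.1 * 0.5**3)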

The structure of this code can thus be simplified as follows. Let $q$ be the probability that no augmentation will be done; this will be a hyperparameter that we control. And let $t$ be the number of augmenter transforms we're using (in the code above, $t = 3$). Since the augmenter transforms already come with a formal probability parameter $p$, we do not need the equivalent of random.uniform(0, 1) >= 0.9. Instead, we can set $p$ for the transforms based on the value we want for $q$ via $(1 - p)^{t} = q \iff p = 1 - \sqrt[t]{q}$, assuming we use the same $p$ for all the transforms. We can then experiment with and tune $q$ as a hyperparameter (as per pokey's suggestion).
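
For concreteness, a minimal sketch of that derivation (q = 0.9 is just an illustrative value, not the current effective rate of 0.9125):

    from audiomentations import AddGaussianNoise, Compose, Shift, TimeStretch

    q = 0.9  # hyperparameter: target probability of no augmentation at all
    t = 3    # number of transforms in the Compose below
    p = 1 - q ** (1 / t)  # per-transform probability, so (1 - p) ** t == q

    # Same transforms as in the snippet above, with p derived from q
    # instead of hard-coded to 0.5.
    augmenter = Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=p),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=p),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=p),
    ])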

@pokey (Contributor, Author) commented Oct 10, 2022

The only thing to keep in mind here is that for non-augmented datapoints, the feature values are cached, so I believe your approach might result in a performance penalty.

Tho tbh I'm guessing that the gains from doing more augmentation will outweigh the costs, but just a note.

@ym-han (Contributor) commented Oct 10, 2022

I'm not sure if this helps with making sure the caching works for the non-augmented data points, but we could also set p to 1 for the transforms, and adjust the amount of augmentation by keeping random.uniform(0, 1) >= some param.

My main thought is just that stacking a probabilistic thing on top of another probabilistic thing makes things harder to reason about --- it would be clearer if we either removed the random.uniform stuff or made the transforms deterministic by setting p = 1.
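
For what it's worth, a rough sketch of the second option, assuming a __getitem__ shaped like the snippet in the first comment (aug_prob is a hypothetical hyperparameter, and I'm guessing that self.samples[idx] keeps its cached non-augmented feature tensor at index 2, mirroring self.augmented_samples):

    def __getitem__(self, idx):
        # All augmenter transforms are built with p=1, so the only
        # randomness is this single gate, controlled by one hyperparameter.
        if self.training and random.uniform(0, 1) < self.aug_prob:
            # Recompute augmented features each call so every pass sees a
            # fresh random augmentation; caching is only for the plain path.
            features = torch.tensor(
                self.feature_engineering_augmented(self.samples[idx][0])
            ).float()
            return features, self.samples[idx][1]
        # Non-augmented path: return the cached feature tensor untouched.
        return self.samples[idx][2], self.samples[idx][1]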
