rebalancing of the categories in the training data #4

Open
strayMat opened this issue Feb 12, 2018 · 0 comments

Comments

@strayMat
Collaborator

The method I have heard of belongs to the category of 'sampling methods that manipulate the training data to change the class distribution.' A successful example is the SMOTE method (Chawla et al., 2002). The general idea is to generate synthetic examples of the minority class by interpolating between each minority example and its nearest minority-class neighbors. In addition, the majority class is under-sampled, which leads to a more balanced dataset.
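
To make the idea concrete, here is a minimal sketch of SMOTE-style over-sampling plus random under-sampling. It is not the reference implementation from the paper or from any library. It assumes the examples are already numeric feature vectors (for text, that would mean some vector representation of each sentence, not the raw sentence), and the names `smote_sketch` and `undersample` are hypothetical helpers made up for this illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from minority-class vectors.

    Hypothetical helper: each synthetic point lies on the segment between
    a randomly chosen minority example and one of its k nearest
    minority-class neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority examples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Hypothetical helper: keep a random subset of the majority class."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


# Toy data, invented for the example: 4 minority and 20 majority points.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority = [[5.0 + 0.1 * i, 5.0] for i in range(20)]

balanced_minority = minority + smote_sketch(minority, n_new=6, k=2)
balanced_majority = undersample(majority, n_keep=10)
print(len(balanced_minority), len(balanced_majority))
```

Because each synthetic point is an interpolation in feature space, it cannot be read back as a sentence, so this only helps if the classifier is trained on the vectors themselves.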

This is a data augmentation method that we could put to good use to increase the size of our dataset.
Problem: it is not trivial to learn to generate plausible sentences for the under-represented categories.
