

sergiogvz/imbalanced_synthetic_data_plots

Synthetic case study on the state-of-the-art samplers for imbalanced learning

This repository provides the code needed to build a graphical analysis of the leading sampling methods for imbalanced classification. It constructs a binary synthetic dataset with a chessboard distribution, with the number of instances and the imbalance ratio set through the program's parameters. The dataset is then preprocessed with the most relevant methods available in Python and in R packages on CRAN. The results are plotted together with the classification surfaces inferred by scikit-learn's decision tree.

The repository contains the following files:

  • plot_synthetic.py generates the synthetic data and runs all the sampling methods of the imblearn package. Its parameters are as follows:

    • 1st parameter (div): shape of the chess board.
    • 2nd parameter (N): number of instances for the balanced dataset (N/2 for each class).
    • 3rd parameter (per): proportion of instances that make up the imbalanced dataset (value in [0,1]).
  • plot_syntheticMWMOTE.py does the same as plot_synthetic.py, but with the MWMOTE method.

  • MWMOTE.py implements the MWMOTE method provided in its GitHub repo.

  • smotesData.R runs other important over-sampling methods implemented in the R packages smotefamily and ROSE.

  • plot_file.py plots the results obtained with smotesData.R, taking the generated files as parameters.
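
To make the role of the three parameters concrete, here is a minimal sketch of a chessboard generator in plain Python. It is not the code in plot_synthetic.py. The function names are hypothetical, and two interpretations are assumptions: `div` is taken as the number of cells per side, and `per` as the fraction of the minority class that is kept. The repository may define both differently.

```python
import random


def make_chessboard(div, n, seed=0):
    """Balanced chessboard data in the unit square (n/2 points per class).

    Assumption: `div` is the number of cells per side of the board.
    """
    if div < 2:
        raise ValueError("div must be at least 2 to obtain two classes")
    rng = random.Random(seed)
    per_class = n // 2
    counts = {0: 0, 1: 0}
    points, labels = [], []
    # Rejection sampling: draw uniform points until both classes are full.
    while counts[0] < per_class or counts[1] < per_class:
        x, y = rng.random(), rng.random()
        # The cell colour alternates with the parity of column + row.
        label = (int(x * div) + int(y * div)) % 2
        if counts[label] < per_class:
            points.append((x, y))
            labels.append(label)
            counts[label] += 1
    return points, labels


def make_imbalanced(points, labels, per, seed=0):
    """Keep every majority point (label 0) and a fraction `per` of label 1.

    Assumption: `per` is the fraction of minority-class instances retained.
    """
    if not 0.0 <= per <= 1.0:
        raise ValueError("per must lie in [0, 1]")
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(labels) if lab == 1]
    kept = set(rng.sample(minority, int(round(per * len(minority)))))
    selected = [i for i, lab in enumerate(labels) if lab == 0 or i in kept]
    return [points[i] for i in selected], [labels[i] for i in selected]
```

Under these assumptions, `make_chessboard(4, 1000)` followed by `make_imbalanced(points, labels, 0.1)` keeps all 500 majority points and 50 minority points.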

Included Methods and some examples

Starting from a 4x4 chess dataset with 1000 instances and 10% of the minority class (div=5; N=1000; per=0.1):

[Figures: DSoriginal, DSimbalanced]

Over-sampling methods
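
As an illustration of what an over-sampler does, the following is a simplified SMOTE-style sketch in plain Python: each synthetic point is placed on the segment between a minority instance and one of its nearest minority neighbours. It is not the implementation used by imblearn or smotefamily, and the function name and defaults are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority points.

    Simplified sketch: brute-force neighbour search, no handling of the
    majority class, and the default k=5 is an assumption.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Random position on the segment from base to the chosen neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, neighbour)))
    return synthetic
```

Because every synthetic point is an interpolation between two existing minority points, the new points stay inside the region already covered by the minority class.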

Under-sampling methods
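
The simplest under-sampling baseline is random under-sampling, which discards instances from the larger classes until every class has as many instances as the smallest one. The sketch below shows the idea in plain Python. The function name is hypothetical, and the methods compared in this repository may be more elaborate.

```python
import random


def random_undersample(points, labels, seed=0):
    """Randomly drop points so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for point, label in zip(points, labels):
        by_class.setdefault(label, []).append(point)
    # Every class is reduced to the size of the smallest one.
    target = min(len(members) for members in by_class.values())
    out_points, out_labels = [], []
    for label, members in by_class.items():
        for point in rng.sample(members, target):
            out_points.append(point)
            out_labels.append(label)
    return out_points, out_labels
```

Applied to a dataset with 8 majority and 2 minority points, this returns 2 points of each class.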
