SMOGN is creating a new class for target #38

purp172 · 2022-11-28T14:18:30Z

Hey!
Any idea why the algorithm is creating a new class (value) for my target? I'm analyzing the Room_Occupancy_Dataset from Kaggle. In this dataset the target has only four occupancy values (0, 1, 2, or 3 people in the room), but the model is expected to predict cases with more than 3 people in the room as well. SMOGN is not balancing the data correctly: the majority class (0) stays the same size, and the minority classes (1, 2, 3) are not over-sampled. It also creates an extra value (4). I don't know if this is a bug, but I hope you can help me fix it. This is my 2D array:

rg_mtrx = [
    [0, 0, 0],  ## under-sample ("majority")
    [1, 1, 0],  ## over-sample ("minority")
    [2, 1, 0],  ## over-sample ("minority")
    [3, 1, 0],  ## over-sample ("minority")
]

## conduct smogn
import smogn

balanced_smogn = smogn.smoter(

    ## main arguments
    data = df,                   ## pandas dataframe
    y = 'Room_Occupancy_Count',  ## string ('header name')
    k = 5,                       ## positive integer (k < n)
    pert = 0.02,                 ## real number (0 < R < 1)
    samp_method = 'extreme',     ## string ('balance' or 'extreme')
    drop_na_col = False,         ## boolean (True or False)
    drop_na_row = False,         ## boolean (True or False)
    replace = True,              ## boolean (True or False)

    ## phi relevance arguments
    rel_thres = 0.50,            ## real number (0 < R < 1)
    rel_method = 'manual',       ## string ('auto' or 'manual')
    # rel_xtrm_type = 'both',    ## unused (rel_method = 'manual')
    # rel_coef = 1.50,           ## unused (rel_method = 'manual')
    rel_ctrl_pts_rg = rg_mtrx    ## 2d array (format: [x, y])
)

nickkunz · 2022-11-29T19:08:00Z

Hello @Diogo-da-Silva-Rebelo, SMOGN was developed for regression. It seems like your problem is a classification one? If that is the case, then SMOGN would not be useful. You may want to see if SMOTE is more appropriate. Thank you.

purp172 · 2022-11-29T21:29:01Z

Hello, @nickkunz! Thank you for responding. I don't think that's the case: I want to predict the number of people in the room, not a specific class (not just whether the room is occupied or empty). The target can take many values, not only a restricted set. However, the target values must be integers, because we can't have 1.2 persons in the room :) So it is a regression problem. When I said that the dataset only has four values, I did not mean that other values can't appear, for instance in my test dataset. The algorithm is keeping every row with target = 0, even though that is the majority value, and it's not balancing the data, since the counts for the other values remain unchanged. What are your thoughts?
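
To make the imbalance concrete, here is a minimal sketch for comparing the target distribution before and after resampling, and for rounding any non-integer synthetic targets back to whole counts. The helper names `target_counts` and `round_to_counts` are hypothetical, and the sample values are made up for illustration; they are not output from smogn. In practice the inputs would be something like `df['Room_Occupancy_Count']` and `balanced_smogn['Room_Occupancy_Count']`.

```python
from collections import Counter


def target_counts(values):
    """Count how often each target value occurs, sorted by value."""
    return dict(sorted(Counter(values).items()))


def round_to_counts(values, lower=0):
    """Round targets to whole numbers and clip them at `lower`."""
    return [max(lower, int(round(v))) for v in values]


# Made-up targets: the original values plus a few non-integer synthetic ones.
original = [0, 0, 0, 0, 1, 2, 3]
resampled = [0, 0, 0, 0, 1, 2, 3, 1.4, 2.6, 3.7]

print("before:", target_counts(original))
print("after: ", target_counts(round_to_counts(resampled)))
```

Comparing the two printed dictionaries shows at a glance whether the majority value shrank, whether the minority values grew, and whether rounding introduced a value that was absent from the original data.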
