Resampling with label uniformity and user uniformity #39

qa-aleksejs-fomins · 2022-12-01T10:24:53Z

Hi,

I have a regression problem where the label is a single floating-point number within a well-defined range (e.g. [0, 1]). The label distribution is non-uniform: there is markedly less data at the edges, and also in the very middle of the range. So far, this is a classic use case for SMOGN. However, my data come from multiple users, and the amount of data per user is also heavily imbalanced. In addition to balancing the label distribution, I would like all users to be well represented in the training set. That is, I would like the algorithm to be aware of which user each sample belongs to, so that it undersamples users with a lot of data and preserves or oversamples users with little data. Is this currently possible? Do you have any suggestions?
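To make the request concrete, here is a rough sketch of the kind of joint weighting I have in mind. It is not part of SMOGN and generates no synthetic samples: it only resamples existing rows with replacement, weighting each row by the inverse of its user's row count times the inverse of its label bin's row count. The function names, the bin count, and the [0, 1] range are all assumptions made up for illustration.

```python
import random
from collections import Counter


def joint_balance_weights(labels, users, n_bins=10, lo=0.0, hi=1.0):
    """Return normalized sampling weights, one per sample.

    Each sample is weighted by 1 / (rows of its user * rows in its label bin),
    so rows from large users and crowded label bins are drawn less often.
    """
    def bin_of(y):
        # Clamp so that values at or beyond the range edges land in the end bins.
        idx = int((y - lo) / (hi - lo) * n_bins)
        return max(0, min(idx, n_bins - 1))

    bins = [bin_of(y) for y in labels]
    user_counts = Counter(users)
    bin_counts = Counter(bins)
    raw = [1.0 / (user_counts[u] * bin_counts[b]) for u, b in zip(users, bins)]
    total = sum(raw)
    return [w / total for w in raw]


def resample_indices(labels, users, n_samples, seed=0, **kwargs):
    """Draw n_samples row indices with replacement using the joint weights."""
    weights = joint_balance_weights(labels, users, **kwargs)
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=n_samples)
```

The product of the two inverse counts only approximately balances both marginals when users and labels are correlated; an exact balance would need something like iterative reweighting, which I have not tried.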
