Error message: redefine phi relevance functions #12

Open
ivan-marroquin opened this issue Sep 1, 2020 · 8 comments

Comments

@ivan-marroquin

Hi Nick,

Many thanks for making this package available!

Using my data set with the code from the intermediate example, I ran into this error message: "redefine phi relevance function: all points are 0".

Checking the source code, I noticed that there is a safeguard:

    if all(i == 1 for i in y_phi):
        raise ValueError("redefine phi relevance function: all points are 0")

but I could not figure out how it relates to my data. I am using Python 3.6.5 on a Windows machine with smogn 0.1.2.

I attached a copy of the script and input data.

Thanks for your help,

Ivan

Testing_SMOGN_package.zip
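The guard only looks at the relevance values y_phi that smogn computes for the target column. It raises when every value is 1 (the check quoted above) and, in a companion check, when every value is 0. In other words, it fires when relevance is uniform, so nothing distinguishes rare targets from common ones. A minimal standalone sketch of that condition; relevance_is_degenerate is a hypothetical helper, not part of smogn's API.

```python
def relevance_is_degenerate(y_phi):
    """Return True when the uniform-relevance guard in smoter.py would raise.

    Hypothetical helper for illustration only; not part of smogn's API.
    """
    values = list(y_phi)  # materialize so both checks see the same data
    all_zero = all(i == 0 for i in values)  # companion check: every relevance is 0
    all_one = all(i == 1 for i in values)   # quoted check: every relevance is 1
    return all_zero or all_one


print(relevance_is_degenerate([1, 1, 1, 1]))     # True (all 1)
print(relevance_is_degenerate([0.0, 0.0, 0.0]))  # True (all 0)
print(relevance_is_degenerate([0.0, 0.2, 1.0]))  # False (guard passes)
```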

@ivan-marroquin
Author

Any comments?

@nickkunz
Owner

Hello,

Thank you for using SMOGN. It appears that your data does not contain the outliers needed to automatically generate regions of over-sampling. Please advise.

@ivan-marroquin
Author

Hi @nickkunz

Thanks for looking into this issue. The background data consist of zeros, while the outliers are values higher than 0.50 (see the attached plot).

Hope this helps,

Ivan

[Attached plot: input_data_smogn]
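Given that description (zero background, targets of interest above 0.50), one way to see what a usable relevance function looks like is to define it by hand. This is an assumption-laden sketch: as far as I know, smogn exposes hand-made relevance through rel_method='manual' and rel_ctrl_pts_rg, but the library interpolates between control points with its own scheme, whereas the sketch uses plain linear interpolation (np.interp). The control points (0, 0) and (0.5, 1) and the target values are hypothetical.

```python
import numpy as np

# Synthetic target shaped like the description: zero background plus a few
# values above 0.5 (made-up numbers, not the attached data set).
y = np.array([0.0] * 20 + [0.55, 0.7, 0.9])

# Hypothetical control points (target value, relevance): relevance 0 at the
# background level, rising linearly to 1 at 0.5.
ctrl_x = [0.0, 0.5]
ctrl_phi = [0.0, 1.0]
y_phi = np.interp(y, ctrl_x, ctrl_phi)  # values beyond 0.5 clamp to 1.0

print(y_phi[:3], y_phi[-3:])  # [0. 0. 0.] [1. 1. 1.]
# Relevance is neither all 0 nor all 1, so the quoted guard would not fire.
print(bool(np.all(y_phi == 0)), bool(np.all(y_phi == 1)))  # False False
```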

@ivan-marroquin
Author

Hi @nickkunz

I hope you are doing well. I was wondering whether you have had a chance to look into this issue.

Kind regards,
Ivan

@Bahar1978
Copy link

Hello, thanks for SMOGN. Unfortunately, I have the same issue. Could you please advise us on how to solve it?

@mvirag2000

Hi Nick,

I am also getting this error, and I have a theory. My data is very skewed: insurance data where 95% of claims are zero. I'd like SMOGN to oversample the other 5%, but I think there are so many zero values that it does not identify the remaining ones as outliers. This theory is consistent with Ivan's situation. I hope this helps!

Best,
Mark
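The quartile arithmetic behind this theory can be checked directly. A minimal sketch, assuming the automatic relevance method is anchored on Tukey-style box-plot statistics with coefficient 1.5 (my understanding of smogn's default rel_coef, not verified here); the 95%-zero target is synthetic. With that many zeros, the first quartile, median and third quartile are all 0, so the interquartile range and both fences collapse to 0 too. Every nonzero value then lies beyond the upper fence, yet all the statistics an automatic relevance function would be built from sit at the same point.

```python
import numpy as np

# Synthetic target resembling the thread's description:
# 95% zeros plus a small tail of positive values (not real data).
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(950), rng.uniform(0.5, 1.0, size=50)])

q1, median, q3 = np.percentile(y, [25, 50, 75])
iqr = q3 - q1
coef = 1.5  # Tukey's conventional whisker coefficient (assumed)
lower_fence = q1 - coef * iqr
upper_fence = q3 + coef * iqr

print(q1, median, q3, iqr)           # 0.0 0.0 0.0 0.0
print(lower_fence, upper_fence)      # 0.0 0.0
print(int((y > upper_fence).sum()))  # 50: every nonzero value is past the fence
```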

@rkrishna116

rkrishna116 commented Feb 2, 2021

Hi @ivan-marroquin, I came across the same error.

Until the developer fixes this, there is a workaround you can implement.

Assuming that you work locally, go to the location where the package is installed.

For me it was "C:\Users\user_name\Anaconda3\envs\project_3\Lib\site-packages\smogn"

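Open smoter.py and comment out the following lines. Comment out the if lines as well as the raise lines: an if whose body is only a comment is an IndentationError, so commenting out just the raise statements would stop smogn from importing.

    # if all(i == 0 for i in y_phi):
    #     raise ValueError("redefine phi relevance function: all points are 1")
    # if all(i == 1 for i in y_phi):
    #     raise ValueError("redefine phi relevance function: all points are 0")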

Then restart the kernel and import smogn again; the error should no longer be raised.
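Commenting out only the raise statements leaves each if with an empty body, which Python rejects before any code runs. A quick way to confirm this without touching the installed package is to compile both variants as strings; the snippets below are illustrative and only mimic the shape of the smoter.py lines.

```python
# Only the raise is commented out: the if block has no statement left.
broken = (
    "if all(i == 1 for i in y_phi):\n"
    "    # raise ValueError('...')\n"
    "x = 1\n"
)

# Both the if and the raise are commented out: nothing is left dangling.
fixed = (
    "# if all(i == 1 for i in y_phi):\n"
    "#     raise ValueError('...')\n"
    "x = 1\n"
)


def compiles(source):
    """Return True if the source string is syntactically valid Python."""
    try:
        compile(source, "<smoter-patch>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(broken))  # False
print(compiles(fixed))   # True
```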

@ivan-marroquin
Author

Hi @rkrishna116

Thanks for the workaround! I will give it a try.

I found another approach for handling minority values in continuous data: data discretization. You can find more about it here: https://www.includehelp.com/basics/data-discretization-in-data-mining.aspx

There are plenty of statistical approaches that can be used to estimate the optimal number of bins to discretize your continuous data. Good luck!
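As one concrete example, NumPy ships several bin-count rules through np.histogram_bin_edges (for instance "sturges", "fd" and "auto"). The sketch below applies two of them to a synthetic zero-inflated target and turns the bins into class labels with np.digitize. The Freedman-Diaconis rule ("fd") depends on the interquartile range, which is 0 when most values are zero, so NumPy falls back to a single bin; Sturges' rule depends only on the sample size. The data and the choice of rules are illustrative assumptions, not a recommendation for any particular data set.

```python
import numpy as np

# Synthetic zero-inflated target: 950 zeros and 50 values in [0.5, 1.0).
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(950), rng.uniform(0.5, 1.0, size=50)])

# Freedman-Diaconis uses the IQR, which is 0 here, so NumPy returns one bin.
fd_edges = np.histogram_bin_edges(y, bins="fd")
# Sturges' rule picks about log2(n) + 1 bins, regardless of the spread.
sturges_edges = np.histogram_bin_edges(y, bins="sturges")

print(len(fd_edges) - 1)       # 1
print(len(sturges_edges) - 1)  # 11

# Discretize: pass the interior edges so labels run from 0 to n_bins - 1.
labels = np.digitize(y, sturges_edges[1:-1])
counts = np.bincount(labels, minlength=len(sturges_edges) - 1)
print(counts)  # bin 0 holds the 950 zeros; sparse bins are the minority classes
```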

Ivan


5 participants