Error message: redefine phi relevance functions #12
Comments
Any comments?
Hello, Thank you for using SMOGN. It appears that your data does not contain the outliers needed to automatically generate regions of over-sampling. Please advise.
Hi @nickkunz, thanks for looking into this issue. The background data consist of zeros, while the outliers are values higher than 0.50 (see attached plot). Hope this helps, Ivan
Hi @nickkunz, I hope you are doing well. I was wondering if you have had the chance to look into this issue? Kind regards,
Hello, thanks for SMOGN. Unfortunately I have the same issue. Could you please guide us on how to solve it?
Hi Nick, I am also getting this error, and I have a theory. My data is very skewed: insurance data where 95% of claims are zero. I'd like SMOGN to oversample the other 5%, but I think there are so many zero values that it doesn't identify the others as outliers. This theory is consistent with Ivan's situation. I hope this helps! Best,
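The theory above can be checked outside SMOGN with a few lines of standard-library Python. This is only a sketch under stated assumptions: the `claims` list is made-up data mimicking "95% zeros", and the idea that SMOGN's automatic relevance method depends on box-plot statistics is an assumption here, not something confirmed in this thread. `statistics.quantiles` needs Python 3.8 or newer.

```python
import statistics

# Hypothetical target column: 95 zero claims and 5 non-zero claims.
claims = [0.0] * 95 + [0.6, 0.7, 0.8, 0.9, 1.0]

# Quartiles of the target; with 95% zeros all three are 0.0.
q1, median, q3 = statistics.quantiles(claims, n=4)
iqr = q3 - q1

# Tukey-style fences with the usual 1.5 coefficient.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print("quartiles:", q1, median, q3)
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
```

For this sample the box has zero width, so the median and both fences sit on the same value. If the relevance function is built from those statistics, a degenerate summary like this could plausibly explain why no usable over-sampling region is found.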
Hi @ivan-marroquin, I came across the same error. Until the dev fixes this, there is a workaround you can implement. Assuming that you work locally, go to the location where the package is installed. For me it was "C:\Users\user_name\Anaconda3\envs\project_3\Lib\site-packages\smogn". Open smoter.py and comment out the following lines:
Then restart the kernel, import smogn again, and this issue should be fixed.
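A less invasive alternative to editing the installed package may be to do what the error message asks and define the relevance function by hand. The sketch below assumes `smogn.smoter` accepts `rel_method = "manual"` together with a `rel_ctrl_pts_rg` matrix whose rows are `[y value, relevance, slope]`. The helper names `build_ctrl_pts` and `oversample` are hypothetical, and the 0.0 and 0.5 thresholds are taken from Ivan's description of his data, not from any tested run.

```python
def build_ctrl_pts(majority_value, minority_value):
    """Return manual relevance control points as [y value, relevance, slope]."""
    return [
        [majority_value, 0, 0],  # least relevant: the common background values
        [minority_value, 1, 0],  # most relevant: the rare values to over-sample
    ]


def oversample(df, target, ctrl_pts):
    """Call SMOGN with a hand-written relevance function (sketch only)."""
    import smogn  # imported lazily so the helper above works without smogn

    return smogn.smoter(
        data=df,
        y=target,
        rel_method="manual",
        rel_ctrl_pts_rg=ctrl_pts,
    )


# Illustrative thresholds: zeros are background, values above 0.50 are rare.
ctrl_pts = build_ctrl_pts(0.0, 0.5)
print(ctrl_pts)
```

Whether this avoids the error for a given data set is untested here; the thresholds will need adjusting to the actual distribution of the target.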
Hi @rkrishna116, thanks for the workaround! I will give it a try. I found another approach to handling minority values in continuous data: "data discretization". Here is a link to learn more: https://www.includehelp.com/basics/data-discretization-in-data-mining.aspx There are plenty of statistical approaches that can be used to estimate the optimal number of bins to discretize your continuous data. Good luck! Ivan
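As a rough illustration of the discretization idea, the sketch below picks a bin count with Sturges' rule (one of several common estimators, chosen here only as an example) and assigns each value to an equal-width bin. The data and the function names `sturges_bins` and `discretize` are hypothetical.

```python
import math
from collections import Counter


def sturges_bins(n_samples):
    """Sturges' rule: ceil(log2(n) + 1) bins for n samples."""
    return math.ceil(math.log2(n_samples) + 1)


def discretize(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant data: everything in one bin
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value lands in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical skewed target: mostly zeros, a few large values.
values = [0.0] * 95 + [0.6, 0.7, 0.8, 0.9, 1.0]
n_bins = sturges_bins(len(values))
labels = discretize(values, n_bins)

# Class counts per bin, which a classification-style over-sampler could use.
print(n_bins, Counter(labels))
```

Once the target is binned, the problem can be treated as imbalanced classification, at the cost of losing resolution inside each bin.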
Hi Nick,
Many thanks for making this package available!
With my data set and following the code example for the intermediate exercise, I bumped into this error message: redefine phi relevance function: all points are 0
Checking the source code, I noticed that there is a safeguard:
if all(i == 1 for i in y_phi):
    raise ValueError("redefine phi relevance function: all points are 0")
but I could not work out how this relates to my data. I am using Python 3.6.5 on a Windows machine with smogn 0.1.2.
I attached a copy of the script and input data.
Thanks for your help,
Ivan
Testing_SMOGN_package.zip