Example for preprocessing.dictmapper.DictMapper
and meta.outlier_classifier.OutlierClassifier
#646
Changes from 5 commits
e237b68
7dbc8d2
5097b30
a6cec13
e22eeb2
bf4a91c
d9667ce
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,6 +43,41 @@ def fit(self, X, y=None): | |
------- | ||
self : DictMapper | ||
The fitted transformer. | ||
|
||
Example | ||
Review thread on this line:

If you manage to add how to make it interact with either

Yes, this double dict was making me really uncomfortable 😅 I'm on it

I did the fixes, but I'm unable to push them. I need to figure out what's going on and be back 😅

Yes, I was also confused. However, I finally figured out what was happening and was able to push this time. The problem was that one of the files, sklego.model_selection.py, which I wasn’t even working on, failed to pass the ruff-format and was reformatted by ruff. Since I hadn’t worked on it, I decided to revert that change (git restore) and attempted to commit only the two files I had been working on. But after pushing, I received the message “Everything-up-to-date” and didn’t see any changes on my branch.

Today, I decided to accept the formatting changes made by ruff, and I was finally able to push. I hope the formatter didn’t break anything in model_selection.py. What’s the proper way to handle this kind of problem in the future?
||
------- | ||
```py | ||
import pandas as pd | ||
from sklego.preprocessing.dictmapper import DictMapper | ||
|
||
X = pd.DataFrame({ | ||
"city": ["Amsterdam", "Leiden", "Utrecht", "Amsterdam", "Haarlem"], | ||
"university": ["uva", "lei", "uu", "vu", "none"] | ||
}) | ||
|
||
mapper = { | ||
|
||
# population | |
|
||
"Amsterdam": 1_181_817, | ||
"Leiden": 130_181, | ||
"Utrecht": 367_984, | ||
"Haarlem": 165_396, | ||
|
||
# ranking | |
|
||
"uva": 64, | ||
"lei": 214, | ||
"uu": 117, | ||
"vu": 105 | ||
} | ||
|
||
dict_mapper = DictMapper(mapper, 0) | ||
_ = dict_mapper.fit(X) | ||
|
||
dict_mapper.n_features_in_ | ||
# 2 | ||
``` | ||
""" | ||
X = check_array( | ||
X, | ||
|
@@ -72,6 +107,47 @@ def transform(self, X): | |
------ | ||
ValueError | ||
If the number of columns from `X` differs from the number of columns when fitting. | ||
|
||
Example | ||
------- | ||
```py | ||
import pandas as pd | ||
from sklego.preprocessing.dictmapper import DictMapper | ||
|
||
X = pd.DataFrame({ | ||
"city": ["Amsterdam", "Leiden", "Utrecht", "Amsterdam", "Haarlem"], | ||
"university": ["uva", "lei", "uu", "vu", "none"] | ||
}) | ||
|
||
mapper = { | ||
|
||
# population | |
|
||
"Amsterdam": 1_181_817, | ||
"Leiden": 130_181, | ||
"Utrecht": 367_984, | ||
"Haarlem": 165_396, | ||
|
||
# ranking | |
|
||
"uva": 64, | ||
"lei": 214, | ||
"uu": 117, | ||
"vu": 105 | ||
} | ||
|
||
dict_mapper = DictMapper(mapper, 0) | ||
_ = dict_mapper.fit(X) | ||
|
||
X_trans = dict_mapper.transform(X) | ||
X_trans | ||
# array([[1181817, 64], | ||
# [ 130181, 214], | ||
# [ 367984, 117], | ||
# [1181817, 105], | ||
# [ 165396, 0]]) | ||
|
||
``` | ||
""" | ||
check_is_fitted(self, ["n_features_in_"]) | ||
X = check_array( | ||
|
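A note on the expected output in the `transform` example: the last row's university, `"none"`, is not a key in `mapper`, so it comes back as the default passed as the second argument (`0`). The sketch below reproduces that lookup in plain Python. It assumes the transformer behaves like a per-cell `dict.get(value, default)`, which is what the example output suggests; `map_cells` is a hypothetical helper written for illustration, not part of sklego.

```python
def map_cells(rows, mapper, default):
    """Look up every cell in `mapper`, falling back to `default` for unknown keys."""
    return [[mapper.get(value, default) for value in row] for row in rows]


# A shortened version of the mapper from the docstring example.
mapper = {"Amsterdam": 1_181_817, "Leiden": 130_181, "uva": 64, "lei": 214}

rows = [
    ["Amsterdam", "uva"],
    ["Leiden", "lei"],
    ["Haarlem", "none"],  # neither value is a key in this shortened mapper
]

result = map_cells(rows, mapper, default=0)
print(result)
# [[1181817, 64], [130181, 214], [0, 0]]
```

Because cities and universities share one dictionary, a key that appears in both columns would map to the same value in each, which may be worth keeping in mind when reading the combined `mapper`.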
Review thread on this line:

Hey @anopsy, I have the same feedback as for DictMapper: if you could move the example up in the docstring, I think it would be easier and faster for folks to find when scrolling through the API documentation, without the need to step down into the `.fit(..)` method.

I think this example is ready to merge after that change.

I will do that!