Docstrings API examples #648

Merged
merged 12 commits into from
Apr 10, 2024
24 changes: 24 additions & 0 deletions sklego/meta/outlier_classifier.py
@@ -23,6 +23,30 @@ class OutlierClassifier(BaseEstimator, ClassifierMixin):
The fitted underlying outlier detection model.
classes_ : array-like of shape (2,)
Classes used for prediction (0 or 1)

Example
-------
```py
from sklearn.ensemble import IsolationForest
from sklego.meta.outlier_classifier import OutlierClassifier

X = [[0], [0.5], [-1], [99]]
y = [0, 0, 0, 1]

isolation_forest = IsolationForest()

outlier_clf = OutlierClassifier(isolation_forest)
_ = outlier_clf.fit(X, y)

preds = outlier_clf.predict([[100], [-0.5], [0.5], [1]])
# array[1. 0. 0. 0.]

proba_preds = outlier_clf.predict_proba([[100], [-0.5], [0.5], [1]])
# [[0.34946567 0.65053433]
# [0.79707913 0.20292087]
# [0.80275406 0.19724594]
# [0.80275406 0.19724594]]
```
"""

def __init__(self, model):
2 changes: 1 addition & 1 deletion sklego/model_selection.py
@@ -260,7 +260,7 @@ def KlusterFoldValidation(**kwargs):
class ClusterFoldValidation:
"""Cross validator that creates folds based on provided cluster method.
This ensures that data points in the same cluster are not split across different folds.

Collaborator

How did that survive lint CI/CD in the first place? 🙈

Contributor Author

Ah, I see! Thank you for explaining; I also fell for this little test tube in the corner. Just to make sure: "example" also allows for collapsing, doesn't it?
I'll wait for your decision and adjust it accordingly.

Collaborator (@FBruzzesi, Apr 9, 2024)

Yes, for "example" it is just a matter of the default (expanded vs collapsed).
Let's wait for Vincent's feedback/preference to weigh in on this 🙃

Owner

Maybe go for usage instead?

Collaborator

I thought the Usage keyword was specific to Google style, but at least from the griffe docs it seems to belong to neither Google nor NumPy style.

At the end of the day, I believe the only difference is that keywords get bolded (example in screenshot).

I think the safest way to proceed for now is to use the Examples keyword, as it would have the lowest impact and require the fewest changes.

[Screenshot of the rendered docstring]

Contributor Author

Okay, I'll do Examples then. If you change your mind, just ping me.
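
The thread above comes down to which section title the docstring parser recognises. As a rough illustration, the sketch below lists the underlined section titles in a NumPy-style docstring and flags any that are not in numpydoc's standard section list. The helper names (`section_titles`, `nonstandard_sections`) are hypothetical, and the check says nothing about how griffe or mkdocstrings actually render a given title.

```python
import inspect
import re

# Section titles defined by the numpydoc style guide.
NUMPYDOC_SECTIONS = {
    "Parameters", "Returns", "Yields", "Receives", "Other Parameters",
    "Raises", "Warns", "Warnings", "See Also", "Notes", "References",
    "Examples", "Attributes", "Methods",
}


def section_titles(docstring):
    """Return every line that is directly followed by a dashed underline."""
    lines = inspect.cleandoc(docstring).splitlines()
    titles = []
    for title, underline in zip(lines, lines[1:]):
        if title.strip() and re.fullmatch(r"-{3,}", underline.strip()):
            titles.append(title.strip())
    return titles


def nonstandard_sections(docstring):
    """Return the section titles that numpydoc does not define."""
    return [t for t in section_titles(docstring) if t not in NUMPYDOC_SECTIONS]


singular = """Summary line.

    Parameters
    ----------
    model : estimator

    Example
    -------
    some usage
    """

print(nonstandard_sections(singular))  # ['Example']
print(nonstandard_sections(singular.replace("Example\n    -------", "Examples\n    --------")))  # []
```

A check like this could run over a package's docstrings to keep section titles consistent, whichever keyword the maintainers settle on.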

!!! info "New in version 0.9.0"

Parameters
28 changes: 28 additions & 0 deletions sklego/preprocessing/dictmapper.py
@@ -23,6 +23,34 @@ class DictMapper(TransformerMixin, BaseEstimator):
Number of features seen during `fit`.
dim_ : int
Deprecated, please use `n_features_in_` instead.

Example
-------
```py
import pandas as pd
from sklego.preprocessing.dictmapper import DictMapper
from sklearn.compose import ColumnTransformer

X = pd.DataFrame({
"city_pop": ["Amsterdam", "Leiden", "Utrecht", "None", "Haarlem"]
})

mapper = {
"Amsterdam": 1_181_817,
"Leiden": 130_181,
"Utrecht": 367_984,
"Haarlem": 165_396,
}

ct = ColumnTransformer([("dictmapper", DictMapper(mapper, 0), ["city_pop"])])
X_trans = ct.fit_transform(X)
X_trans
# array([[1181817],
# [ 130181],
# [ 367984],
# [ 0],
# [ 165396]])
```
"""

def __init__(self, mapper, default):
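
For readers unfamiliar with the transformer, the `DictMapper` example above behaves like a dictionary lookup with a fallback value: unknown keys such as `"None"` receive the default. The plain-Python sketch below reproduces the values from the docstring output; `map_with_default` is a hypothetical helper for illustration, not the library's implementation.

```python
def map_with_default(values, mapper, default):
    """Look up each value in `mapper`, falling back to `default` for unknown keys."""
    return [mapper.get(value, default) for value in values]


cities = ["Amsterdam", "Leiden", "Utrecht", "None", "Haarlem"]
populations = {
    "Amsterdam": 1_181_817,
    "Leiden": 130_181,
    "Utrecht": 367_984,
    "Haarlem": 165_396,
}

print(map_with_default(cities, populations, 0))
# [1181817, 130181, 367984, 0, 165396]
```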
2 changes: 1 addition & 1 deletion sklego/preprocessing/outlier_remover.py
@@ -34,7 +34,7 @@ class OutlierRemover(TrainOnlyTransformerMixin, BaseEstimator):

isolation_forest = IsolationForest()
isolation_forest.fit(X)
- detector_preds = isolator_forest.predict(X)
+ detector_preds = isolation_forest.predict(X)

outlier_remover = OutlierRemover(isolation_forest, refit=True)
outlier_remover.fit(X)
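
A misspelled name like `isolator_forest` can linger in a docstring because fenced examples are not executed by linters or the test suite unless something runs them explicitly. The sketch below shows one possible way to catch it: extract each `py` fence from a docstring and execute it, collecting any exceptions. `failing_examples` is a hypothetical helper and not part of this project's CI. It uses `exec`, so it should only be pointed at trusted code.

```python
import inspect
import re

# Built indirectly so this snippet can itself be placed inside a fenced block.
TICKS = "`" * 3
BLOCK = re.compile(TICKS + r"py\n(.*?)" + TICKS, re.DOTALL)


def failing_examples(docstring):
    """Execute every py-fenced block in a docstring and return error messages."""
    errors = []
    for source in BLOCK.findall(inspect.cleandoc(docstring)):
        try:
            # Each block gets a fresh namespace so examples cannot leak state.
            exec(source, {"__name__": "docstring_example"})
        except Exception as exc:
            errors.append(f"{type(exc).__name__}: {exc}")
    return errors


# A stand-in docstring with the same kind of typo as the fixed line.
sample = "\n".join([
    "Short description.",
    "",
    TICKS + "py",
    "isolation_forest = object()",
    "detector = isolator_forest",
    TICKS,
])

print(failing_examples(sample))  # one NameError entry
```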
31 changes: 29 additions & 2 deletions sklego/preprocessing/pandastransformers.py
@@ -19,8 +19,8 @@ class ColumnDropper(BaseEstimator, TransformerMixin):
feature_names_ : list[str]
The names of the features to keep during transform.

- Examples
- --------
+ Example
+ -------
```py
# Selecting a single column from a pandas DataFrame
import pandas as pd
@@ -183,6 +183,33 @@ class PandasTypeSelector(BaseEstimator, TransformerMixin):
!!! warning

Raises a `TypeError` if input provided is not a DataFrame.

Example
-------
```py
import pandas as pd
from sklego.preprocessing import PandasTypeSelector

df = pd.DataFrame({
"name": ["Swen", "Victor", "Alex"],
"length": [1.82, 1.85, 1.80],
"shoesize": [42, 44, 45]
})

#Excluding single column
PandasTypeSelector(exclude="int64").fit_transform(df)
# name length
#0 Swen 1.82
#1 Victor 1.85
#2 Alex 1.80

#Including multiple columns
PandasTypeSelector(include=["int64", "object"]).fit_transform(df)
# name shoesize
#0 Swen 42
#1 Victor 44
#2 Alex 45
```
"""

def __init__(self, include=None, exclude=None):
18 changes: 18 additions & 0 deletions sklego/preprocessing/projections.py
@@ -155,6 +155,24 @@ class InformationFilter(BaseEstimator, TransformerMixin):
The projection matrix that can be used to filter information out of a dataset.
col_ids_ : List[int] of length `len(columns)`
The list of column ids of the sensitive columns.

Example
-------
```py
import pandas as pd
from sklego.preprocessing import InformationFilter

df = pd.DataFrame({
"user_id": [101, 102, 103],
"length": [1.82, 1.85, 1.80],
"age": [21, 37, 45]
})

InformationFilter(columns=["length", "age"], alpha=0.5).fit_transform(df)
# array([[50.10152483, 3.87905643],
# [50.26253897, 19.59684308],
# [52.66084873, 28.06719867]])
```
"""

def __init__(self, columns, alpha=1):
16 changes: 16 additions & 0 deletions sklego/preprocessing/repeatingbasis.py
@@ -41,6 +41,22 @@ class RepeatingBasisFunction(TransformerMixin, BaseEstimator):
----------
pipeline_ : ColumnTransformer
Fitted `ColumnTransformer` object used to transform data with repeating basis functions.

Example
-------
```py
import pandas as pd
from sklego.preprocessing import RepeatingBasisFunction

df = pd.DataFrame({
"user_id": [101, 102, 103],
"created_day": [5, 1, 7]
})
RepeatingBasisFunction(column="created_day", n_periods=5, input_range=(1, 7)).fit_transform(df)
# array([[0.06217652, 0.00432024, 0.16901332, 0.89483932, 0.64118039],
# [1. , 0.36787944, 0.01831564, 0.01831564, 0.36787944],
# [1. , 0.36787944, 0.01831564, 0.01831564, 0.36787944]])
```
"""

def __init__(self, column=0, remainder="drop", n_periods=12, input_range=None, width=1.0):
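
The example output above has five columns, and the rows for day 1 and day 7 are identical because the two ends of `input_range` wrap around to the same point. The sketch below is one reading of how such periodic radial basis features can be computed: scale the value to [0, 1], place evenly spaced centres on a circle, and apply a Gaussian to the wrap-around distance. It assumes five basis functions and a width of `width / n_periods`; `repeating_rbf` is a hypothetical helper, not the library's code, although under these assumptions it reproduces the values shown in the docstring.

```python
import math


def repeating_rbf(value, input_range, n_periods, width=1.0):
    """Periodic Gaussian features for one value (illustrative assumptions only)."""
    low, high = input_range
    x = (value - low) / (high - low)  # scale to [0, 1]
    centres = [i / n_periods for i in range(n_periods)]
    scaled_width = width / n_periods
    features = []
    for centre in centres:
        distance = abs(x - centre)
        distance = min(distance, 1 - distance)  # wrap around the circle
        features.append(math.exp(-((distance / scaled_width) ** 2)))
    return features


for day in (5, 1, 7):
    print(day, [round(v, 4) for v in repeating_rbf(day, (1, 7), n_periods=5)])
```

Because distances wrap, a value at the top of the range is as close to the first centre as a value at the bottom, which is what makes the encoding suitable for cyclical inputs such as weekdays.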