-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Enhancing polars support by introducing set_output
#399
Conversation
skpro/utils/set_output.py
Outdated
} | ||
|
||
|
||
def check_output_config(estimator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this function and the next one private, to avoid user calls and allowing us to make changes quickly
skpro/utils/set_output.py
Outdated
|
||
|
||
def transform_output( | ||
obj, valid, from_type, default_to_type, default_scitype, output_config, store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function has many parameters - this is a code smell.
I would suggest to remove the convert
-call, and only use this function to get the from/to type - that way, we also do not need to replace convert
by transform_output
in the regressor
skpro/regression/base/_base.py
Outdated
@@ -38,6 +39,10 @@ class BaseProbaRegressor(BaseEstimator): | |||
"C_inner_mtype": "pd_DataFrame_Table", | |||
} | |||
|
|||
_config = { | |||
"transform": "default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe transform_output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
I would request two changes:
- make the new utilities private
- decouple
transform_output
from the output, and call it only if the setting is not the default
…e not table scitypes
let me know in case you need a review |
Another review would be quite helpful.. I think I'll try and re-write the description to be more clear what the goal of the pull request is and whats being added in the PR. I'll also note that the new code that supports mi-columns and si-columns/indxes for polars exists inside So if needed, I can separate the polars conversion methods inside
I'll also note that although |
I remember asking you before why in the base code the output conversion only existed inside the And given that reasoning, do you think it makes sense to have the other |
From our discussion, summarizing what I think the key points for this PR, to avoid it moving around with minor changes:
I am happy to write these, if you feel you are stuck - please let me know. Or, if they exist somewhere, links would be appreciated.
I see - I would though not suggest to split here too much for now, instead we could just add the
This seems a bit suspicious, why would these changes impact the plotting function at all? I was under the impression that all existing code should run unchanged. Do you have an explanation for the failure? |
That is because the different contracts of the underscore-methods, which are hopfully correctly stated in the extension template:
Due to this, in the case of |
Yes, but weren't you already adding such a conversion in this PR? Which I think makes sense. |
Ah I see, so do you think in your opinion for
instead of automatically trying to convert it back to the input type of
|
I would do the second, not the first set of scenarios, because it is consistent with input/output behaviour as current. The quoted line is just explaining the status quo. I think once we have the datatype extension though, it will not be difficult to switch between the two. |
I see, then it would be a good idea to write out the various possible data container types for the |
Part of these key points can be found in the design doc
This is covered in section 5.2. The main use cases for
Polars dataframe outputs are described in section 3 for proba methods
I may need some help for how the outputs get mapped onto mtypes and scitypes, i've described a solution inside section 5.3. Some help regarding how to design the mapping method, how we can integrate data container support for the |
Introduces files
set_output
insideskpro.utils
and new tests filetest_set_output
inside thetests
folder. As part of sktime/enhancement-proposals#34 and the notes written in my mentorship programme .In this pr:
skpro.datatypes._adapter.polars
. In the polars adapter file,convert_polars_to_pandas_with_index
now checks to see if there was melted multi-index columns (these columns will be denoated via "foo__bar" convention) andconvert_pandas_to_polars_with_index
now checks to see if there are multi-index columns inside the pandas DataFrame (like inpredict_interval
andpredict_quantile
. If so, then we will melt down these multi-index columns into single-level columns before converting into a polars dataframe.skpro.utils.set_output.check_output_config
to ensure that transformations set by the users are aligned with availableskpro
output data containers.skpro.utils.set_output.transform_output
in order to convert the resulting DataFrame into user specified or default data containers.transform_output
acts like a wrapper around the ordinaryconvert
function, but instead it also checks whether to convert based upon the user specifiedmtype
or to leverage the default originalmtype
seen infit
_config
insideBaseProbaRegressor
and a new functionset_output
which mirrorssklearn
'sset_output
for familiarity.Relates to #342 #449