-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Polars adapter enhancements #449
[ENH] Polars adapter enhancements #449
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some questions about how is index name being handled in the conversion utils. The code will get the job done currently, but suppose I convert a pandas DataFrame having a index with a name to polars, and then convert it back to pandas the index name would be lost.
skpro/datatypes/_adapter/polars.py
Outdated
from polars import from_pandas | ||
|
||
obj.reset_index() | ||
obj.rename(columns={"index": "__index__"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we retain the index name too. This code won't work for data frames that already have a name for their index. Does skpro assumes that data frame will always have a default RangeIndex
without any name @fkiraly @julian-fong ??, there may be scenarios where a index having name is passed and the conversion util will fail in this case.
How sktime handles this is by appending the name infront of the __index__
convention. So suppose if the DataFrame has a index named 1
the util will convert it to __index__1
in polars frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats a good point, i'll make that change then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I thought that's what is happening in the dask
converter as well, is it?
Tests are failing in jobs because the
is returned when it expects
@fkiraly any preferred method on how to solve this? |
@julian-fong I know why the tests are failing here, you need to change the polars fixture example to match this convention, in skpro the examples are located at |
My answer would be to not include an If you convert from
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - @pranavvp16, can you kindly review this, too?
Forgot to include async discussion but it is summarized as follows. In the code: we will only ignore the addition of Otherwise, we will build a new column called |
LGTM! |
Thank you for the reviews! |
adds index support as part of #440 and is used to sync up polars conversion utilities between skpro and sktime.
Correponding sktime pr for polars conversion utilities is sktime/sktime#6455.
In this pr:
If a pandas Dataframe is a
from_type
and polars frame is ato_type
then during the conversion, we will save the index (assumed never to be in multi-index format) and insert it as an individual column with column name__index__
. Then the resulting pandas dataframe will be converted to a polars dataframe.In the inverse function, if we are converting from polars dataframe to pandas dataframe, if the column
__index__
exists in the pandas dataframe post-conversion, then we will map that column to the index before returning the pandas DataframeAfter this is merged, #447 will be implemented as a
polars
only estimator. tests will also be written to check polars input end to end and pandas input and output through the polars estimator (i.e pandas input into polars estimator -> polars predictions -> pandas output)