[Enhancement] Separate data preprocessing from plotters #134

Open
DanielYang59 opened this issue Apr 13, 2024 · 1 comment
Labels
api (Application programming interface), dx (Developer experience), enhancement (Improvement to existing features/functionality)

Comments

@DanielYang59
Collaborator

DanielYang59 commented Apr 13, 2024

Separate data preprocessing from plotters

As previously proposed in #81 (comment), it might be good to separate data preprocessing from the plotters. The preprocessing helpers could be private, so users could still pass in any format and the change would be invisible to them. This could hopefully resolve #131 (comment) too.

Suggestions

Currently almost every plotter accepts various types of data, but at the cost of the plotters being very complex (with a lot of repeated code). I would suggest making each plotter handle only a single data type (or very few) and migrating the following data processing to dedicated utilities:

  • Data type conversion to numpy.array or pandas.DataFrame (or some other preferred type)
  • Missing value imputation (could wrap scikit-learn)
  • Anomalous value handling (NaN or infinity)
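
The three steps above could be sketched roughly as follows. This is only an illustration, not existing code in this repo: the names `_to_dataframe`, `_clean_values` and `preprocess` are hypothetical, pandas.DataFrame is assumed as the target type, and a constant fill stands in for real imputation (scikit-learn's SimpleImputer could be wrapped at that step instead).

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

import numpy as np
import pandas as pd


def _to_dataframe(data) -> pd.DataFrame:
    """Hypothetical private helper: coerce supported inputs to a DataFrame."""
    if isinstance(data, str):
        raise TypeError("strings are not supported as plot data")
    if isinstance(data, pd.DataFrame):
        return data.copy()  # copy so the caller's object is never mutated
    if isinstance(data, pd.Series):
        return data.to_frame()
    if isinstance(data, (Mapping, np.ndarray, Sequence)):
        return pd.DataFrame(data)
    raise TypeError(f"unsupported data type: {type(data).__name__}")


def _clean_values(df: pd.DataFrame, *, fill_value: float | None = None) -> pd.DataFrame:
    """Hypothetical private helper: map +/-inf to NaN, then optionally fill NaN."""
    cleaned = df.replace([np.inf, -np.inf], np.nan)
    if fill_value is not None:
        # Placeholder for imputation: fill every missing value with a constant
        cleaned = cleaned.fillna(fill_value)
    return cleaned


def preprocess(data, *, fill_value: float | None = None) -> pd.DataFrame:
    """Single entry point a plotter could call before doing any plotting."""
    return _clean_values(_to_dataframe(data), fill_value=fill_value)
```

With something like this in place, a plotter would call `preprocess(data)` once at the top and only deal with a DataFrame afterwards, while users could keep passing dicts, arrays or Series.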

Potential Impact

I don't expect this to be breaking (or even visible to users), but it would certainly be a lot of work, as almost the entire code base would need to be refactored.

@janosh
Owner

janosh commented Apr 16, 2024

fully on board with this! as i wrote in #81 (comment):

i'd prefer dataframes over arrays as they have a more powerful API

they can also store more metadata (both in column/index names and in df.attrs) and do a lot of missing value handling automatically
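
As a small illustration of the points above (the column name, index values and the "unit" key are made up for the example, and note that pandas documents `df.attrs` as experimental):

```python
import numpy as np
import pandas as pd

# Index and column names carry metadata that a bare array would lose
df = pd.DataFrame(
    {"energy": [1.2, np.nan, 3.4]},
    index=pd.Index(["Fe", "Co", "Ni"], name="element"),
)

# Free-form metadata can be attached to the frame itself
df.attrs["unit"] = "eV"

# Reductions skip NaN by default (skipna=True), so no manual masking is needed
mean_energy = df["energy"].mean()
n_valid = df["energy"].count()  # counts non-missing entries only
```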

@janosh janosh added enhancement Improvement to existing features/functionality api Application programming interface dx Developer experience labels Apr 16, 2024