Pandas Python Code Generator #562

Open
BenGalewsky opened this issue Mar 27, 2023 · 0 comments · May be fixed by #574
Labels
PONDD PONDD Grand Challenge

Comments

@BenGalewsky
Contributor

Story

As an analyzer, I want to be able to work with data in CSV files so that I can standardize my analysis.

Description

We want to be able to read data from the nonstandard CSV and TSV formats that an experiment produces. We will use the pandas library as the basis for this transformer.

We will use the Python code generator with the following assumptions:

  1. The function will be called with an open file handle.
  2. The function will return a pandas DataFrame.
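A minimal sketch of a user function that meets these two assumptions. The name `run_query`, the tab separator, and the `#` comment prefix are illustrative assumptions, not part of the issue:

```python
import io

import pandas as pd


def run_query(file_handle):
    """Hypothetical user function: takes an open file handle, returns a DataFrame."""
    # Assumed input layout: tab-separated columns, with "#" starting comment lines.
    return pd.read_csv(file_handle, sep="\t", comment="#")


# An in-memory file stands in for an open experiment file.
sample = io.StringIO("# run 42\ntime\tvoltage\n0.0\t1.5\n0.1\t1.7\n")
df = run_query(sample)
```

Because the function only sees a file handle, the caller stays responsible for opening and closing the file, and all format-specific parsing options live in the user's code.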

There will be a new Python code generator based on the uproot Python code generator. Its transform_single_file.py script will write the DataFrame to Parquet using the DataFrame.to_parquet method.
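One way the script could be shaped is sketched below. The function signature, the command-line handling, and the stand-in `run_query` are assumptions for illustration. They are not taken from the existing uproot generator.

```python
import sys
from pathlib import Path

import pandas as pd


def run_query(file_handle):
    # Hypothetical stand-in for the generated user function.
    return pd.read_csv(file_handle)


def transform_single_file(input_path, output_path):
    # Open the input and pass the file handle to the user function.
    with open(input_path, "r", newline="") as handle:
        frame = run_query(handle)
    if not isinstance(frame, pd.DataFrame):
        raise TypeError("run_query must return a pandas DataFrame")
    # to_parquet needs a Parquet engine (pyarrow or fastparquet) to be installed.
    frame.to_parquet(output_path, index=False)
    return len(frame)


if __name__ == "__main__" and len(sys.argv) == 3:
    rows = transform_single_file(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"wrote {rows} rows")
```

Checking the return type before writing gives the user an early, readable error if their function returns something other than a DataFrame.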

It looks like it is possible to stream directly from the DataFrame to a Parquet object in MinIO: https://stackoverflow.com/a/57838851
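A rough sketch of one way to do this, and not necessarily the approach in the linked answer: serialize the DataFrame to an in-memory buffer, then pass the buffer to the object store client. The helper names are hypothetical. The `client` is assumed to expose `put_object(bucket_name, object_name, data, length)`, as the MinIO Python client does.

```python
import io

import pandas as pd


def dataframe_to_parquet_buffer(frame):
    # Serialize to memory instead of a temporary file on disk.
    buffer = io.BytesIO()
    frame.to_parquet(buffer, index=False)  # requires pyarrow or fastparquet
    buffer.seek(0)  # rewind so the uploader reads from the start
    return buffer


def upload_dataframe(client, bucket, object_name, frame):
    # `client` is assumed to provide put_object(bucket, name, data, length).
    buffer = dataframe_to_parquet_buffer(frame)
    size = buffer.getbuffer().nbytes
    client.put_object(bucket, object_name, buffer, size)
    return size
```

The whole Parquet payload is held in memory before upload, so this suits result files that fit comfortably in RAM.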

@BenGalewsky BenGalewsky added the PONDD PONDD Grand Challenge label Mar 27, 2023
@BenGalewsky BenGalewsky moved this to Ready in ServiceX Mar 27, 2023
@shriram192 shriram192 moved this from Ready to In Progress in ServiceX Apr 24, 2023
@BenGalewsky BenGalewsky moved this from In Progress to Ready for Review in ServiceX May 2, 2023
@BenGalewsky BenGalewsky removed this from ServiceX Sep 26, 2024