Process results supposes the "Averages" row is always at the same index #937

Open
manuelgitgomes opened this issue Apr 30, 2024 · 7 comments

@manuelgitgomes
Collaborator

As it stands, process results assumes the "Averages" row is at the same index in every evaluation, which might not be the case.
This is done in this function:

def averageCsvFiles(filenames):
    # Get a list of the pandas representation of the csvs
    pandas_text_df = pd.read_csv(filenames[0])
    pandas_dfs = []
    for filename_idx, filename in enumerate(filenames):
        df = pd.read_csv(filename)
        df = df.apply(pd.to_numeric, errors="coerce")
        pandas_dfs.append(df)
    # Concatenate and average all numeric cells, pairing rows by index
    g = pd.concat(pandas_dfs).groupby(level=0).mean()
    # Replace the text cells (now NaN) with the text from the first file
    g.fillna(pandas_text_df, inplace=True)
    return g
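To make the failure concrete, here is a minimal sketch of what `groupby(level=0)` does when two evaluations have a different number of collections. The CSV contents and the `RMS (pix)` column name are made up for illustration; only `Collection #` and `Averages` come from the real tables.

```python
import io
import pandas as pd

# Two hypothetical evaluation CSVs. The second has one more collection,
# so its "Averages" row sits at index 3 instead of index 2.
csv_a = "Collection #,RMS (pix)\n0,1.0\n1,3.0\nAverages,2.0\n"
csv_b = "Collection #,RMS (pix)\n0,2.0\n1,4.0\n2,9.0\nAverages,5.0\n"

dfs = [
    pd.read_csv(io.StringIO(text)).apply(pd.to_numeric, errors="coerce")
    for text in (csv_a, csv_b)
]

# Same aggregation as in the function above: rows are paired by position.
g = pd.concat(dfs).groupby(level=0).mean()

# Index 2 mixes the "Averages" row of the first file (2.0) with
# collection 2 of the second file (9.0).
mixed_value = g.loc[2, "RMS (pix)"]

# Index 3 only contains the "Averages" row of the second file (5.0).
lonely_value = g.loc[3, "RMS (pix)"]
```

Neither row holds the mean of the two "Averages" rows, which would be 3.5 for this made-up data.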

@manuelgitgomes manuelgitgomes added the bug Something isn't working label Apr 30, 2024
@manuelgitgomes manuelgitgomes self-assigned this Apr 30, 2024
@manuelgitgomes
Collaborator Author

Hello @miguelriemoliveira.
This function is now like this:

def averageCsvFiles(filenames):
    # Get a list of the pandas representation of the csvs
    pandas_text_df = pd.read_csv(filenames[0])
    pandas_dfs = []
    for filename_idx, filename in enumerate(filenames):
        df = pd.read_csv(filename)
        # Keep only the "Averages" row
        df = df[df['Collection #'].isin(['Averages'])]
        df = df.apply(pd.to_numeric, errors="coerce")
        pandas_dfs.append(df)
    # Concatenate and average all numeric cells
    g = pd.concat(pandas_dfs).mean()
    # Replace the text cell (now NaN) with the "Averages" label
    g.fillna('Averages', inplace=True)
    return g

I have removed all non-Averages rows from the df, so a simple average suffices and there is no need to group.
From my testing, everything seems fine.
Did I break some assumption with this?
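As a quick sanity check of the label-based filtering, here is a sketch on made-up data where the "Averages" row sits at a different index in each file. The CSV contents and the `RMS (pix)` column name are hypothetical. Unlike the function above, this sketch drops the `Collection #` column before averaging instead of refilling it afterwards, purely to keep the example short.

```python
import io
import pandas as pd

# Hypothetical CSVs with the "Averages" row at index 2 and index 3.
csv_a = "Collection #,RMS (pix)\n0,1.0\n1,3.0\nAverages,2.0\n"
csv_b = "Collection #,RMS (pix)\n0,2.0\n1,4.0\n2,9.0\nAverages,5.0\n"

rows = []
for text in (csv_a, csv_b):
    df = pd.read_csv(io.StringIO(text))
    # Select the summary row by its label, not by its position.
    df = df[df["Collection #"].isin(["Averages"])]
    # Drop the label column so only numeric cells remain (simplification).
    df = df.drop(columns="Collection #").apply(pd.to_numeric, errors="coerce")
    rows.append(df)

# One row per file is left, so a plain column-wise mean is enough.
averages = pd.concat(rows).mean()
```

Here `averages["RMS (pix)"]` is the mean of 2.0 and 5.0, regardless of where the row was in each file.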

@miguelriemoliveira
Member

Well, the assumption is that we will have one row where the first column says "Averages", right?

I think we can go with that, but it should perhaps be written in the code of the scripts producing these tables that this is the convention.
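One possible way to write that convention down, sketched under assumptions: a shared constant that the scripts producing the tables use when they append the summary row. `AVERAGE_ROW_NAME`, `append_averages_row` and the `RMS (pix)` column are hypothetical names, not existing code.

```python
import pandas as pd

# Hypothetical shared constant: producers write it, consumers filter on it.
AVERAGE_ROW_NAME = "Averages"


def append_averages_row(df, label_column="Collection #"):
    """Return a copy of df with one summary row labelled AVERAGE_ROW_NAME."""
    # Column-wise mean of everything except the label column.
    means = df.drop(columns=label_column).mean()
    summary = pd.DataFrame([{label_column: AVERAGE_ROW_NAME, **means.to_dict()}])
    # Cast the label column to object so it can hold numbers and the label.
    return pd.concat(
        [df.astype({label_column: object}), summary], ignore_index=True
    )


# Made-up per-collection results for illustration.
table = append_averages_row(
    pd.DataFrame({"Collection #": [0, 1], "RMS (pix)": [1.0, 3.0]})
)
```

With a single constant, the producer and the filter in process results cannot drift apart on the label.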

@Kazadhum
Collaborator

Kazadhum commented May 2, 2024

I agree, I think this is okay like this.

Alternatively (though it sounds a bit "janky"), we could have an optional argument to change the "average row name"? That way it still works if the csv file has an average row with the string "avg" or even "average" without the A capitalized. We could retain some "general applicability" like that.
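A rough sketch of what such an optional argument could look like. The function name, the `average_row_name` parameter and the sample data are assumptions for illustration, not the code that was eventually committed. The check for exactly one matching row is also an addition, reflecting the "one Averages row" assumption discussed above.

```python
import io
import pandas as pd


def average_csv_files(csv_sources, average_row_name="Averages"):
    """Average the summary row of several CSVs.

    csv_sources: paths or file-like objects accepted by pd.read_csv.
    average_row_name: label in 'Collection #' marking the summary row
    (hypothetical parameter).
    """
    rows = []
    for source in csv_sources:
        df = pd.read_csv(source)
        row = df[df["Collection #"] == average_row_name]
        if len(row) != 1:
            raise ValueError(
                f"expected exactly one '{average_row_name}' row, found {len(row)}"
            )
        # Keep only the numeric cells of the summary row.
        row = row.drop(columns="Collection #")
        rows.append(row.apply(pd.to_numeric, errors="coerce"))
    return pd.concat(rows).mean()


# Made-up CSVs that label the summary row "avg" instead of "Averages".
csv_a = io.StringIO("Collection #,RMS (pix)\n0,1.0\n1,3.0\navg,2.0\n")
csv_b = io.StringIO("Collection #,RMS (pix)\n0,2.0\n1,4.0\n2,9.0\navg,5.0\n")
result = average_csv_files([csv_a, csv_b], average_row_name="avg")
```

Calling it with the default name on these files would raise a `ValueError`, which makes a mismatched label visible instead of silently averaging nothing.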

@miguelriemoliveira
Member

I like this idea from @Kazadhum .

@manuelgitgomes
Collaborator Author

Hello @miguelriemoliveira and @Kazadhum.
This suggestion has been added.
Can this be merged to main?

@miguelriemoliveira
Member

Sure. Thanks.

@Kazadhum
Collaborator

Kazadhum commented May 3, 2024

Yes, thank you!

manuelgitgomes added a commit that referenced this issue May 6, 2024
* #937 Removing all non-Averages rows

* #937 Added option to change the name of the average row