Process results supposes the "Averages" row is always at the same index #937

manuelgitgomes · 2024-04-30T16:21:31Z

As it stands, process_results assumes the "Averages" row sits at the same index in every evaluation, which might not be the case.
This is done in this function:

atom/atom_batch_execution/scripts/process_results

Lines 25 to 43 in 9214691

    
    def averageCsvFiles(filenames):
        # Get a list of the pandas representation of the csvs
        pandas_text_df = pd.read_csv(filenames[0])
        pandas_dfs = []
        for filename_idx, filename in enumerate(filenames):
            df = pd.read_csv(filename)
            df = df.apply(pd.to_numeric, errors="coerce")
            pandas_dfs.append(df)
        # Concatenate and average all numeric cells
        g = pd.concat(pandas_dfs).groupby(level=0).mean()
        # replace the text cells with
        g.fillna(pandas_text_df, inplace=True)
        return g
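
To make the failure mode concrete, here is a minimal sketch with two made-up result tables. The column name rms and all values are assumptions for illustration, not real ATOM output. Because groupby(level=0) groups by index label, rows are paired by position rather than by their "Collection #" value:

```python
import pandas as pd

# Two hypothetical result tables: the "Averages" row sits at index 2 in the
# first table and at index 1 in the second.
run_a = pd.DataFrame({"Collection #": ["0", "1", "Averages"], "rms": [1.0, 3.0, 2.0]})
run_b = pd.DataFrame({"Collection #": ["0", "Averages"], "rms": [4.0, 4.0]})

# Same coercion as in averageCsvFiles: text cells become NaN.
numeric = [df.apply(pd.to_numeric, errors="coerce") for df in (run_a, run_b)]

# Rows are grouped by index label, so index 1 mixes collection 1 of run_a
# with the "Averages" row of run_b.
by_position = pd.concat(numeric).groupby(level=0).mean()
print(by_position)
```

In this sketch, row 1 blends a per-collection value with an average, and row 2 contains the "Averages" value of run_a alone instead of the mean over both files.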

manuelgitgomes · 2024-04-30T16:25:42Z

Hello @miguelriemoliveira.
This function is now like this:

atom/atom_batch_execution/scripts/process_results

Lines 25 to 46 in f147c0e

    
    def averageCsvFiles(filenames):
        # Get a list of the pandas representation of the csvs
        pandas_text_df = pd.read_csv(filenames[0])
        pandas_dfs = []
        for filename_idx, filename in enumerate(filenames):
            df = pd.read_csv(filename)
            # Filter all cells for averages
            df = df[df['Collection #'].isin(['Averages'])]
            df = df.apply(pd.to_numeric, errors="coerce")
            pandas_dfs.append(df)
        # Concatenate and average all numeric cells
        g = pd.concat(pandas_dfs).mean()
        # replace the text cells with
        g.fillna('Averages', inplace=True)
        return g

I now drop every non-"Averages" row from each df, so a plain mean over the remaining rows suffices and there is no need to group by index.
From my testing, everything seems fine.
Did I break some assumption with this?
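
One thing that may be worth checking, sketched below on made-up data (the rms column and its values are assumptions): DataFrame.mean() returns a Series indexed by column name, whereas the previous groupby(...).mean() returned a DataFrame, and fillna('Averages') fills every NaN entry rather than only the label cell. Whether any caller depends on the old shape is not stated in this thread.

```python
import pandas as pd

# Hypothetical tables: only the "Averages" rows survive the filter.
run_a = pd.DataFrame({"Collection #": ["0", "1", "Averages"], "rms": [1.0, 3.0, 2.0]})
run_b = pd.DataFrame({"Collection #": ["0", "Averages"], "rms": [4.0, 4.0]})

kept = []
for df in (run_a, run_b):
    df = df[df["Collection #"].isin(["Averages"])]
    kept.append(df.apply(pd.to_numeric, errors="coerce"))

g = pd.concat(kept).mean()   # a Series indexed by column name
g = g.fillna("Averages")     # fills every NaN entry, not only the label

# One possible way to get a one-row DataFrame back, if a caller needs it.
as_frame = g.to_frame().T
print(as_frame)
```

If a metric column were entirely NaN in the "Averages" rows, it would also receive the string "Averages" after the fillna call.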

miguelriemoliveira · 2024-05-01T09:35:15Z

Well, the assumption is that we will have one row where the first column says "Averages", right?

I think we can go with that, but the scripts that produce these tables should perhaps state in their code that this is the convention.

Kazadhum · 2024-05-02T21:56:45Z

I agree, I think this is okay like this.

Alternatively (though it sounds a bit "janky"), we could have an optional argument to change the "average row name"? That way it still works if the csv file has an average row with the string "avg" or even "average" without the A capitalized. We could retain some "general applicability" like that.
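
A minimal sketch of what such an option could look like. The helper select_average_rows and its parameters average_row_name, label_column and case_sensitive are hypothetical names chosen for illustration, not the ones used in process_results:

```python
import pandas as pd

def select_average_rows(df, average_row_name="Averages",
                        label_column="Collection #", case_sensitive=True):
    """Return only the rows whose label matches the configured average row name.

    All parameter names here are hypothetical.
    """
    labels = df[label_column].astype(str).str.strip()
    if case_sensitive:
        mask = labels == average_row_name
    else:
        mask = labels.str.lower() == average_row_name.lower()
    return df[mask]

# Usage on a made-up table whose average row is labelled "avg".
table = pd.DataFrame({"Collection #": ["avg", "0"], "rms": [2.0, 1.0]})
print(select_average_rows(table, average_row_name="avg"))
```

Matching on the label instead of the row position keeps the function independent of where the average row appears, and the case-insensitive switch would cover variants such as "averages".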

miguelriemoliveira · 2024-05-03T08:09:38Z

I like this idea from @Kazadhum .

manuelgitgomes · 2024-05-03T09:27:06Z

Hello @miguelriemoliveira and @Kazadhum.
This suggestion has been added.
Can this be merged to main?

miguelriemoliveira · 2024-05-03T09:37:19Z

Sure. Thanks.

Kazadhum · 2024-05-03T09:38:00Z

Yes, thank you!


manuelgitgomes added the bug Something isn't working label Apr 30, 2024

manuelgitgomes self-assigned this Apr 30, 2024

manuelgitgomes added a commit that referenced this issue Apr 30, 2024

#937 Removing all non-Averages rows

f147c0e

manuelgitgomes added a commit that referenced this issue May 3, 2024

#937 Added option to change the name of the average row

b405c84

manuelgitgomes added a commit that referenced this issue May 6, 2024

#937 Average row can now be of varying indexes (#946)

5336e46

* #937 Removing all non-Averages rows * #937 Added option to change the name of the average row
