
Attribute aggregation/transformation + plotting & evaluation analyses #34

Merged: 110 commits merged into main on Jan 9, 2025


@glitt13 (Collaborator) commented Dec 19, 2024

An approach to aggregate and transform existing attribute data into new attribute data.
Additionally, create and save plots that visualize results and aid in evaluating algorithm performance.
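The aggregate-and-transform idea can be sketched in plain Python. This is a minimal sketch, not the fs_proc implementation: the config layout, the attribute names (attr_a, attr_b), and the function tform_attrs are hypothetical stand-ins for whatever xssa_attrs_tform.yaml and tfrm_attr.py actually define. Each new attribute is computed per comid from a list of existing attributes, and comid-attribute pairs that block a computation are collected rather than silently skipped, similar in spirit to a missing comid-attributes file.

```python
import math
from statistics import mean

# Hypothetical mapping from config function names to Python callables.
AGG_FUNCS = {"sum": sum, "mean": mean, "max": max, "min": min}


def tform_attrs(attrs, config):
    """Create new attributes per comid by aggregating existing ones.

    attrs:  {comid: {attr_name: value}}
    config: {new_attr: {"func": <AGG_FUNCS key>, "vars": [attr names],
                        "log": bool (optional; log10 of the aggregate)}}
    Returns (new_attrs, missing): missing is a sorted list of the
    (comid, attr) pairs that prevented a new attribute from being computed.
    """
    new_attrs = {comid: {} for comid in attrs}
    missing = set()
    for comid, vals in attrs.items():
        for new_name, spec in config.items():
            absent = [v for v in spec["vars"] if vals.get(v) is None]
            if absent:
                missing.update((comid, v) for v in absent)
                continue  # cannot compute this attribute for this comid
            result = AGG_FUNCS[spec["func"]]([vals[v] for v in spec["vars"]])
            if spec.get("log"):
                result = math.log10(result)  # requires a positive aggregate
            new_attrs[comid][new_name] = result
    return new_attrs, sorted(missing)


# Hypothetical example: comid "1002" lacks attr_b, so it is reported as missing.
attrs = {
    "1001": {"attr_a": 10.0, "attr_b": 90.0},
    "1002": {"attr_a": 5.0},
}
config = {"attr_ab_sum_log": {"func": "sum", "vars": ["attr_a", "attr_b"], "log": True}}
new_attrs, missing = tform_attrs(attrs, config)
```

Reporting the missing pairs instead of dropping them lets a follow-up step try to fetch exactly those attributes and rerun the transformation.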

Additions

  • xssa_attrs_tform.yaml: an example configuration file describing how variables are aggregated and transformed.
  • fs_tfrm_attrs.py: the main processing script; it calls the functions in tfrm_attr.py.
  • tfrm_attr.py: new functions for the fs_proc package.
  • test_tfrm_attr.py: unit tests.
  • fs_attrs_miss.R: reads the missing comid-attributes file that fs_tfrm_attrs.py sometimes generates and attempts to find the missing attribute data. When attribute data are missing, fs_tfrm_attrs.py calls this Rscript to try to acquire them for use in the attribute transformation.
  • Principal component analysis (PCA) of the predictor dataset and response variable, including PCA plots: pca_stdscaled_tfrm, plot_pca_stdscaled_tfrm, plot_pca_stdscaled_cumulative_var, and std_pca_plot_path, all wrapped by plot_pca_save_wrap.
  • Random forest feature importance; the relevant functions are wrapped by save_feat_imp_fig_wrap.
  • Algorithm evaluation using learning-curve analysis via the AlgoEvalPlotLC class; its functions are wrapped by plot_learning_curve_save_wrap.
  • Predicted vs. observed regression plots, created and saved via the plot_pred_vs_obs_wrap wrapper.
  • Maps of predicted response variables, generated via the plot_map_pred_wrap wrapper.
  • Maps of the best prediction across multiple datasets, generated via the plot_best_algo_wrap wrapper.
  • AGU 2024 analysis scripts: the ealstm analyses, e.g. /scripts/analysis/fs_proc_viz_best_ealstm.py and the more formal /scripts/eval_ingest/ealstm/proc_ealstm_agu24.py, plus associated config files in the ealstm/ directory.
  • fs_proc_algo_viz.py: an updated algorithm training/testing evaluation script, based on fs_proc_algo.py, with new evaluation and plotting features.
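The standardized PCA listed above (pca_stdscaled_tfrm plus a cumulative-variance plot) can be illustrated with NumPy alone. This is a sketch under assumptions: the function name pca_stdscaled and its return values are hypothetical, and the PR's own functions may use scikit-learn instead. Each column is z-scored, then an SVD of the standardized matrix yields the component scores and the explained-variance ratios whose cumulative sum a cumulative-variance plot would display.

```python
import numpy as np


def pca_stdscaled(X):
    """PCA on column-standardized data (hypothetical helper).

    X: (n_samples, n_features) array of predictor attributes.
    Returns (scores, explained_ratio, cumulative_ratio).
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    if np.any(std == 0):
        raise ValueError("constant column(s) cannot be standardized")
    Z = (X - X.mean(axis=0)) / std  # z-score each feature
    # Rows of Vt are the principal axes; singular values are sorted descending.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T  # projection of each sample onto the components
    explained_ratio = s**2 / np.sum(s**2)
    return scores, explained_ratio, np.cumsum(explained_ratio)


# Synthetic example: two nearly collinear columns plus one independent column.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, 2 * x1 + rng.normal(scale=0.1, size=200), rng.normal(size=200)])
scores, ratio, cum = pca_stdscaled(X)
```

With scikit-learn, StandardScaler followed by PCA gives the same explained_variance_ratio_ values; the component signs may differ, since the sign of each principal axis is arbitrary.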

Removals

Changes

  • Converted scripts/config/attr_gen_camels.R from hard-coded values to a generalizable form driven by the config file scripts/config/attr_gen_camels_config.yaml.
  • proc.attr.hydfab: refactored attribute grabbing to pull multiple comids and attributes at once, cutting attribute grabbing from days to tens of minutes.
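The speed-up in proc.attr.hydfab comes from requesting many comids and attributes in one pass rather than one at a time. The R package's internals are not reproduced here; this Python sketch only illustrates the general pattern under assumptions: attribute records arrive as long-format (comid, attribute, value) rows (for example, after reading a parquet file), and the hypothetical helper grab_attrs_batch filters them once against sets of requested comids and attributes, then reshapes the result into one entry per comid.

```python
from collections import defaultdict


def grab_attrs_batch(records, comids, attr_names):
    """Filter long-format records once for many comids and attributes.

    records:    iterable of (comid, attribute, value) tuples
    comids:     comids to keep
    attr_names: attribute names to keep
    Returns ({comid: {attribute: value}}, sorted [(comid, attribute)] still missing).
    """
    want_comids, want_attrs = set(comids), set(attr_names)  # O(1) membership tests
    wide = defaultdict(dict)
    for comid, attr, value in records:  # a single pass over all records
        if comid in want_comids and attr in want_attrs:
            wide[comid][attr] = value
    missing = sorted(
        (c, a) for c in want_comids for a in want_attrs if a not in wide.get(c, {})
    )
    return dict(wide), missing


# Hypothetical records; comid "9999" was not requested and is ignored.
records = [
    ("1001", "attr_a", 1.0), ("1001", "attr_b", 2.0),
    ("1002", "attr_a", 3.0), ("9999", "attr_a", 4.0),
]
wide, missing = grab_attrs_batch(records, ["1001", "1002"], ["attr_a", "attr_b"])
```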

Testing

  1. Unit testing with test_tfrm_attr.py was difficult to implement with a standard unittest approach because of an unexplained error involving "import dask.dataframe as dd". As a workaround, most uses of classes were removed, so the package is only partially tested.
  2. proc.attr.hydfab: revised and improved the unit tests to cover the refactoring described above.

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is usable without CSS
  • Is usable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

…at strings just-in-case user doesn't use f'{dir_base}'
…ssing comids or variables have been identified, else write message that there could be an issue in the logic
bolotinl and others added 3 commits December 19, 2024 09:08
…yle theme (#33)

* Create custom matplotlib stylesheet for RaFTS plots

* Flip axes on scatter; change perf to pred for clarity

* Change perf to pred for clarity

* Read in mplstyle file directly from fs_algo

* incorporate plotting functions into fs_perf_viz.py

* Use functions for creating file output paths

* Change perf_map to pred_map

---------

Co-authored-by: glitt13 <[email protected]>
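Reading the mplstyle file "directly from fs_algo" usually means shipping it as package data and resolving its path at runtime. A minimal sketch using the standard library's importlib.resources; the helper name apply_packaged_style and the style filename are hypothetical, and the style-applying callable is passed in (for example matplotlib's plt.style.use, which accepts a path to a .mplstyle file), so the sketch itself does not import matplotlib.

```python
from importlib import resources


def apply_packaged_style(package, filename, use_style):
    """Locate `filename` inside `package` and pass its path to `use_style`.

    use_style: callable taking a file path, e.g. matplotlib.pyplot.style.use.
    Returns the path that was applied, as a string.
    """
    resource = resources.files(package) / filename
    if not resource.is_file():
        raise FileNotFoundError(f"{filename!r} not found in package {package!r}")
    # as_file() yields a real filesystem path even for zipped installs;
    # the style is applied while that path is guaranteed to exist.
    with resources.as_file(resource) as path:
        use_style(str(path))
        return str(path)


# Hypothetical usage (style filename assumed):
# import matplotlib.pyplot as plt
# apply_packaged_style("fs_algo", "rafts.mplstyle", plt.style.use)
```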
…fficient s3 retrievals of basin attribute data with proc_attr_mlti_wrap. Still needs integration into full processing.
…ibutes all at once; doc: update documentation pertaining to refactoring
… change in script to a different config file path
…he attribute grabbing needed when creating new transformation attributes
…nd response dataset; refactor: train/test split logic now considers common indices for simplicity
…e same comid; ensure unique comids b/w train/test split, ensure NA and duplicates consistently handled across multiple steps with the creation of combine_resp_gdf_comid_wrap()
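The commits above describe a train/test split that keeps each comid on only one side and handles NA values and duplicates consistently. The sketch below shows that idea in plain Python under assumptions: rows are (comid, response) pairs, None marks NA, duplicates are resolved by keeping the first valid row per comid, and the function split_by_comid is hypothetical. The PR's combine_resp_gdf_comid_wrap() works on dataframes and may resolve duplicates differently.

```python
import random


def split_by_comid(rows, test_frac=0.3, seed=42):
    """Split (comid, response) rows so that no comid appears in both sets.

    1. Drop rows whose response is NA (None).
    2. Keep the first remaining row for each comid (assumed duplicate rule).
    3. Shuffle the unique comids and assign whole comids to train or test.
    """
    clean = {}
    for comid, resp in rows:
        if resp is None:
            continue
        clean.setdefault(comid, resp)  # first valid occurrence wins
    comids = sorted(clean)  # sort before shuffling so the split is reproducible
    random.Random(seed).shuffle(comids)
    n_test = round(len(comids) * test_frac)
    test_ids = set(comids[:n_test])
    train = [(c, clean[c]) for c in comids if c not in test_ids]
    test = [(c, clean[c]) for c in comids if c in test_ids]
    return train, test


# Hypothetical data: "a" is duplicated and "b" has an NA response.
rows = [("a", 0.5), ("a", 0.6), ("b", None), ("c", 0.1), ("d", 0.9), ("e", 0.3)]
train, test = split_by_comid(rows, test_frac=0.25)
```

Splitting on unique comids rather than on rows prevents the same basin from leaking into both the training and testing data.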
@ssorou1 (Collaborator) previously approved these changes on Jan 8, 2025 and left a comment:


  1. fs_proc
    python proc_eval_metrics.py ~\git\formulation-selector\scripts\eval_ingest\xssa\xssa_config.yaml
    This ran successfully.
  2. proc.attr.hydfab
    Rscript fs_attrs_grab.R ~/git/formulation-selector/scripts/eval_ingest/xssa/xssa_attr_config.yaml
    The attribute folder could not be populated in a Windows environment; see issue #35, "R package not being able to read .parquet files in Windows".
    After manually copying the parquet files, this step completed successfully.
    Unit tests were run in both R and Python: in R, 46 passed and 4 failed; in Python, 33 passed and 1 failed.
  3. fs_algo training:
    python fs_proc_algo.py ~\git\formulation-selector\scripts\eval_ingest\xssa\xssa_algo_config.yaml
    This step ran successfully.

@glitt13 merged commit 694f275 into main on Jan 9, 2025
3 participants