Change fsds to fs across python & R packages #23

Merged: 112 commits, Oct 22, 2024

@glitt13 glitt13 commented Oct 13, 2024

Renamed every instance of fsds to fs (formulation-selector) in the Python and R code base.
Also integrated the NWIS gage ID checker/corrector into the standard fs_proc processing.

Additions

Integrated the check_nwis logic into proc_col_schema()
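
For reviewers unfamiliar with the problem, here is a minimal sketch of the kind of correction involved: NWIS gage IDs read as integers lose their leading zero. This is illustrative only and is not the code in fs_proc. The helper name `pad_nwis_gage_id` and the 8-character minimum length are assumptions made for the example.

```python
def pad_nwis_gage_id(gage_id, min_len=8):
    """Hypothetical helper: restore leading zeros on a numeric gage ID.

    Assumes IDs shorter than `min_len` lost leading zeros when the raw
    file was parsed as integers; non-numeric IDs are returned unchanged.
    """
    text = str(gage_id).strip()
    # Only pad purely numeric ASCII IDs; leave prefixed IDs alone.
    if text.isascii() and text.isdigit():
        return text.zfill(min_len)
    return text


raw_ids = [1013500, "01013500", "USGS-1013500"]
fixed_ids = [pad_nwis_gage_id(g) for g in raw_ids]
```

IDs that already meet the minimum length pass through untouched, so the check is safe to run on data that needs no correction.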

Removals

Changes

Testing

  1. All unit tests for the fs_proc and fs_algo packages pass after the fsds-to-fs rename
  2. Added a unit test for the NWIS checker, check_fix_nwissite_gageids

Screenshots

Notes

Todos

  • Still need to modify fsds naming in the R code, but will do so once the existing PR review is accepted to reduce merge conflicts.

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

glitt13 and others added 30 commits August 22, 2024 09:29

glitt13 commented Oct 16, 2024

@bolotinl, note that this PR also includes a new feature: the pre-existing check_nwis_gage_id_fix() function now runs as a standard check inside proc_col_schema() whenever NWIS gage data are used as the gage_id. This PR also adds a new unit test for check_nwis_gage_id_fix().
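
To make the behavior concrete, here is a rough sketch of how such a check could be gated on the feature source. It assumes the check applies only when the feature source is `nwissite`. The function name `standardize_gage_ids`, its arguments, and its return shape are hypothetical and do not reflect the actual proc_col_schema() signature.

```python
def standardize_gage_ids(gage_ids, feature_source, min_len=8):
    """Hypothetical sketch: pad gage IDs only for NWIS site data.

    Returns the (possibly corrected) IDs and the list of raw IDs that
    were changed, so the caller can report what was fixed.
    """
    if feature_source != "nwissite":
        # Other feature sources are passed through untouched.
        return list(gage_ids), []

    fixed, changed = [], []
    for raw in gage_ids:
        text = str(raw).strip()
        new = text.zfill(min_len) if text.isascii() and text.isdigit() else text
        if new != text:
            changed.append(text)
        fixed.append(new)
    return fixed, changed


fixed, changed = standardize_gage_ids([1013500, "02177000"], "nwissite")
```

Returning the changed IDs separately is one possible design; it lets the caller decide whether to warn, log, or fail.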

@glitt13 glitt13 changed the title Change fsds to fs in python code Change fsds to fs across python & R packages Oct 17, 2024
@bolotinl bolotinl merged commit 6e54890 into NOAA-OWP:main Oct 22, 2024
glitt13 added a commit to glitt13/formulation-selector that referenced this pull request Oct 22, 2024
* feat: developing algorithm training and evaluation module

* fix: minor bug fixes with paths and np array values retrieval

* feat: create initial fs_algo package

* feat: contain all training/eval into a single class

* feat: simplify evaluation file write and module import

* feat: add basic unit testing for AlgoTrainEval class

* feat: convert save dir structure creation and fsds dataset reader into modular functions inside package

* Update README.md

describe the unique dependencies

* feat: simplify aspects of attribute organization and combining with metrics

* feat: beginning to convert attribute wrangling into a class

* feat: add algorithm configuration file

* feat: established class for attribute configuration file, scripts functioning

* feat: add verbose option

* fix: update to warnings.warn()

* feat: building out additional unit tests for AttrConfigAndVars class

* chore: remove spaces

* feat: add unit test for fs_read_attr_comid

* feat: add UserWarnings and associated unit test

* feat: add unit tests for _find_feat_srce_id, fs_retr_nhdp_comids and fix associated functions when behavior didn't follow expected behavior

* feat: add unit test for fs_save_algo_dir_struct

* feat: a basic unit test for _open_response_data_fsds

* chore: simplify algo script based on functionality moved into fs_algo_train_eval module

* doc: add sphinx documentation to _read_attr_config and fs_read_attr_comid

* doc: add sphinx-formatted documentation to the functions in the fs_algo_train_eval module; feat: move some hard-coded variables into the algorithm config file

* fix: changes vars to attrs in AlgoTrainEval arg

* fix: added the new parameters that were hard-coded (test_size & seed)

* fix: swapped the train/test fractions to appropriate printout order

* feat: make sphinx documentation

* fix: reinstall sphinx docs for fsds_proc

* fix: remove unused path_camels

* fix: remove unused references to path_camels

* fix: update standard fsds_proc config files to create netcdf rather than csv; rename these files from schema to config

* doc: update config file documentation on preferred save_type

* doc: update description of yaml file's dataset

* fix: update config files with featureID and featureSource entries

* fix: change vars to attrs based on package's object name change

* fix: change logic to ensure config file read if dataset attribute read failed

* feat: add a raw data input checker/corrector for cases when nwissite gage ids are missing the leading 0

* fix: changed path_data to represent the raw input files containing corrected nwissite USGS gage ids (leading zeros)

* fix: added appropriate fillna for nwissite gage ids not needed to be corrected

* fix: adjust path check for attributes instead of algo

* doc: add descriptive notes on algo pre-processing and suggest future improvements for datasets not processed with fsds_proc with TODO

* doc: simplify attr_config, change dir_attrs to dir_db_attrs

* chore: add some additional hydroatlas and USGS NHD variables for consideration

* chore: add updated attribute variables to config files, based on top 5 variables considered by Bolotin et al 2022 SI work

* fix: add error handling when hydrofabric could not be downloaded for a given comid

* fix: avoid index error generated from attr_ddf_sub.shape[0].compute() by simply performing attr_ddf_sub.compute() first, which is needed anyway

* fix: change fs_read_attr_comid to return pd.DataFrame instead of dask df, and add checks ensuring 'value' data column being float type, check for no NA values present

* feat: add NA drop prior to train/test split

* feat: create a separate function that standardizes the algorithm file save path

* doc: add documentation to the std_algo_path func

* feat: create script to generate algo prediction data for testing

* feat: generating predictions from trained algos under dev

* feat: add processing of xssa locations, randomly selecting a subset to use for algo prediction

* feat: develop algo prediction's config ingest, and determine paths to prediction locations and trained algos

* feat: add config file path builder

* feat: create metric prediction and write results to file

* feat: build unit test for build_cfig_path()

* feat: build unit test for build_cfig_path()

* feat: add unit tests for std_pred_path and _read_pred_comid; test coverage now at 92%

* feat: add oob = True as default for RandomForestRegressor

* feat: add hyperparameterization capability using grid search and associated unit tests

* feat: add unit testing for train_eval()

* chore: change algo config for testing out hyperparameterization

* chore: add UserWarning category specification to warnings.warn

* fix: algo config assignment accidentally only looked at first line of params

* fix: make sure that hyperparameter key:value pairings contained inside dict, not list

* fix: adjust unit test's algo_config formats to represent the issue of a dict of a list, which the list_to_dict() function then converts

* fix: _check_attributes_exist now appropriately reports missing attributes and comids

* fix: ensure algo and pipeline keys contain algo and pipeline object types in the grid search case

* Update pkg/fs_algo/fs_algo/fs_algo_train_eval.py

Co-authored-by: LaurenBolotin-NOAA <[email protected]>

* Update pkg/fs_algo/fs_algo/fs_algo_train_eval.py

Co-authored-by: LaurenBolotin-NOAA <[email protected]>

* chore: Update README.md

Rename proc_fsds to fsds_proc

* chore: rename fsds to fs in all python-related files and config files

* chore: rename fsds_proc directory to fs_proc

* chore: rename additional fsds to fs

* chore: rename remaining fsds to fs

* doc: minor change to install instructions of fs_proc

* feat: add requirements for fs_algo package

* feat: add requirements.yml for conda environment of fs_algo/fs_proc python packages

* doc: add details on func for creating col_schema_df

* feat: add nwissite gage id leading zero checker as automated step

* fix: new line continuation in f-string messages related to nwis checker

* fix: update local config path and example in script

* doc: change install description for this package

* fix: modify logical test on elif featureSource == nwissite

* feat: update and add new unit testing that accommodates the check_fix_nwissite_gageids function

* fix: update temp directory assignment to work with non-Unix systems

* doc: minor adjustment for instructional example on running unit tests

* Make the change match the exact repo name

* Make changes match exact repo name

* fix: rename fsds to fs in files corresponding to proc.attr.hydfab R package

* feat: update R package with name change of fsds to fs

* chore: update fsds to fs in config files and R unit tests

* doc: update README from fsds to fs in non-url instances

* doc: Update README.md

Update hyperlinks and descriptions with latest fsds to fs change, and OWP repo location.

* Update README.md

doc: minor path fix

* chore: rename fsds_attrs_grab.R to fs_attrs_grab.R and add updated Rd documentation using fs instead of fsds

* Fix typo in README.md

* Fix pkg name typo in xssa_attr_config_all_vars_avail.yaml

* Fix pkg name typo xssa_attr_config_missing_vars.yaml

* Change default format to netcdf -- zarr doesn't work

---------

Co-authored-by: LaurenBolotin-NOAA <[email protected]>