generated from NOAA-OWP/owp-open-source-project-template
Change fsds to fs across python & R packages #23
Merged
…o modular functions inside package
describe the unique dependencies
…fix associated functions when behavior didn't follow expected behavior
…_train_eval module
…go_train_eval module; feat: move some hard-coded variables into the algorithm config file
…_nwissite_gageids function
Add nwissite gage_id format checker to standard processing
Minor fix and doc update
Update hyperlinks and descriptions with latest fsds to fs change, and OWP repo location.
doc: minor path fix
@bolotinl, note that this PR also includes a new feature: adding the pre-existing |
… documentation using fs instead of fsds
bolotinl approved these changes Oct 22, 2024
glitt13 added a commit to glitt13/formulation-selector that referenced this pull request Oct 22, 2024
* feat: developing algorithm training and evaluation module
* fix: minor bug fixes with paths and np array values retrieval
* feat: create initial fs_algo package
* feat: contain all training/eval into a single class
* feat: simplify evaluation file write and module import
* feat: add basic unit testing for AlgoTrainEval class
* feat: convert save dir structure creation and fsds dataset reader into modular functions inside package
* Update README.md describe the unique dependencies
* feat: simplify aspects of attribute organization and combining with metrics
* feat: beginning to convert attribute wrangling into a class
* feat: add algorithm configuration file
* feat: established class for attribute configuration file, scripts functioning
* feat: add verbose option
* fix: update to warnings.warn()
* feat: building out additional unit tests for AttrConfigAndVars class
* chore: remove spaces
* feat: add unit test for fs_read_attr_comid
* feat: add UserWarnings and associated unit test
* feat: add unit tests for _find_feat_srce_id, fs_retr_nhdp_comids and fix associated functions when behavior didn't follow expected behavior
* feat: add unit test for fs_save_algo_dir_struct
* feat: a basic unit test for _open_response_data_fsds
* chore: simplify algo script based on functionality moved into fs_algo_train_eval module
* doc: add sphinx documentation to _read_attr_config and fs_read_attr_comid
* doc: add sphinx-formatted documentation to the functions in the fs_algo_train_eval module; feat: move some hard-coded variables into the algorithm config file
* fix: changes vars to attrs in AlgoTrainEval arg
* fix: added the new parameters that were hard-coded (test_size & seed)
* fix: swapped the train/test fractions to appropriate printout order
* feat: make sphinx documentation
* fix: reinstall sphinx docs for fsds_proc
* fix: remove unused path_camels
* fix: remove unused references to path_camels
* fix: update standard fsds_proc config files to create netcdf rather than csv; rename these files schema to config
* doc: update config file documentation on preferred save_type
* doc: update description of yaml file's dataset
* fix: update config files with featureID and featureSource entries
* fix: change vars to attrs based on package's object name change
* fix: change logic to ensure config file read if dataset attribute read failed
* feat: add a raw data input checker/corrector for cases when nwissite gage ids are missing the leading 0
* fix: changed path_data to represent the raw input files containing corrected nwissite USGS gage ids (leading zeros)
* fix: added appropriate fillna for nwissite gage ids not needed to be corrected
* fix: adjust path check for attributes instead of algo
* doc: add descriptive notes on algo pre-processing and suggest future improvements for datasets not processed with fsds_proc with TODO
* doc: simplify attr_config, change dir_attrs to dir_db_attrs
* chore: add some additional hydroatlas and USGS NHD variables for consideration
* chore: add updated attribute variables to config files, based on top 5 variables considered by Bolotin et al 2022 SI work
* fix: add error handling when hydrofabric could not be downloaded for a given comid
* fix: avoid index error generated from attr_ddf_sub.shape[0].compute() by simply performing attr_ddf_sub.compute() first, which is needed anyway
* fix: change fs_read_attr_comid to return pd.DataFrame instead of dask df, and add checks ensuring 'value' data column being float type, check for no NA values present
* feat: add NA drop prior to train/test split
* feat: create a separate function that standardizes the algorithm file save path
* doc: add documentation to the std_algo_path func
* feat: create script to generate algo prediction data for testing
* feat: generating predictions from trained algos under dev
* feat: add processing of xssa locations, randomly selecting a subset to use for algo prediction
* feat: develop algo prediction's config ingest, and determine paths to prediction locations and trained algos
* feat: add config file path builder
* feat: create metric prediction and write results to file
* feat: build unit test for build_cfig_path()
* feat: build unit test for build_cfig_path()
* feat: add unit tests for std_pred_path and _read_pred_comid; test coverage now at 92%
* feat: add oob = True as default for RandomForestRegressor
* feat: add hyperparameterization capability using grid search and associated unit tests
* feat: add unit testing for train_eval()
* chore: change algo config for testing out hyperparameterization
* chore: add UserWarning category specification to warnings.warn
* fix: algo config assignment accidentally only looked at first line of params
* fix: make sure that hyperparameter key:value pairings contained inside dict, not list
* fix: adjust unit test's algo_config formats to represent the issue of a dict of a list, which the list_to_dict() function then converts
* fix: _check_attributes_exist now appropriately reports missing attributes and comids
* fix: ensure algo and pipeline keys contain algo and pipeline object types in the grid search case
* Update pkg/fs_algo/fs_algo/fs_algo_train_eval.py Co-authored-by: LaurenBolotin-NOAA <[email protected]>
* Update pkg/fs_algo/fs_algo/fs_algo_train_eval.py Co-authored-by: LaurenBolotin-NOAA <[email protected]>
* chore: Update README.md Rename proc_fsds to fsds_proc
* chore: rename fsds to fs in all python-related files and config files
* chore: rename fsds_proc directory to fs_proc
* chore: rename additional fsds to fs
* chore: rename remaining fsds to fs
* doc: minor change to install instructions of fs_proc
* feat: add requirements for fs_algo package
* feat: add requirements.yml for conda environment of fs_algo/fs_proc python packages
* doc: add details on func for creating col_schema_df
* feat: add nwissite gage id leading zero checker as automated step
* fix: new line continuation in f-string messages related to nwis checker
* fix: update local config path and example in script
* doc: change install description for this package
* fix: modify logical test on elif featureSource == nwissite
* feat: update and add new unit testing that accommodates the check_fix_nwissite_gageids function
* fix: update temp directory assignment to work with non-Unix systems
* doc: minor adjustment for instructional example on running unit tests
* Make the change match the exact repo name
* Make changes match exact repo name
* fix: rename fsds to fs in files corresponding to proc.attr.hydfab R package
* feat: update R package with name change of fsds to fs
* chore: update fsds to fs in config files and R unit tests
* doc: update README from fsds to fs in non-url instances
* doc: Update README.md Update hyperlinks and descriptions with latest fsds to fs change, and OWP repo location.
* Update README.md doc: minor path fix
* chore: rename fsds_attrs_grab.R to fs_attrs_grab.R and add updated Rd documentation using fs instead of fsds
* Fix typo in README.md
* Fix pkg name typo in xssa_attr_config_all_vars_avail.yaml
* Fix pkg name typo xssa_attr_config_missing_vars.yaml
* Change default format to netcdf -- zarr doesn't work
---------
Co-authored-by: LaurenBolotin-NOAA <[email protected]>
Renamed all instances of fsds to fs (formulation-selector) in the Python & R code base.
Also integrated the NWIS gage ID checker/corrector into the standard fs_proc processing.
Additions
Integrated the check_nwis logic into proc_col_schema()
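For reviewers unfamiliar with the leading-zero issue, here is a minimal sketch of the kind of correction the gage ID checker performs. It is not the code in fs_proc: the function names, the `feature_source` argument, and the assumed 8-character minimum ID length are all hypothetical, chosen only to illustrate the idea that numeric nwissite IDs which lost their leading 0 (for example after being read as integers) get padded back, while other IDs pass through untouched.

```python
def pad_gage_id(gage_id, min_len=8):
    """Hypothetical helper: left-pad an all-digit gage ID with zeros.

    Assumes (not confirmed by this PR) that valid nwissite IDs have at
    least `min_len` digits. Non-numeric IDs are returned unchanged.
    """
    gage_id = str(gage_id).strip()
    if gage_id.isdigit() and len(gage_id) < min_len:
        return gage_id.zfill(min_len)
    return gage_id


def fix_gage_ids(gage_ids, feature_source="nwissite"):
    """Hypothetical wrapper: only correct IDs when the source is nwissite."""
    if feature_source != "nwissite":
        # Other feature sources (e.g. comid) are left as-is.
        return [str(g) for g in gage_ids]
    return [pad_gage_id(g) for g in gage_ids]


if __name__ == "__main__":
    # An ID read as an integer loses its leading zero; padding restores it.
    print(fix_gage_ids([1013500, "01022500"]))
```

Gating on the feature source keeps the correction from touching identifiers, such as comids, that are legitimately shorter.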
Removals
Changes
Testing
check_fix_nwissite_gageids