
Ss test mlpci #39

Closed
wants to merge 15 commits into from

Conversation

@ssorou1 (Collaborator) commented Jan 29, 2025

The bagging method for calculating confidence intervals (CIs) is implemented in fs_algo_train_eval for the rf and mlp models.

Additions

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is usable without CSS
  • Is usable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

@ssorou1 ssorou1 requested a review from glitt13 January 29, 2025 22:06
@glitt13 glitt13 changed the base branch from main to dev February 3, 2025 23:32
@glitt13 (Collaborator) left a comment

Nice work Soroush! I liked how you were very thorough with changing fs_pred_algo.py after making changes to fs_algo_train_eval.py that would impact the downstream processing. Please review the suggestions, and once you've addressed those I'll give it another review. If I don't have immediate feedback from reading the code, I'll try running it.

ci = self.calculate_rf_uncertainty(rf, self.X_train, self.X_test)

# Calculating mlp uncertainty using Bootstrap Aggregating (Bagging)
n_models_rf = 10 # Number of bootstrap models
glitt13 (Collaborator):

It's best to make this a parameter that a user can assign rather than hard-code it into a function. We should add a parameter in the algo config yaml, with a default value set as 10 when reading in that yaml file using the AttrConfigAndVars() class. For example,
self.algo_config['rf'].get('n_models_rf_bootstrap',10)
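The suggested pattern can be sketched as follows; `algo_config` here is a hypothetical dict as if parsed from the algo config yaml, and the key names follow the comment above rather than the project's actual schema:

```python
# Hypothetical config dict, as if loaded from the algo config yaml
# (key names are assumed from the suggestion above)
algo_config = {"rf": {"max_depth": 5}}  # user did not set n_models_rf_bootstrap

# dict.get() supplies the default of 10 whenever the key is absent
n_models_rf = algo_config["rf"].get("n_models_rf_bootstrap", 10)

# When the user does set the key in the yaml, their value wins
algo_config_with_key = {"rf": {"n_models_rf_bootstrap": 25}}
n_models_rf_user = algo_config_with_key["rf"].get("n_models_rf_bootstrap", 10)
```

This keeps the hard-coded 10 as a fallback only, so a user can tune the number of bootstrap models without touching the function body.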

@ssorou1 (Collaborator, author) commented Feb 7, 2025:

Thank you very much Guy for the comments and advice! All the updates are applied to the ssorou1: ss_test_mapie branch:
  • xssa_algo_config.yaml is updated to include n_models_rf_bootstrap and n_models_mlp_bootstrap for the rf and mlp models, respectively.
  • fs_algo_train_eval.py is updated to read the aforementioned variables from the algo config yaml. Please refer to the rf_Bagging_ci and mlp_Bagging_ci functions.

rf.fit(X_train_resampled, y_train_resampled)

# Store predictions for the test set
rf_predictions.append(rf.predict(self.X_test))
glitt13 (Collaborator):

If you were to run train_algos() in a loop, does the rf_predictions object continue to grow, or does it reset within each loop? We'd want the latter behavior.

@ssorou1 (Collaborator, author) commented Feb 7, 2025:

The rf_predictions object in the newly added rf_Bagging_ci() is a local variable within the function, meaning it gets reinitialized as an empty list (rf_predictions = []) each time the function is called. Since train_algos() calls rf_Bagging_ci(), rf_predictions is reset in each loop iteration if train_algos() is run in a loop.
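The reset behavior described above can be illustrated with a minimal sketch; the function body below is a stand-in for rf_Bagging_ci(), not the actual project code:

```python
def rf_bagging_ci_sketch():
    # rf_predictions is local, so it is re-created as an empty list on
    # every call; repeated calls from a loop never accumulate predictions
    rf_predictions = []
    rf_predictions.append("pred")
    return len(rf_predictions)

# Calling the function three times, as train_algos() would in a loop
counts = [rf_bagging_ci_sketch() for _ in range(3)]
```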

rf_predictions = np.array(rf_predictions)
mean_pred = rf_predictions.mean(axis=0)
std_pred = rf_predictions.std(axis=0)
lower_bound = mean_pred - 1.96 * std_pred
glitt13 (Collaborator):

We should clarify that this is a 95% confidence interval (a good default!), and perhaps add in options for different confidence intervals (e.g. 90% and 99%).
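One way to support multiple intervals is to derive the z multiplier from the requested confidence level instead of hard-coding 1.96. This sketch uses the standard library's NormalDist and assumes the same normal approximation as the code above; the function name is illustrative:

```python
from statistics import NormalDist

def z_from_confidence(confidence_level: float = 0.95) -> float:
    """Two-sided z multiplier under the normal assumption (0.95 -> ~1.96)."""
    return NormalDist().inv_cdf((1.0 + confidence_level) / 2.0)

z90 = z_from_confidence(0.90)  # ~1.645
z95 = z_from_confidence(0.95)  # ~1.960
z99 = z_from_confidence(0.99)  # ~2.576
```

The bounds would then be `mean_pred ± z * std_pred`, with `z` computed from the user's configured confidence level.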

ssorou1 (Collaborator, author):

A great point, Guy! The 90%, 95%, and 99% CIs are applied in both the rf_Bagging_ci() and mlp_Bagging_ci() functions.

glitt13 (Collaborator):

Thanks Soroush! The specific intervals of interest could vary by user needs. Let's set a default of 95%, but can you also modify the config file so that a user may specify their desired confidence interval or intervals? If not already a feature, we'll have to be careful about how we save file output so that one can know which interval they're looking at.

@ssorou1 (Collaborator, author) commented Feb 7, 2025:

Thanks Guy! The yaml file is updated to include the confidence_level for each model. rf_Bagging_ci() and mlp_Bagging_ci() are updated to read the respective confidence levels from the yaml file and calculate the confidence interval based on the normal distribution assumption. The default is 95% as per your advice.

https://github.com/ssorou1/formulation-selector/blob/3bc23eacc6d54b692147f8cfded578fd8657c17e/pkg/fs_algo/fs_algo/fs_algo_train_eval.py#L1047-L1052
https://github.com/ssorou1/formulation-selector/blob/3bc23eacc6d54b692147f8cfded578fd8657c17e/scripts/eval_ingest/xssa/xssa_algo_config.yaml#L3-L6

std_pred = predictions.std(axis=0)
lower_bound = mean_pred - 1.96 * std_pred
upper_bound = mean_pred + 1.96 * std_pred

glitt13 (Collaborator):

The fact that these steps are exactly the same as the random forest makes me think we should instead put this all into a new function, and you pass in the model object as a generic Regressor object (e.g. MLPRegressor, RandomForestRegressor) as an argument in the new function. That means we could re-use all the same processing steps.
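The refactor suggested above could look roughly like the sketch below. The function name, signature, and the trivial stand-in regressor are illustrative assumptions; any estimator exposing fit()/predict() (e.g. MLPRegressor, RandomForestRegressor) could be passed in:

```python
import copy

import numpy as np

def bagging_ci(model, X_train, y_train, X_test, n_models=10, z=1.96, seed=0):
    """Bootstrap-aggregating CI bounds for any regressor with fit()/predict()."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        # Resample the training rows with replacement
        idx = rng.integers(0, len(X_train), size=len(X_train))
        m = copy.deepcopy(model)  # fresh, unfitted copy per bootstrap round
        m.fit(X_train[idx], y_train[idx])
        preds.append(m.predict(X_test))
    preds = np.asarray(preds)
    mean_pred = preds.mean(axis=0)
    half_width = z * preds.std(axis=0)
    return mean_pred - half_width, mean_pred + half_width

# Demo with a trivial stand-in regressor (only fit/predict are required)
class _MeanModel:
    def fit(self, X, y):
        self.mu_ = float(np.mean(y))
    def predict(self, X):
        return np.full(len(X), self.mu_)

X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.arange(20, dtype=float)
lower, upper = bagging_ci(_MeanModel(), X, y, X[:5], n_models=25)
```

With scikit-learn estimators, `sklearn.base.clone` would be the more idiomatic way to obtain a fresh unfitted copy than `copy.deepcopy`.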

@ssorou1 (Collaborator, author) commented Feb 7, 2025:

rf_Bagging_ci() and mlp_Bagging_ci() are developed in "fs_algo_train_eval.py".

glitt13 (Collaborator):

Add pkg/proc.attr.hydfab/.RData to your own .gitignore file and this won't be a recurring problem.

ssorou1 (Collaborator, author):

Applied. Thanks!

# pipe = joblib.load(path_algo)
pipeline_with_ci = joblib.load(path_algo)
pipe = pipeline_with_ci['pipe'] # Assign the actual pipeline (pipe) to 'pipe'
rf_model = pipe.named_steps['randomforestregressor'] # Use the correct step name
glitt13 (Collaborator):

What happens when we haven't trained a randomforestregressor? This seems like it's too specific to a particular type of model. Ideally this should work with any model. It looks like since this relates to the forestci package, we should add in an 'if' statement specific to random forest while looping over all possible algorithm possibilities.
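The suggested 'if' statement could be implemented by inspecting the pipeline's steps before applying a forest-specific CI method. This is a hedged sketch under assumed step and helper names, not the project's actual code:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def find_rf_step(pipe):
    """Return the name of the RandomForestRegressor step, or None if absent."""
    for name, step in pipe.named_steps.items():
        if isinstance(step, RandomForestRegressor):
            return name
    return None

# Two hypothetical pipelines: one with a random forest, one without
rf_pipe = Pipeline([("scaler", StandardScaler()), ("rf", RandomForestRegressor())])
mlp_pipe = Pipeline([("scaler", StandardScaler()), ("mlp", MLPRegressor())])

rf_name = find_rf_step(rf_pipe)    # forestci-style CI applies only here
mlp_name = find_rf_step(mlp_pipe)  # None: fall back to a generic CI or skip
```

Looping over all loaded algorithms, the forestci branch would run only when `find_rf_step()` returns a name, with other model types taking a generic (e.g. bagging-based) path.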

ssorou1 (Collaborator, author):

That is a valid point. Thanks! Both fs_pred_algo.py and fs_algo_train_eval.py were updated to address this issue. Please refer to commit 5c9e14b.

@ssorou1 ssorou1 requested a review from glitt13 February 7, 2025 19:45
@glitt13 glitt13 closed this Feb 7, 2025
@glitt13 (Collaborator) commented Feb 7, 2025

Shifting over to PR #41
