Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permutation_test (1/4): add comments and docstring of the functions #111

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

lionelkusch
Copy link
Collaborator

  • I removed the function permutation_test_cv. This function was a permutation_test with cross-validation for finding the best parameter C for the linearSVR. In this context, it can be used as a global surrogate based on Linear SVR but I estimated it was too specific and that the user can do it by hand for the moment.

  • I move the function step_down_max_t to stat_tool because it's a function used for computing the pvalue.
    I modified the function permutation_test in consequence for returning the weight and the distribution of the weights from the permutation of y. I think that this intermediate step is more adapted to a general API.

  • I modified the test in consequence with this new signature of functions.

@lionelkusch
Copy link
Collaborator Author

My comments for step_down_max_t need to be verified because there are based on AI.
I don't perfectly the algorithm but the actual code seems very different to the original code.
Can someone check it?

Copy link

codecov bot commented Jan 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (19e6cf7) to head (47b7d84).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##             main      #111       +/-   ##
============================================
+ Coverage   81.70%   100.00%   +18.29%     
============================================
  Files          43        21       -22     
  Lines        2312       765     -1547     
============================================
- Hits         1889       765     -1124     
+ Misses        423         0      -423     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@bthirion bthirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks mostly good. I have 2 comments:

  • I don't see why we no longer allow to run CV
  • the Westfall-Young "maxT" procedure seems overly complicated

src/hidimstat/stat_tools.py Show resolved Hide resolved
src/hidimstat/permutation_test.py Show resolved Hide resolved
Copy link
Collaborator

@jpaillard jpaillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To re-open the discussion on the API: when I re-implemented the PermutationImportance class (which differs since X columns are permuted instead of y here), I found it convenient to have a class with .fit and .score, rather than a function like here.

Would having consistency in having both as functions or classes be clearer? If so, I am open to returning to functions if you find it better.
Here again, a difference is that this function does not require train / test split which might be the criterion for using classes.

src/hidimstat/permutation_test.py Show resolved Hide resolved
@@ -152,18 +153,23 @@ def preprocess_haxby(subject=2, memory=None):
SVR_permutation_test_inference = False
if SVR_permutation_test_inference:
# We computed the regularization parameter by CV (C = 0.1)
pval_corr_svr_perm_test, one_minus_pval_corr_svr_perm_test = permutation_test_cv(
X, y, n_permutations=50, C=0.1
estimator = LinearSVR(C=0.1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C is the regularization parameter, it is not optimized via CV here. Or am I missing something?
I suggest using something like randomized search.

Suggested change
estimator = LinearSVR(C=0.1)
estimator = RandomizedSearchCV(
LinearSVR(random_state=42),
param_distributions={ "C": np.logspace(-3, 3, 10), },
n_iter=10,
n_jobs=5,
random_state=42,
)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't provide an optimisation by CV because, in the original example, it wasn't the case.
Nevertheless, we need to be careful with optimisation because it will increase the time for running examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an estimate of the compute time ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my computer, the example without the CV is: 5m39s and with CV is: 7m16s.
In estimation, it increases 2 minutes the time of calculation.

One solution can be to store the best value of the parameter and avoid doing the refitting each time. .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there's L153 SVR_permutation_test_inference = False so is this even running in the CI?
If not, I suggest leaving the CV to show this possibility to the user without increasing CI time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put SVR_permutation_test_inference = True for my test.

Including, SVR_permutation_test_inference options in my computation time, I get:
SVR_permutation_test_inference = True and CV: 7m16s.
SVR_permutation_test_inference = True and No CV: 5m39s
SVR_permutation_test_inference = False and CV: 1m57s.
SVR_permutation_test_inference = False and No CV: 1m48s.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that this is toio much time for a method that does not enjoy any theoretical guarantee.
Out of curiosity, what do you get if you replace the SVR with a RidgeRegression ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being, we should not change this if there is no explicit reason for that (e.g. significant reduction of documentation generation time).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that this is toio much time for a method that does not enjoy any theoretical guarantee. Out of curiosity, what do you get if you replace the SVR with a RidgeRegression ?

This is already done in the example.

examples/plot_fmri_data_example.py Show resolved Hide resolved
@lionelkusch
Copy link
Collaborator Author

@jpaillard

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants