Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Isolation Forest grid search test and API [nocheck] #5660

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

valenad1
Copy link
Collaborator

@valenad1 valenad1 commented Aug 31, 2021

I have noticed some inconsistency in IF grid search test for R.

The R API treat IF as supervised algorithm and flag is_supervised isn't working properly. This PR provide fix for is_supervised flag and change that IF will be unsupervised by default.

I have created second PR in case that we cannot suddenly change IF to unsupervised for grid search. This PR (#5661) only fix is_supervised flag and add test that IF can be used with is_supervised = FALSE flag.

@valenad1 valenad1 force-pushed the valenad-fix-isolation-forest-grid branch from 304f579 to 12dd641 Compare August 31, 2021 17:05
Copy link
Contributor

@michalkurka michalkurka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not introduce a flag is_supervised. Note: API changes like this one should be discussed in #h2o-3-api-updates.

@valenad1
Copy link
Collaborator Author

Please do not introduce a flag is_supervised. Note: API changes like this one should be discussed in #h2o-3-api-updates.

I am not introducing new one, flag is_supervised is already there, I am just using it properly

@michalkurka
Copy link
Contributor

@valenad1 I am sorry - my bad - I misunderstood the PR.

I am surprised to see the option there - I don't think we should have it

Would there be a solution that doesn't utilize this flag and works out of the box?

@valenad1
Copy link
Collaborator Author

Would there be a solution that doesn't utilize this flag and works out of the box?

I'll figure something out :)

@@ -21,7 +21,9 @@ test.grid.resume <- function() {
x = 1:4,
y = 5,
training_frame = iris.hex,
hyper_params = hyper_parameters
is_supervised = TRUE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set the flag here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made IF unsupervised by default. This test is using the y parameter and then it should be supervised.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very surprised this works - I thought IF can only have a labeled validation frame, not the training frame. Hmm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check the output of the grid search, since I have only used API I didn't think about that. Good point 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to say that I didn't change the test, I just focused that the same code will be used with this flag. I will check what is the actual output of the grid search :)

Copy link
Contributor

@michalkurka michalkurka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think given that the option is_supervised already exists and enables the functionality to work - we should get this fix in.

@valenad1 this looks like this was a code change (in models.R) and not just a test change - in this case we should have a jira

@valenad1
Copy link
Collaborator Author

@valenad1 this looks like this was a code change (in models.R) and not just a test change - in this case we should have a jira

Agree, I will make a JIRA

So I merge this fix and after implement the change that will drop is_supervised parameter. Is that correct? @michalkurka

@michalkurka
Copy link
Contributor

So I merge this fix and after implement the change that will drop is_supervised parameter. Is that correct? @michalkurka

If I read this PR correctly, the option didn't work before, or at least not in recent years. If so - it was unused - and thus could be gracefully deprecated (= eg. warning in 3.34 and then removal in 3.36)

@valenad1
Copy link
Collaborator Author

If I read this PR correctly, the option didn't work before, or at least not in recent years.

I agree.

If so - it was unused - and thus could be gracefully deprecated (= eg. warning in 3.34 and then removal in 3.36)

Ok, I'll try to use a different way (somehow it works for python without this parameter) and make this parameter depracated.

In my opinion grid search for Isolation Forest should be unsupervised by default and supervised on option.
@valenad1 valenad1 force-pushed the valenad-fix-isolation-forest-grid branch from 12dd641 to f94b440 Compare September 1, 2021 13:39
Copy link
Contributor

@michalkurka michalkurka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not expose is_supervised flag

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants