Wrong metadata sharing during fit with OptunaSearchCV #194

Open
Kreol64 opened this issue Jan 21, 2025 · 0 comments
Labels
bug Something isn't working

Kreol64 commented Jan 21, 2025

Expected behavior

  1. When running OptunaSearchCV with enable_metadata_routing=True, set_score_request() is expected to route metadata to the scoring function only, and set_fit_request() is expected to route metadata to the fit function only (see the sketch after this list)
  2. If error_score='raise', any failed trial is expected to raise an error
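
For reference, here is a minimal sketch of the routing behaviour expected in point 1, using plain scikit-learn cross_validate instead of OptunaSearchCV; the metric body and the score_param value are illustrative only. A parameter requested via set_score_request() should reach the scorer but never the estimator's fit().

import numpy as np
import sklearn
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_validate

sklearn.set_config(enable_metadata_routing=True)


def metric(y_true, y_pred, score_param=None, **kwargs):
    # score_param reaches the scorer because it was requested for scoring only
    assert score_param == 'a'
    return 0.0


scorer = make_scorer(metric).set_score_request(score_param=True)

X, y = np.random.randn(20, 2), np.random.randn(20)
# score_param is routed to the scorer only; Ridge.fit() never sees it
cross_validate(Ridge(), X, y, scoring=scorer, cv=2, params={'score_param': 'a'})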

Environment

  • Optuna version: 4.1.0
  • Optuna Integration version: 4.2.0
  • Python version: 3.10.8
  • OS: Windows-10-10.0.19045-SP0
  • (Optional) Other libraries and their versions:
    sklearn 1.4.1.post1

Error messages, stack traces, or logs

1)  UserWarning: Failed to report cross validation scores for TerminatorCallback, with error: The length of `scores` is expected to be greater than one.
  warnings.warn(warn_msg)

2) TypeError: Ridge.fit() got an unexpected keyword argument 'score_param'

Steps to reproduce

  1. With refit=False and n_jobs=1, just observe a warning that (I believe) should not be there
  2. With refit=True or n_jobs>1, observe score-only parameters being passed incorrectly into the fit function
  3. If the metric returns NaN, an error should be raised when error_score='raise'. Currently the trial fails and the error message is empty, which makes it very hard to debug.

Reproduction script:
import numpy as np
import optuna
import sklearn
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from optuna.integration import OptunaSearchCV
from sklearn.model_selection import GridSearchCV

sklearn.set_config(enable_metadata_routing=True)


def custom_metric(y: np.ndarray, y_pred: np.ndarray, sample_weight: np.ndarray, score_param: str, *args, **kwargs) -> float:
    assert score_param == 'a'
    assert np.all(sample_weight == np.ones(4))
    return 0.01


x = np.random.randn(10, 2)
y = np.random.randn(10)
y[-1] = np.nan
w = np.ones(len(y))
score_param = 'a'

sklearn_grid_params = dict(
    param_grid={'alpha': [0.0, 0.1]},
    scoring=make_scorer(
        custom_metric,
        greater_is_better=True,
    ).set_score_request(
        sample_weight=True,
        score_param=True
    ),
    cv=[(np.arange(10)[:6], np.arange(10)[6:])],
    return_train_score=False,
    refit=False
)

optuna_grid_params = dict(
    scoring=make_scorer(
        custom_metric,
        greater_is_better=True,
    ).set_score_request(
        sample_weight=True,
        score_param=True
    ),
    n_jobs=1,
    error_score='raise',
    n_trials=10,
    cv=[(np.arange(10)[:6], np.arange(10)[6:])],
    param_distributions={'alpha': optuna.distributions.CategoricalDistribution([0.0, 0.1])},
    random_state=77,
    return_train_score=False,
    refit=False
)

sklearn_grid = GridSearchCV(
    estimator=Ridge().set_fit_request(sample_weight=True),
    **sklearn_grid_params
)

optuna_grid = OptunaSearchCV(
    estimator=Ridge().set_fit_request(sample_weight=True),
    **optuna_grid_params
)

# runs without issues as expected. score_param is only fed into scoring function
sklearn_grid.fit(X=x, y=y, sample_weight=w, score_param=score_param)

# Observe warning "UserWarning: Failed to report cross validation scores for TerminatorCallback, with error: The length of `scores` is expected to be greater than one."
optuna_grid.fit(X=x, y=y, sample_weight=w, score_param=score_param)

# Observe error when refit=True: TypeError: Ridge.fit() got an unexpected keyword argument 'score_param'
optuna_grid_params['refit'] = True
optuna_grid = OptunaSearchCV(
    estimator=Ridge().set_fit_request(sample_weight=True),
    **optuna_grid_params
)
optuna_grid.fit(X=x, y=y, sample_weight=w, score_param=score_param)

# Observe same error when refit is set back to false and number of n_jobs > 1
optuna_grid_params['refit'] = False
optuna_grid_params['n_jobs'] = 2
optuna_grid = OptunaSearchCV(
    estimator=Ridge().set_fit_request(sample_weight=True),
    **optuna_grid_params
)
optuna_grid.fit(X=x, y=y, sample_weight=w, score_param=score_param)


# Fail to observe error when custom metric returns NaN, although error_score='raise'

def custom_metric_with_nan(y: np.ndarray, y_pred: np.ndarray, sample_weight: np.ndarray, score_param: str, *args, **kwargs) -> float:
    return np.nan


optuna_grid_params['scoring'] = make_scorer(
    custom_metric_with_nan,
    greater_is_better=True,
).set_score_request(
    sample_weight=True,
    score_param=True
)
optuna_grid_params['n_jobs'] = 1
optuna_grid = OptunaSearchCV(
    estimator=Ridge().set_fit_request(sample_weight=True),
    **optuna_grid_params
)
optuna_grid.fit(X=x, y=y, sample_weight=w, score_param=score_param)
assert optuna_grid_params['error_score'] == 'raise'

Additional context (optional)

No response

Kreol64 added the bug label Jan 21, 2025