
Clearml logging callback triggering even if clearml not being used #649

Open
rminsil opened this issue Feb 7, 2025 · 0 comments
Labels
bug Something isn't working

rminsil commented Feb 7, 2025

Bug Summary

The transformers library we use automatically detects whether clearml is installed and, if so, registers a ClearMLCallback object in its event handling system that sends logs to ClearML.

This happens even if you're training models locally and not using clearml in any way.

There are two potential issues:

  • the team may not be aware that this kind of logging is happening, or of its implications
  • people who want to train models locally can't do so if their machine has clearml installed but not configured

Details

The Hugging Face transformers callback docs describe how the training loop supports registering callbacks that execute at different points during training.

The important part is:

[Screenshot of the docs listing the integration callbacks that are enabled automatically when their library is installed]

It auto-magically detects if common frameworks are installed and then registers callbacks for them.
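
For reference, a callback is just a class that hooks into the Trainer's events. A minimal sketch using the public TrainerCallback API:

from transformers import TrainerCallback

class PrintingCallback(TrainerCallback):
    # The callback handler invokes this at the start of training -- the same
    # hook where ClearMLCallback.on_train_begin fails in the traceback below.
    def on_train_begin(self, args, state, control, **kwargs):
        print("Training is starting")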

In the case of clearml, I suspect it detects the installation by looking for a clearml package on the Python path. We do have this library installed via Poetry:

[tool.poetry.dependencies]
...
clearml = ">=1.4.1"

If my theory is right, then this would impact anyone using poetry.
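
If so, a minimal sketch of the detection logic (my assumption: transformers only checks that the package is importable, not that it is configured):

import importlib.util

def is_clearml_available() -> bool:
    # True whenever a clearml package is on the Python path, e.g. because
    # poetry installed it as a dependency -- configuration is never checked.
    return importlib.util.find_spec("clearml") is not None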

Example error

I noticed this when trying to run the model training script locally. I haven't yet set up clearml, but I have installed the project's dependencies with poetry (which pulls in the clearml package).

$ export SIL_NLP_DATA_PATH=~/sil/tasks/2025-01-23-local-training/NLP
$ poetry run python -m silnlp.nmt.train  Philippines/ABP/2025-01-30-Exp02-isolate-luke-for-testing

...
Traceback (most recent call last):
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/me/sil/dev/silnlp/silnlp/nmt/train.py", line 42, in <module>
    main()
  File "/home/me/sil/dev/silnlp/silnlp/nmt/train.py", line 34, in main
    model.train()
  File "/home/me/sil/dev/silnlp/silnlp/nmt/hugging_face_config.py", line 1026, in train
    train_result = trainer.train(resume_from_checkpoint=last_checkpoint)
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/transformers/trainer.py", line 2123, in train
    return inner_training_loop(
  File "/home/me/sil/dev/silnlp/silnlp/nmt/hugging_face_config.py", line 1904, in _inner_training_loop
    return inner_training_loop(
  File "/home/me/sil/dev/silnlp/silnlp/nmt/hugging_face_config.py", line 1984, in decorator
    return function(batch_size, *args, **kwargs)
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/transformers/trainer.py", line 2382, in _inner_training_loop
    self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/transformers/trainer_callback.py", line 468, in on_train_begin
    return self.call_event("on_train_begin", args, state, control)
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/transformers/trainer_callback.py", line 518, in call_event
    result = getattr(callback, event)(
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 1869, in on_train_begin
    self.setup(args, state, model, tokenizer, **kwargs)
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 1792, in setup
    self._clearml_task = self._clearml.Task.init(
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/task.py", line 596, in init
    task = cls._create_dev_task(
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/task.py", line 3956, in _create_dev_task
    task = cls(
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/task.py", line 211, in __init__
    super(Task, self).__init__(**kwargs)
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 167, in __init__
    super(Task, self).__init__(id=task_id, session=session, log=log)
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/backend_interface/base.py", line 149, in __init__
    super(IdObjectBase, self).__init__(session, log, **kwargs)
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/backend_interface/base.py", line 41, in __init__
    self._session = session or self._get_default_session()
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/backend_interface/base.py", line 119, in _get_default_session
    InterfaceBase._default_session = Session(
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/backend_api/session/session.py", line 161, in __init__
    self._connect()
  File "/home/me/.miniconda3/envs/silnlp/lib/python3.10/site-packages/clearml/backend_api/session/session.py", line 224, in _connect
    raise MissingConfigError()
clearml.backend_api.session.defs.MissingConfigError: It seems ClearML is not configured on this machine!
To get started with ClearML, setup your own 'clearml-server' or create a free account at https://app.clear.ml
Setup instructions can be found here: https://clear.ml/docs

My hack

I got around this by overriding the "report_to" setting:

def _create_training_arguments(self) -> Seq2SeqTrainingArguments:
    parser = HfArgumentParser(Seq2SeqTrainingArguments)
    args: dict = {}
    ...
    # Temp hack to stop the trainer logging to clearml when running locally
    args["report_to"] = []  # <-- disables all integration callbacks
    return parser.parse_dict(args)[0]
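
For what it's worth, I believe report_to also accepts the string "none", which should disable the integrations the same way:

# Equivalent alternative (as I understand the TrainingArguments docs):
args["report_to"] = "none"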

Ideal behavior

From my brief chat with David, he suggested that the ideal behavior is for this logging to be controlled by the program's inputs rather than by whatever happens to be installed on the system. That makes it more deterministic and portable.
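
A rough sketch of what that could look like (the log_to_clearml config key and self._config attribute are hypothetical, just to illustrate):

def _create_training_arguments(self) -> Seq2SeqTrainingArguments:
    parser = HfArgumentParser(Seq2SeqTrainingArguments)
    args: dict = {}
    ...
    # Only report to clearml when the experiment config explicitly asks for
    # it, regardless of which packages happen to be installed.
    args["report_to"] = ["clearml"] if self._config.get("log_to_clearml") else []
    return parser.parse_dict(args)[0]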

@rminsil rminsil added the bug Something isn't working label Feb 7, 2025
@ddaspit ddaspit moved this from 🆕 New to 📋 Backlog in SIL-NLP Research Feb 7, 2025