ErrorWithStatus may not be setting the unit status correctly #34

Open
DnPlas opened this issue Mar 6, 2023 · 1 comment

DnPlas commented Mar 6, 2023

When deploying the kubeflow bundle, at some point I got the following error in the juju debug-log:

unit-training-operator-0: 17:29:52 ERROR unit.training-operator/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 207, in <module>
    main(TrainingOperatorCharm)
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/main.py", line 436, in main
    framework.reemit()
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/framework.py", line 866, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-training-operator-0/charm/venv/ops/framework.py", line 931, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 189, in _on_install
    self._check_container_connection()
  File "./src/charm.py", line 135, in _check_container_connection
    raise ErrorWithStatus("Pod startup is not complete", MaintenanceStatus)
charmed_kubeflow_chisme.exceptions._with_status.ErrorWithStatus: Pod startup is not complete

The error message suggests the unit should be in MaintenanceStatus, but it was in ErrorStatus instead. Although this does not prevent the unit from eventually going to active and idle, it is not the behaviour we expect.

Steps to reproduce

  1. Deploy this bundle
  2. Watch the logs for training-operator
  3. Watch the status of training-operator
  4. For a brief moment, the unit is in error status rather than maintenance

ca-scribner commented

I don't know if this is a chisme bug or something to do with how training-operator imported things?

The message comes from here. The charmed_kubeflow_chisme... part in the error is just the type of the exception being raised. But then I'd have expected the exception to be caught here. I don't recall seeing this happen in other charms, but it would be interesting to dig into. Maybe there's something wrong with how chisme lets people import these exceptions?
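
To make the expectation concrete, here is a minimal sketch of the raise-and-catch pattern being discussed. It is not the training-operator code: `MaintenanceStatus`, `ErrorWithStatus`, and `FakeUnit` below are simplified stand-ins written for illustration (so the snippet runs without `ops` or `charmed_kubeflow_chisme` installed), and the handler names are hypothetical. The point is only that the status carried by the exception reaches the unit if the handler catches it; if it propagates out of the handler uncaught, as in the traceback above, the hook fails and the unit would presumably show error instead.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceStatus:
    """Simplified stand-in for a maintenance status object (assumption)."""
    message: str
    name: str = "maintenance"


class ErrorWithStatus(Exception):
    """Simplified stand-in: an exception carrying a message and a status type."""

    def __init__(self, msg, status_type):
        super().__init__(msg)
        self.msg = msg
        self.status_type = status_type

    @property
    def status(self):
        # Build the status object the handler is expected to set on the unit.
        return self.status_type(self.msg)


class FakeUnit:
    """Hypothetical unit object with a settable status."""
    status = None


def check_container_connection(can_connect):
    # Mirrors the check in the traceback: raise if the container is not ready.
    if not can_connect:
        raise ErrorWithStatus("Pod startup is not complete", MaintenanceStatus)


def on_install(unit, can_connect):
    # Catching the exception is what turns it into a unit status.
    try:
        check_container_connection(can_connect)
    except ErrorWithStatus as err:
        unit.status = err.status
        return
    # ... the rest of the install logic would go here ...


unit = FakeUnit()
on_install(unit, can_connect=False)
print(unit.status.name, "-", unit.status.message)
```

If the `try`/`except` in `on_install` is removed, the same call raises `ErrorWithStatus` out of the handler, which matches the "Uncaught exception while in charm code" line in the log.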
