[sub]feat: modify computetask failure report #727
backend/substrapp/migrations/0012_alter_algo_description_alter_algo_file_and_more.py
Force-pushed from `610074e` to `96569a1`.
Signed-off-by: Guilhem Barthes <[email protected]>
Force-pushed from `7e6b96d` to `9ab7ec3`.
`/e2e --benchmarks mnist --refs orchestrator=feat/modify-computetask-failure-report`
Major change proposal.
```python
if report.asset_type == asset_failure_report.FailedAssetKind.FAILED_ASSET_FUNCTION:
    asset_class = Function
else:
    asset_class = ComputeTask
```
Nitpick (especially given we don't have much time), but couldn't/shouldn't we have a `FailedAssetKind` -> model mapping to avoid this `else`, which considers anything that is not a function to be a CP?
Maybe worth opening an issue so we don't forget about this suggestion during the refactoring?
Agree with Sarah, as internal errors are still considered to be part of Compute tasks
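A minimal sketch of the mapping suggested above, assuming plain Python stand-ins for the `Function` and `ComputeTask` models. The `FailedAssetKind` members other than `FAILED_ASSET_FUNCTION`, and all enum values, are assumptions rather than the real protobuf enum; `ASSET_CLASS_BY_KIND` and `asset_class_for` are hypothetical names.

```python
from enum import Enum


class FailedAssetKind(Enum):
    # Hypothetical stand-in for the protobuf enum: only FAILED_ASSET_FUNCTION
    # appears in the reviewed code; the other members are assumptions.
    FAILED_ASSET_UNKNOWN = 0
    FAILED_ASSET_COMPUTE_TASK = 1
    FAILED_ASSET_FUNCTION = 2


class ComputeTask:  # placeholder for the Django model
    pass


class Function:  # placeholder for the Django model
    pass


# Explicit table: every supported kind is listed, nothing falls through.
ASSET_CLASS_BY_KIND = {
    FailedAssetKind.FAILED_ASSET_COMPUTE_TASK: ComputeTask,
    FailedAssetKind.FAILED_ASSET_FUNCTION: Function,
}


def asset_class_for(kind: FailedAssetKind) -> type:
    """Return the model class for a failure report's asset kind."""
    try:
        return ASSET_CLASS_BY_KIND[kind]
    except KeyError:
        raise ValueError(f"unsupported failed asset kind: {kind!r}") from None
```

With an explicit table, an unexpected kind raises instead of silently falling through to `ComputeTask`. If internal errors should keep being attributed to compute tasks, that choice becomes a visible entry in the dict rather than an implicit `else`.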
backend/substrapp/tasks/task.py
```python
close_old_connections()

# Celery does not provide unpacked arguments, we are doing it in `get_task_info`
```
I don't understand the comment about unpacking Celery arguments and its relation to `get_task_info`; isn't that what `split_args` is doing?
Maybe this comment is an unfortunate duplicate of the one in `backend/substrapp/tasks/tasks_save_image.py`?
💚 green CI 💚 🤩
## Companion PR

- Substra/substra-backend#727
- Substra/substra-frontend#240

## Description

Modify `FailureReport`:

- add field `asset_type` containing the kind of asset the failure report connects to
- rename `compute_task_key` to `asset_key`, which is a [wire compatible change](https://groups.google.com/g/protobuf/c/hX4Mj0P4N0w) (i.e. it does not need to be declared as a new field)

## How has this been tested?

As this will be merged into a branch that is itself going to be merged into a POC branch, we use MNIST as a baseline of a working model. We will deal with failing tests on the POC before merging on main.

The e2e tests are also broken due to an issue with producing dumps during release, but they passed locally.

## Checklist

- [x] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: Guilhem Barthes <[email protected]>
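Why the rename is wire compatible: the protobuf binary encoding stores only each field's number and wire type, never its name. The hand-rolled encoder below is a sketch of that encoding for a single string field; field number `1` is a hypothetical value, not taken from the actual `FailureReport` definition.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as tag, length, then UTF-8 payload."""
    wire_type = 2  # length-delimited: strings, bytes, nested messages
    tag = (field_number << 3) | wire_type
    payload = text.encode("utf-8")
    # Note that no field name is written anywhere in the output.
    return encode_varint(tag) + encode_varint(len(payload)) + payload
```

Whether the field is called `compute_task_key` or `asset_key`, `encode_string_field(1, "abc")` yields the same bytes, `b"\x0a\x03abc"`, as long as the field number is unchanged. Names do still matter for the protobuf JSON mapping and for generated code, so callers using the old accessor need updating.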
## Companion PR

- Substra/orchestrator#277
- Substra/substra-frontend#240

## Description

The aim is to allow registering failure reports not only for compute tasks but for other kinds of assets (for now, functions, which are not built as part of the execution of a compute task).

- Modify `ComputeTaskFailureReport`:
  - rename the model to `AssetFailureReport`
  - rename field `compute_task_key` to `asset_key` (as we can now have a function key)
  - add field `asset_type` to provide the kind of asset the report refers to
- Update the protobuf to reflect the previous changes
- Refactor `download_file` in `PermissionMixin` to provide more flexibility (and decouple it from DRF)
- Create a new `FailableTask` (Celery task):
  - centralize the logic to submit asset failures

## How has this been tested?

As this will be merged into a branch that is itself going to be merged into a POC branch, we use MNIST as a baseline of a working model. We will deal with failing tests on the POC before merging on main.

## Checklist

- [x] [changelog](../CHANGELOG.md) was updated with notable changes
- [ ] documentation was updated

---------

Signed-off-by: Guilhem Barthes <[email protected]>
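A rough sketch of what centralizing failure submission in a base task can look like. It is framework-free: `on_failure` copies the argument list of Celery's `Task.on_failure(exc, task_id, args, kwargs, einfo)` hook, but nothing here imports Celery. `SUBMITTED_REPORTS`, `get_asset_key`, `BuildFunctionTask` and the enum values are hypothetical stand-ins, not the backend's actual code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List


class FailedAssetKind(Enum):
    # Hypothetical values; only the function / compute task split
    # comes from the PR description.
    FAILED_ASSET_COMPUTE_TASK = "compute_task"
    FAILED_ASSET_FUNCTION = "function"


@dataclass
class AssetFailureReport:
    # Mirrors the renamed fields: asset_key plus the new asset_type.
    asset_key: str
    asset_type: FailedAssetKind
    error: str


# Hypothetical sink standing in for "register the report in the orchestrator".
SUBMITTED_REPORTS: List[AssetFailureReport] = []


class FailableTask:
    """Base class turning any task failure into an AssetFailureReport."""

    asset_type: FailedAssetKind  # declared by each concrete subclass

    def get_asset_key(self, args: tuple, kwargs: dict) -> str:
        raise NotImplementedError

    def on_failure(self, exc: Exception, task_id: str, args: tuple,
                   kwargs: dict, einfo: Any) -> None:
        # Single place where failures are converted and submitted.
        report = AssetFailureReport(
            asset_key=self.get_asset_key(args, kwargs),
            asset_type=self.asset_type,
            error=repr(exc),
        )
        SUBMITTED_REPORTS.append(report)


class BuildFunctionTask(FailableTask):
    # Hypothetical subclass: the function key is the first positional argument.
    asset_type = FailedAssetKind.FAILED_ASSET_FUNCTION

    def get_asset_key(self, args: tuple, kwargs: dict) -> str:
        return args[0]
```

For example, `BuildFunctionTask().on_failure(RuntimeError("build failed"), "task-id", ("function-key",), {}, None)` appends one report with `asset_key="function-key"` and the function asset type. Each concrete task only declares its asset type and how to locate its key; the reporting path lives in one place.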
* feat: decouple image builder from worker
* fix: update skaffold config
* feat: add `ServiceAccount` and modify role
* feat: build image in new pod
* chore: rename `deployment-builder.yaml` to `stateful-builder.yaml`
* chore: rename `stateful-builder.yaml` to `statefulset-builder.yaml`
* chore: centralize params
* feat: create `BuildTask`
* feat: move more code to `builder`
* fix: remove TaskProfiling as Celery task + save Entrypoint in DB
* feat: build function at registration (#707)
* feat: share images between backends (#708)
* chore: update helm workflow
* [sub]fix: add missing migration poc (#728)

  Add a migration missing in the POC. This migration alters two things:
  - modify `ComputeTaskFailureReport.logs`
  - modify `FunctionImage.file`

  This migration has been generated automatically with `make migrations`.
* [sub]feat: add function events (#714)

  Companion PR: Substra/orchestrator#263

  Add function events, needed now that the building of the function is decoupled from the execution of the compute task. For that, it adds a status field on the Function. It also includes another PR (merged here) to get function build logs working again. In a future PR, we will change the compute task execution to avoid having to `wait_for_function_built` in `compute_task()`.

  Fixes FL-1160.

  As this will be merged into a branch that is itself going to be merged into a POC branch, we use MNIST as a baseline of a working model. We will deal with failing tests on the POC before merging on main.
* [sub]fix(app/orchestrator/resources): FunctionStatus.FUNCTION_STATUS_CREATED -> FunctionStatus.FUNCTION_STATUS_WAITING (#742)

  Backend `FunctionStatus` values are not aligned with the [orchestrator definitions](https://github.com/Substra/orchestrator/blob/poc-decoupled-builder/lib/asset/function.proto#L29-L36). In particular, `FunctionStatus.FUNCTION_STATUS_CREATED` leads to the following error:

  ```txt
  ValueError: 'FUNCTION_STATUS_WAITING' is not a valid FunctionStatus
  ```

  Fix: `FunctionStatus.FUNCTION_STATUS_CREATED` -> `FunctionStatus.FUNCTION_STATUS_WAITING`. Tested by running the Camelyon benchmark on the [poc-builder-flpc](https://substra.org-1.poc-builder-flpc.cg.owkin.tech/compute_plans/a420306f-5719-412b-ab9c-688b7bed9c70/tasks?page=1&ordering=-rank) environment.
* fix: rebase changelog
* feat: decouple image builder from worker
* feat: add `ServiceAccount` and modify role
* feat: build function at registration (#707)
* feat: save status update in orc
* feat: use status for build waiting
* fix: re-add `container_image_exists`
* fix: rebase errors
* fix: format
* fix: tests
* fix: add `si` to building invocations
* fix: tests
* fix: apply feedback
* fix: only import during typing
* [sub]feat: modify computetask failure report (#727)
* feat: add config to run celery in tests
* feat: add tests
* fix: remove rebase duplicate
* docs: changelog
* fix: adapt to pydantic 2.x.x
* fix: remove rebase artifacts
* fix: update to pydantic 2.x.x

---------

Signed-off-by: SdgJlbl <[email protected]>
Signed-off-by: Guilhem Barthes <[email protected]>
Signed-off-by: ThibaultFy <[email protected]>
Signed-off-by: Guilhem Barthés <[email protected]>
Signed-off-by: Thibault Camalon <[email protected]>
Co-authored-by: SdgJlbl <[email protected]>
Co-authored-by: ThibaultFy <[email protected]>
Co-authored-by: Thibault Camalon <[email protected]>
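The `ValueError` fixed in #742 is Python's standard `Enum` lookup failure. The sketch below assumes the backend mirrors orchestrator statuses as a string enum; only `FUNCTION_STATUS_WAITING` is confirmed by this PR, the other members are illustrative, and `parse_status` / `missing_statuses` are hypothetical helpers. It shows how a mismatch surfaces and how a unit test could catch it before runtime.

```python
from enum import Enum
from typing import Iterable, Set


class FunctionStatus(str, Enum):
    # Illustrative mirror of the orchestrator's function statuses.
    # Only FUNCTION_STATUS_WAITING is confirmed; the rest are assumptions.
    FUNCTION_STATUS_WAITING = "FUNCTION_STATUS_WAITING"
    FUNCTION_STATUS_BUILDING = "FUNCTION_STATUS_BUILDING"
    FUNCTION_STATUS_READY = "FUNCTION_STATUS_READY"
    FUNCTION_STATUS_FAILED = "FUNCTION_STATUS_FAILED"


def parse_status(name: str) -> FunctionStatus:
    """Map a status name sent by the orchestrator onto the backend enum.

    Unknown names raise ValueError("'<name>' is not a valid FunctionStatus"),
    the same failure mode reported in #742.
    """
    return FunctionStatus(name)


def missing_statuses(orchestrator_names: Iterable[str]) -> Set[str]:
    """Return orchestrator status names the backend enum cannot parse."""
    known = {member.value for member in FunctionStatus}
    return set(orchestrator_names) - known
```

Feeding `missing_statuses` the names declared on the orchestrator side (for instance, read from the generated protobuf enum) and asserting an empty result would flag a drift like `FUNCTION_STATUS_CREATED` vs `FUNCTION_STATUS_WAITING` in CI rather than in a running benchmark.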