Allow configuration of max attempts for a task #2276

Open
dhpikolo opened this issue Feb 18, 2025 · 2 comments · May be fixed by #2279

dhpikolo commented Feb 18, 2025

Currently, a task can be attempted at most 6 times. It would be beneficial to make this value configurable.

In our use case, we are working on integrating Argo retries with Metaflow’s retried Argo workflows. Making the limit configurable through an environment variable would allow us to cap how many times a user can retry an Argo workflow.

That said, beyond our specific use case, adding this configuration flexibility would be generally useful.

Current Behaviour

from metaflow import (
    FlowSpec,
    Parameter,
    card,
    project,
    step,
    retry
)


@project(name="dummy_project")
class HelloWorld(FlowSpec):
    force_error = Parameter("force-error", type=bool, default=False)

    @card
    @step
    def start(self):
        print("something")
        # Define the artifact that the end step prints.
        self.my_var = "hello"
        self.next(self.end)

    @card
    @retry(times=10)
    @step
    def end(self):
        if self.force_error:
            raise Exception("Testing errors in metaflow")
        print(f"the data artifact is: {self.my_var}")


if __name__ == "__main__":
    HelloWorld()
  • Running the above flow locally via python hello_world.py run throws the following exception:
Metaflow 2.14.0 executing HelloWorld for user:j.kollipara
Project: dummy_project, Branch: user.j.kollipara
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
    Flow failed:
    The maximum number of retries is @retry(times=4).

error: Recipe `_poetry-run` failed with exit code 1

Source code of the above error:

def step_init(self, flow, graph, step, decos, environment, flow_datastore, logger):
    # The total number of attempts must not exceed MAX_ATTEMPTS.
    # attempts = normal task (1) + retries (N) + @catch fallback (1)
    if int(self.attributes["times"]) + 2 > MAX_ATTEMPTS:
        raise MetaflowException(
            "The maximum number of retries is "
            "@retry(times=%d)." % (MAX_ATTEMPTS - 2)
        )
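
To make the arithmetic in that check concrete, here is a minimal standalone sketch. It is not Metaflow code: validate_retry_times and max_retries are hypothetical helpers, ValueError stands in for MetaflowException, and MAX_ATTEMPTS = 6 is inferred from the error message above (6 - 2 = 4).

```python
MAX_ATTEMPTS = 6  # assumption: inferred from "@retry(times=4)" in the error above


def max_retries(max_attempts: int = MAX_ATTEMPTS) -> int:
    """Largest value of @retry(times=...) that passes the check."""
    # One attempt is the normal task run and one is reserved for the
    # @catch fallback, so only max_attempts - 2 attempts are left for retries.
    return max_attempts - 2


def validate_retry_times(times: int, max_attempts: int = MAX_ATTEMPTS) -> None:
    """Hypothetical stand-in for the step_init check quoted above."""
    if int(times) + 2 > max_attempts:
        # ValueError is used here only so the sketch has no Metaflow dependency.
        raise ValueError(
            "The maximum number of retries is "
            "@retry(times=%d)." % max_retries(max_attempts)
        )
```

With the limit at 6, validate_retry_times(4) passes and validate_retry_times(10) raises. If the limit were 12, times=10 would pass because 10 + 2 does not exceed 12.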

Proposed Behaviour

Setting METAFLOW_MAX_ATTEMPTS=12 would allow users to run the above flow.
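
A rough sketch of how such a setting could be resolved, assuming a plain environment lookup with the current value of 6 as the fallback. The helper name resolve_max_attempts and the lower bound of 2 are assumptions made for illustration; this is not how Metaflow reads its configuration and not necessarily what the linked PR does.

```python
import os

DEFAULT_MAX_ATTEMPTS = 6  # assumption: the current limit, used as the fallback


def resolve_max_attempts(environ=None) -> int:
    """Return the attempt limit, preferring METAFLOW_MAX_ATTEMPTS if it is set.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    raw = env.get("METAFLOW_MAX_ATTEMPTS")
    if raw is None:
        return DEFAULT_MAX_ATTEMPTS
    value = int(raw)  # raises ValueError on non-numeric input
    # The normal run plus the @catch fallback already use two attempts,
    # so anything below 2 could never satisfy the retry check.
    if value < 2:
        raise ValueError("METAFLOW_MAX_ATTEMPTS must be at least 2")
    return value
```

Under this sketch, METAFLOW_MAX_ATTEMPTS=12 yields a limit of 12, which is just enough for @retry(times=10), since 10 + 2 = 12.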


dhpikolo commented Feb 18, 2025

I have already put up a PR with the proposed change. Let me know what you think of it.

dhpikolo linked a pull request on Feb 19, 2025 that will close this issue

I created a new PR, since the old PR was based on the development branch.
