
Weighted resource use, or plain concurrency limit #30

Open
rouzwawi opened this issue Dec 17, 2016 · 1 comment


@rouzwawi
Member

rouzwawi commented Dec 17, 2016

The current implementation of resource usage for a Workflow is a plain resource reference in the WorkflowConfiguration definition. The problem with this is that it only allows for one unit of resource usage per workflow instance.

In general, we need to align on what we mean by "resource" and "concurrency". We have to treat these two concepts separately:

  • The general resource and resource usage mechanism (which we have today) needs a better association mechanism from workflows to resources. There is, however, some amount of detail that has to be considered here:
    • One workflow instance can use several resources (e.g. available memory or CPU quota in a GCP project), and these have different magnitudes and units (GB, count). The current model is backwards in that it requires the resource limit to be set in a unit that is normalized to 1 unit used per workflow instance. A more natural mechanism would be for the workflow to declare how many units of each resource it will consume.
    • A workflow instance usually submits one or more processing jobs to various processing runtimes (Dataproc, Dataflow, Hadoop, etc.), and these might or might not run concurrently within the workflow, depending on wiring. So a fixed resource use for the workflow instance is at best a "max resource use" definition.
  • A concurrency limit on a workflow instance is a very simple limit that does not reflect any real resource usage. It leaves it entirely up to the user to figure out how a workflow instance relates to some (unknown to Styx) resources, and to derive a concurrency limit from that.

The second, simpler concurrency limit can be reduced to the more general resource/use mechanism, and we can do so as an internal detail that does not leak to the user, as sketched below.
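As a rough sketch of how this reduction could work (the concurrency and limit fields and the auto-generated resource id are hypothetical, chosen here just for illustration, not an existing Styx schema): a user-facing concurrency limit of 5 would internally become a resource with a limit of 5, where each workflow instance uses exactly 1 unit:

# hypothetical user-facing definition
schedules:
  - id: example-workflow
    partitioning: hours
    concurrency: 5

# internal reduction to the general resource/use mechanism
resources:
  - id: example-workflow-concurrency  # auto-generated, never shown to the user
    limit: 5
schedules:
  - id: example-workflow
    partitioning: hours
    resources:
      - id: example-workflow-concurrency
        use: 1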

I like that we are taking the first approach, but we need to be aware of the model we're dealing with and reflect it better in the user touch points of Styx.

One immediate change that we need to make is to turn the resource usage association in the workflow definition into an object rather than a plain string:

schedules:
  - id: example-workflow
    partitioning: hours
    resources:
      - id: nodes
        use: 32

The use field can default to 1, which is the current behaviour, but having an object in the schema will allow us to evolve the definition. For instance, it would let a workflow declare usage of several resources with different units, as sketched below.
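A purely illustrative example (the resource ids and the limit syntax are assumptions, not an existing Styx schema): a workflow that needs 32 nodes and 64 GB of memory quota could declare both, with each resource limit defined in its natural unit rather than normalized to 1 per workflow instance:

resources:
  - id: nodes      # counted resource, limit in number of nodes
    limit: 256
  - id: memory-gb  # resource measured in GB
    limit: 1024

schedules:
  - id: example-workflow
    partitioning: hours
    resources:
      - id: nodes
        use: 32
      - id: memory-gb
        use: 64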

@danielnorberg
Contributor

Seems like this might be an issue that we want to resolve soon.

An interesting question is whether it will be enough to trust the user-indicated resource usage values. As opposed to some other schedulers (e.g. Borg) that are in control of the underlying job execution platform, Styx is currently not able to actually enforce the indicated resource usage limits. As preventing resource stock-outs in whole regions has become a pressing issue, maybe we need to think about how Styx might gain this capability on e.g. GCP.
