
Weighted resource use, or plain concurrency limit #30

Open
rouzwawi opened this issue Dec 17, 2016 · 1 comment


@rouzwawi
Member

rouzwawi commented Dec 17, 2016

The current implementation of resource usage for a Workflow is a plain resource reference in the WorkflowConfiguration definition. The problem with this is that it only allows for one unit of resource usage per workflow instance.

In general, we need to align on what we mean by "resource" and "concurrency". We have to treat these two concepts separately:

  • The general resource and resource usage mechanism (which we have today) needs a better association mechanism from workflows to resources. There is, however, some amount of detail that has to be considered here:
    • One workflow instance can use several resources (e.g. available memory or CPU quota in a GCP project), and these have different magnitudes and units (GB, count). The current model is backwards in that it requires the resource limit to be set in a unit that is normalized to 1 unit used per workflow instance. A more natural mechanism would be for the workflow to declare how many units of each resource it will consume.
    • A workflow instance usually submits one or more processing jobs to various processing runtimes (Dataproc, Dataflow, Hadoop, etc.), and these might or might not run concurrently within the workflow, depending on wiring. So a fixed resource use for the workflow instance is at best a "max resource use" definition.
  • A concurrency limit on a workflow instance is a very simple limit that does not reflect any real resource usage. It leaves it entirely up to the user to figure out how a workflow instance relates to some (unknown to Styx) resources, and to derive a concurrency limit from that.

The second, simpler concurrency limit can be reduced to the more general resource/use mechanism, and we can do so as an internal detail that does not leak to the user, as sketched below.
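As a rough sketch of how this reduction could work (the concurrency and limit fields and the auto-generated resource id are hypothetical, chosen here just for illustration, not an existing Styx schema): a user-facing concurrency limit of 5 would internally become a resource with a limit of 5, where each workflow instance uses exactly 1 unit:

# hypothetical user-facing definition
schedules:
  - id: example-workflow
    partitioning: hours
    concurrency: 5

# internal reduction to the general resource/use mechanism
resources:
  - id: example-workflow-concurrency  # auto-generated, never shown to the user
    limit: 5
schedules:
  - id: example-workflow
    partitioning: hours
    resources:
      - id: example-workflow-concurrency
        use: 1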

I like that we are taking the first approach, but we need to be aware of the model we're dealing with and reflect it better in the user touch points of Styx.

One immediate change that we need to make is to turn the resource usage association in the workflow definition into an object rather than a plain string:

schedules:
  - id: example-workflow
    partitioning: hours
    resources:
      - id: nodes
        use: 32

The use field can default to 1, which is the current behaviour, but having an object in the schema will allow us to evolve the definition. For instance, it would let a workflow declare usage of several resources with different units, as sketched below.
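A purely illustrative example (the resource ids and the limit syntax are assumptions, not an existing Styx schema): a workflow that needs 32 nodes and 64 GB of memory quota could declare both, with each resource limit defined in its natural unit rather than normalized to 1 per workflow instance:

resources:
  - id: nodes      # counted resource, limit in number of nodes
    limit: 256
  - id: memory-gb  # resource measured in GB
    limit: 1024

schedules:
  - id: example-workflow
    partitioning: hours
    resources:
      - id: nodes
        use: 32
      - id: memory-gb
        use: 64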

@danielnorberg
Contributor

Seems like this might be an issue that we want to resolve soon.

An interesting question is whether it will be enough to trust the user-indicated resource usage values. As opposed to some other schedulers (e.g. Borg) that are in control of the underlying job execution platform, Styx is currently not able to actually enforce the indicated resource usage limits. As preventing resource stock-outs in whole regions has become a pressing issue, maybe we need to think about how Styx might gain this capability on e.g. GCP.
