This repository has been archived by the owner on Jul 12, 2023. It is now read-only.
Currently, a Workflow declares its resource usage through a plain resource reference in the `WorkflowConfiguration` definition. The problem is that this only allows a usage of one unit of a resource per workflow instance.
In general, we need to align on what we mean by "resource" and "concurrency". We have to treat these two concepts separately:
1. A general resource and resource usage mechanism (which we have today), which needs a better way of associating workflows with resources. There are, however, some details to consider here:
   - One workflow instance can use several resources (e.g. available memory or CPU quota in a GCP project), and these have different magnitudes and units (GB, count). The current model is backwards: it requires the resource limit to be expressed in a unit normalized so that each workflow instance uses exactly 1. A more natural mechanism would be for the workflow to declare how much of each resource it consumes.
   - A workflow instance usually submits one or more processing jobs to various processing runtimes (Dataproc, Dataflow, Hadoop, etc.), and these may or may not run concurrently within the workflow, depending on how it is wired. A fixed resource usage for the workflow instance is therefore at best a "maximum resource usage" definition.
2. A concurrency limit on a workflow instance, which is a very simple limit that does not reflect any real resource usage. It leaves it entirely up to the user to figure out how a workflow instance relates to some resources unknown to Styx, and to derive a concurrency limit from that.
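A minimal sketch of the first model, assuming each job declares a dict of resource id to amount (in that resource's own unit) and each resource has one global limit. `peak_usage`, `admit`, and the resource ids are hypothetical names, not Styx APIs. Jobs running in parallel within a step add up, and the per-instance figure is the peak over all steps, which is the "maximum resource usage" described above.

```python
def peak_usage(steps):
    """Peak per-resource usage of one workflow instance (hypothetical model).

    `steps` run one after another; each step is a list of jobs that run
    concurrently, and each job is a dict {resource_id: amount}.
    """
    peak = {}
    for step in steps:
        step_total = {}
        for job in step:
            for resource, amount in job.items():
                # Concurrent jobs in the same step add up.
                step_total[resource] = step_total.get(resource, 0) + amount
        for resource, amount in step_total.items():
            # Across sequential steps, only the maximum matters.
            peak[resource] = max(peak.get(resource, 0), amount)
    return peak


def admit(usage, in_use, limits):
    """Reserve `usage` only if every resource stays within its limit."""
    if any(in_use.get(r, 0) + a > limits[r] for r, a in usage.items()):
        return False
    for r, a in usage.items():
        in_use[r] = in_use.get(r, 0) + a
    return True


# Hypothetical limits for one GCP project, each in its own unit.
limits = {"memory-gb": 512, "cpus": 200}
steps = [
    [{"memory-gb": 64, "cpus": 16}],      # step 1: one Dataproc job
    [{"memory-gb": 48, "cpus": 24}] * 2,  # step 2: two Dataflow jobs in parallel
]
usage = peak_usage(steps)
in_use = {}
admitted = sum(admit(usage, in_use, limits) for _ in range(10))
print(usage, admitted)  # admitted == 4: CPU binds first (4 * 48 = 192 <= 200)
```

Declaring usage per resource lets whichever resource is scarcest decide admission, instead of forcing the user to fold every resource into a single normalized "1 per instance" unit.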
The second, simpler concurrency limit can be expressed in terms of the more general resource/usage mechanism, and we can do this as an internal detail that does not leak to the user.
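One way this reduction could look, as a sketch under assumptions: each workflow's concurrency limit becomes an internal resource whose limit is the configured concurrency, and every instance implicitly uses 1 unit of it. The `__concurrency__/` id prefix and all function names are hypothetical; the user only ever sees the concurrency setting.

```python
def concurrency_resource(workflow_id):
    # Hypothetical internal resource id; never exposed to users.
    return f"__concurrency__/{workflow_id}"


def with_concurrency(usage, workflow_id):
    """Add one unit of the workflow's internal concurrency resource."""
    return {**usage, concurrency_resource(workflow_id): 1}


def limits_with_concurrency(limits, concurrency):
    """`concurrency` maps workflow_id -> max concurrent instances."""
    extra = {concurrency_resource(w): n for w, n in concurrency.items()}
    return {**limits, **extra}


def try_admit(usage, in_use, limits):
    """Reserve `usage` only if every resource stays within its limit."""
    if any(in_use.get(r, 0) + a > limits[r] for r, a in usage.items()):
        return False
    for r, a in usage.items():
        in_use[r] = in_use.get(r, 0) + a
    return True


def release(usage, in_use):
    """Return the reserved amounts when an instance finishes."""
    for r, a in usage.items():
        in_use[r] -= a


limits = limits_with_concurrency({"cpus": 200}, {"nightly-report": 2})
usage = with_concurrency({"cpus": 16}, "nightly-report")
in_use = {}
admitted = sum(try_admit(usage, in_use, limits) for _ in range(5))
print(admitted)  # 2: the concurrency resource, not CPU, is the binding limit
```

Because the concurrency limit goes through the same admission path as real resources, Styx would need only one accounting mechanism.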
I like that we are taking the first approach, but we need to be aware of the model we're dealing with and reflect it better in Styx's user touch points.
One immediate change we need to make is to define the resource usage association in the workflow definition as an object rather than a plain string.
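A sketch of how a parser could accept both forms during a migration, assuming the workflow definition holds a list whose entries are either a legacy plain string or an object with an `id` key and an optional `use` key. Only `use` comes from this proposal; the `id` key, `ResourceUse`, and `parse_resources` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceUse:
    id: str       # hypothetical key naming the referenced resource
    use: int = 1  # defaults to one unit, i.e. today's behaviour


def parse_resources(raw):
    """Normalize legacy string references and proposed objects."""
    result = []
    for entry in raw:
        if isinstance(entry, str):
            # Legacy form: a bare reference means one unit of the resource.
            result.append(ResourceUse(id=entry))
        elif isinstance(entry, dict):
            result.append(ResourceUse(id=entry["id"], use=int(entry.get("use", 1))))
        else:
            raise TypeError(f"unsupported resource entry: {entry!r}")
    return result


parsed = parse_resources([
    "legacy-pool",
    {"id": "gcp-memory-gb", "use": 64},
    {"id": "gcp-cpus"},
])
print(parsed)
```

Keeping the string form valid means existing definitions keep working while new ones can state real amounts.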
The `use` field can default to 1, which is the current behaviour, but having an object in the schema will allow us to evolve the definition.
Seems like this might be an issue that we want to resolve soon.
An interesting question is whether it will be enough to trust the user-declared resource usage values. Unlike some other schedulers (e.g. Borg), which control the underlying job execution platform, Styx currently cannot enforce the declared resource usage limits. Since preventing resource stock-outs in whole regions has become a pressing issue, we may need to think about how Styx could gain this capability on, e.g., GCP.