A Workflow Management Service for EWMS
The WMS is both the central component and the external interface for the Event Workflow Management System (EWMS). This service:
- Processes requests for HTCondor-served, task-based workflows.
- Translates workflows into actionable instructions.
- Coordinates assignments among EWMS components.
- Manages workloads based on available resources and workflow status.
- Relays information (workflow statuses, history, etc.) to external parties on demand.
See Docs/
As described above, the WMS has several concurrent responsibilities. These actions can be outlined in the "story" of a workflow:
- The user requests a new workflow. The WMS translates this workflow into n task directives, m taskforces, and determines the number of required queues.
- The WMS requests p queues from the MQS:
- If the MQS indicates that resources are insufficient, the WMS waits and also requests any other pending workflows from the MQS.
- Otherwise/eventually, the MQS creates the queues and provides them to the WMS.
- The WMS makes tokens for any publicly accessible queues available to the user.
- The WMS marks the workflow's taskforce(s) as ready for the TMS.
- See
Taskforce.phase
- See
- When ready, the TMS initiates HTCondor jobs for the taskforce(s).
- The TMS relays live, aggregated runtime statuses to the WMS until the workflow's taskforces are completed.
- The user tells EWMS that the workflow has finished. The workflow is deactivated, and the TMS stops the associated taskforces.
This "story" is also detailed in request_workflow.py. However, this script may not suit all your needs. It is recommended to have a solid understanding of the user-facing API endpoints and objects.
Every workflow) originates from a JSON object using POST @ /v0/workflows. The following is an example of valid a request object (refer to the docs for other optional fields not seen here):
{
"public_queue_aliases": [
"input-queue",
"output-queue"
],
"tasks": [
{
"cluster_locations": [
"sub-2"
],
"input_queue_aliases": [
"input-queue"
],
"output_queue_aliases": [
"output-queue"
],
"task_image": "/cvmfs/icecube.opensciencegrid.org/containers/path/to/my-apptainer-container:1.2.3",
"task_args": "cp {{INFILE}} {{OUTFILE}}",
"n_workers": 1000,
"worker_config": {
"do_transfer_worker_stdouterr": true,
"max_worker_runtime": 600,
"n_cores": 1,
"priority": 99,
"worker_disk": "512M",
"worker_memory": "512M"
}
}
]
}
The task container is built from the user-provided image, specified by the workflow request object's task_image
, task_args
, and task_env
. The container runs within an EWMS Pilot instance on an HTCondor Execution Point (EP). For configuration and interaction with EWMS events, refer to the EWMS Pilot documentation.
The init container runs once on a worker before any task/event is processed. This is specified by the workflow request object's init_image
, init_args
, and init_env
. See the EWMS Pilot documentation for more information.
The workflow request object's fields are mostly persisted in similarly-named fields the TaskDirective
object. However, some are located in other places:
POST @ /v0/workflows Field |
Persisted Destination |
---|---|
task_image |
TaskDirective.task_image |
task_args |
TaskDirective.task_args |
task_env |
TaskDirective.task_env |
init_image |
TaskDirective.init_image |
init_args |
TaskDirective.init_args |
init_env |
TaskDirective.init_env |
pilot_config |
Taskforce.pilot_config |
worker_config |
Taskforce.worker_config |
n_workers |
Taskforce.n_workers |
Understanding the objects within the WMS (and EWMS) is key. The following REST endpoints allow users to retrieve with these objects.
What's a workflow?
- Get by ID: GET @ /v0/workflows/{workflow_id}
- Search by other criteria: POST @ /v0/query/workflows
What's a task directive?
- Get by ID: GET @ /v0/task-directives/{task_id}
- Search by other criteria: POST @ /v0/query/task-directives
What's a taskforce?
- Get by ID: GET @ /v0/taskforces/{taskforce_uuid}
- Search by other criteria: POST @ /v0/query/taskforces
The workflow is the highest-level object in the EWMS hierarchy. It consists of 1+ tasks, each described by a task directive. These tasks are connected by message queues, akin to nodes and edges in a graph. See object properties.
How is a workflow object retrieved?
The message queue transfers events to and from a task. Public message queues allow external event injection or retrieval. This flexibility supports creating workflows of various complexity (graph theory). Each message queue is identified by an ID and requires an authentication token for access.
An event is an object transferred via message queues. It is the most frequently occurring object in EWMS.
A task refers to the unique combination of a workflow instance, container image, runtime arguments, environment variables, etc.
The term task also has different meanings depending on the context within EWMS:
- User context: A task is a unit of work intended for parallelization.
- EWMS pilot context: A task is a runtime instance of the task container, applied to an inbound event from a message queue and potentially produces outbound events (akin to a mathematical function).
Due to this ambiguity, the task directive is considered a first-order object within the WMS.
The task directive represents the unique configuration of a task (WMS context) and its place within an EWMS workflow. This object is immutable, with task_id
as its primary key. See object properties.
How is a task directive object retrieved?
A taskforce is not explicitly created by the user. It serves as a two-way bridge between EWMS and HTCondor. A taskforce is created for each application of a task directive (N:1 mapping) and contains HTCondor compute instructions and runtime status information. Each taskforce is applied to a single HTCondor cluster, with a fixed number of workers. If more compute resources are needed, additional taskforces are created from the same task directive. See object properties.
How is a taskforce object retrieved?