Skip to content

EWMS's Workflow Management Service (WMS): An External Interface, Public Entrypoint

License

Notifications You must be signed in to change notification settings

Observation-Management-Service/ewms-workflow-management-service

Repository files navigation

GitHub release (latest by date including pre-releases) Lines of code GitHub issues GitHub pull requests

ewms-workflow-management-service

A Workflow Management Service for EWMS

The WMS is both the central component and the external interface for the Event Workflow Management System (EWMS). This service:

  • Processes requests for HTCondor-served, task-based workflows.
  • Translates workflows into actionable instructions.
  • Coordinates assignments among EWMS components.
  • Manages workloads based on available resources and workflow status.
  • Relays information (workflow statuses, history, etc.) to external parties on demand.

API Documentation

See Docs/

Overview

As described above, the WMS has several concurrent responsibilities. These actions can be outlined in the "story" of a workflow:

EWMS Workflow Lifetime

  1. The user requests a new workflow. The WMS translates this workflow into n task directives, m taskforces, and determines the number of required queues.
  2. The WMS requests p queues from the MQS:
    1. If the MQS indicates that resources are insufficient, the WMS waits and also requests any other pending workflows from the MQS.
    2. Otherwise/eventually, the MQS creates the queues and provides them to the WMS.
  3. The WMS makes tokens for any publicly accessible queues available to the user.
  4. The WMS marks the workflow's taskforce(s) as ready for the TMS.
  5. When ready, the TMS initiates HTCondor jobs for the taskforce(s).
  6. The TMS relays live, aggregated runtime statuses to the WMS until the workflow's taskforces are completed.
  7. The user tells EWMS that the workflow has finished. The workflow is deactivated, and the TMS stops the associated taskforces.

This "story" is also detailed in request_workflow.py. However, this script may not suit all your needs. It is recommended to have a solid understanding of the user-facing API endpoints and objects.

Example Workflow Request JSON

Every workflow) originates from a JSON object using POST @ /v0/workflows. The following is an example of valid a request object (refer to the docs for other optional fields not seen here):

{
    "public_queue_aliases": [
        "input-queue",
        "output-queue"
    ],
    "tasks": [
        {
            "cluster_locations": [
                "sub-2"
            ],
            "input_queue_aliases": [
                "input-queue"
            ],
            "output_queue_aliases": [
                "output-queue"
            ],
            "task_image": "/cvmfs/icecube.opensciencegrid.org/containers/path/to/my-apptainer-container:1.2.3",
            "task_args": "cp {{INFILE}} {{OUTFILE}}",
            "n_workers": 1000,
            "worker_config": {
                "do_transfer_worker_stdouterr": true,
                "max_worker_runtime": 600,
                "n_cores": 1,
                "priority": 99,
                "worker_disk": "512M",
                "worker_memory": "512M"
            }
        }
    ]
}

The Task Container

The task container is built from the user-provided image, specified by the workflow request object's task_image, task_args, and task_env. The container runs within an EWMS Pilot instance on an HTCondor Execution Point (EP). For configuration and interaction with EWMS events, refer to the EWMS Pilot documentation.

The Init Container

The init container runs once on a worker before any task/event is processed. This is specified by the workflow request object's init_image, init_args, and init_env. See the EWMS Pilot documentation for more information.

Locations of Persisted Attributes

The workflow request object's fields are mostly persisted in similarly-named fields the TaskDirective object. However, some are located in other places:

POST @ /v0/workflows Field Persisted Destination
task_image TaskDirective.task_image
task_args TaskDirective.task_args
task_env TaskDirective.task_env
init_image TaskDirective.init_image
init_args TaskDirective.init_args
init_env TaskDirective.init_env
pilot_config Taskforce.pilot_config
worker_config Taskforce.worker_config
n_workers Taskforce.n_workers

Interacting with First-Order Objects using API Endpoints

Understanding the objects within the WMS (and EWMS) is key. The following REST endpoints allow users to retrieve with these objects.

Get a Workflow

What's a workflow?

Get a Task Directive

What's a task directive?

Get a Taskforce

What's a taskforce?

EWMS Glossary Applied to the WMS

Workflow

The workflow is the highest-level object in the EWMS hierarchy. It consists of 1+ tasks, each described by a task directive. These tasks are connected by message queues, akin to nodes and edges in a graph. See object properties.

How is a workflow object retrieved?

Message Queue

The message queue transfers events to and from a task. Public message queues allow external event injection or retrieval. This flexibility supports creating workflows of various complexity (graph theory). Each message queue is identified by an ID and requires an authentication token for access.

Event

An event is an object transferred via message queues. It is the most frequently occurring object in EWMS.

Task

A task refers to the unique combination of a workflow instance, container image, runtime arguments, environment variables, etc.

The term task also has different meanings depending on the context within EWMS:

  • User context: A task is a unit of work intended for parallelization.
  • EWMS pilot context: A task is a runtime instance of the task container, applied to an inbound event from a message queue and potentially produces outbound events (akin to a mathematical function).

Due to this ambiguity, the task directive is considered a first-order object within the WMS.

Task Directive

The task directive represents the unique configuration of a task (WMS context) and its place within an EWMS workflow. This object is immutable, with task_id as its primary key. See object properties.

How is a task directive object retrieved?

Taskforce

A taskforce is not explicitly created by the user. It serves as a two-way bridge between EWMS and HTCondor. A taskforce is created for each application of a task directive (N:1 mapping) and contains HTCondor compute instructions and runtime status information. Each taskforce is applied to a single HTCondor cluster, with a fixed number of workers. If more compute resources are needed, additional taskforces are created from the same task directive. See object properties.

How is a taskforce object retrieved?

About

EWMS's Workflow Management Service (WMS): An External Interface, Public Entrypoint

Resources

License

Stars

Watchers

Forks

Packages