Víðarr uses plugins to allow interaction with a diverse set of systems. Plugins
are loaded by the Java
ServiceLoader
and exported by the Java module
system
using the provides
keyword. All plugins need to depend only on the
ca.on.oicr.gsi.vidarr.pluginapi
module to hook into the Víðarr
infrastructure.
There are several services that a plugin can provide and a plugin is free to
provide multiple. Plugins are loaded from JSON data in the Vidarr configuration
file or, in the case of an unload filter, user requests, using Jackson. Each
plugin can load whatever Jackson-compatible data from JSON it requires. Each
plugin has a small "provider" class which provides type information for
Jackson. In the JSON file, the "type"
attribute will be used to create the
appropriate class instance. The provider class lists what values for "type"
correspond to what Java objects that Jackson should load. Since objects are
instantiated by Jackson, most have a startup
method that is called after
loading is complete where the plugin can do any initialisation required. If it
throws exceptions, the Vidarr server will fail to start, which is probably the
correct behaviour for a badly misconfigured plugin.
As an example of a configuration file:
"consumableResources": {
"total": {
"maximum": 500,
"type": "max-in-flight"
}
}
The "type": "max-in-flight"
property is used to connect this configuration to
ca.on.oicr.gsi.vidarr.core.MaxInFlightConsumableResource
. The "maximum"
property is populated by Jackson into an instance of that class. Here "total"
is an arbitrary name set by the server administrator they will use in the
"targets"
section of the main configuration file.
These are high-level overviews of the purpose and general constraints for each service. The JavaDoc for each interface provides the details for how the interfaces should behave.
Additionally, plugins communicate with the outside world through the types they
expect. A description of the types is provided in
ca.on.oicr.gsi.vidarr.SimpleType
and the format for the values is meant to be
compatible with Shesmu's.
Plugins are meant to run asynchronously. Most plugins are given a WorkMonitor
instance which allows a plugin to communicate back to Víðarr and schedule
future asynchronous tasks. Plugins must implement recovery from crash, so are
expected to journal their current state to the database. The WorkMonitor
provides methods to journal state to the database for crash recovery and to
provide status information to users.
Most plugins have a recover
method. If Vidarr is restarted, the plugin will
be asked to recover its state from the last state information in journaled to
the database using the WorkMonitor
. Plugins are expected to be able to pick
up where they left off based only on this information.
See Víðarr Code Style for preferred code formatting.
Consumable resources implement
ca.on.oicr.gsi.vidarr.ConsumableResourceProvider
and
ca.on.oicr.gsi.vidarr.ConsumableResource
. These plugins are responsible for
delaying workflow run execution until resources are available.
The plugins can be associated with targets in the server configuration. Consumable resources may request that submitters provide information or operate on the existence of a workflow run. Consumable resources is a broad term for anything that can be used to delay a workflow run from launching. Some of them are "quota"-type resources (such as RAM, disk, max-in-flight) where the resources must be available at the start of its run and it holds the resource until the workflow completes (successfully or not), at which point the resource may be reused by another workflow run. Within quota-type, some require information (e.g., the amount of RAM), while others are based purely on the existence of the workflow run (e.g., max-in-flight). The priority consumable resources operates within the restrictions imposed from quota resources and allows users to manually set the order in which workflow runs will launch. Other resource are more "throttling"-type. These include maintenance schedules and Prometheus alerts which block workflow runs from starting but don't track anything once the workflow run is underway.
Consumable resources are long-running. Whenever Vidarr attempts to run a
workflow, it will consult the consumable resources to see if there is capacity
to run the workflow (the request
method). At that point the consumable
resource must make a decision as to whether the workflow can proceed. Once the
workflow has finished running (successfully or not), Vidarr will release
the
resource so that it can be used again. When Vidarr restarts, any running
workflows will be called with recover
to indicate that the resource is being
used and the resource cannot stop the workflow even if the resource is
over-capacity.
Consumable resources can request data from the user, if desired. The
inputFromSubmitter
can return an empty optional to indicate that no
information is required or can indicate the name and type of information that
is required. The request
and release
methods will contain a copy of this information,
encoded as JSON, if the submitter provided it. The JSON data has been
type-checked by Vidarr, so it should be safe to convert to the expected type
using Jackson.
Sometimes, consumable resources are doing scoring that would be helpful to know
for debugging purposes. In that case, the resource can return a custom
ConsumableResourceResponse
that uses the Visitor.set
method to export
numeric statistics that will be available in the "tracing"
property. Víðarr
will prefix these variables with the consumable resource's name.
Input provisioners implement ca.on.oicr.gsi.vidarr.InputProvisionerProvider
and ca.on.oicr.gsi.vidarr.InputProvisioner
. These plugins are responsible for
taking files from existing workflows or provided by the user and generating a
file path that a workflow can use. Input provisioners can choose to handle only
some kinds of input data (files vs directories) and the system administrator
can choose multiple provisoners to handle both.
This plugin and the workflow plugins must have a mutual understanding of what a
file path means. That is somewhat the responsibility of the system
administrator. For instance, if in an HPC environment with shared disk, the
system administrator must direct the input provisioner plugin to write to a
shared directory instead of, say, /tmp
and ensure the right permissions are
set up.
These are not the responsibility of the plugin author.
The class BaseJsonInputProvisioner
is a partial implementation that can store
crash recovery information in a JSON object of the implementor's choosing,
making recovery easier.
Output provisioners implement ca.on.oicr.gsi.vidarr.OutputProvisionerProvider
and ca.on.oicr.gsi.vidarr.OutputProvisioner
. These plugins are responsible
for taking data (files or JSON values) from completed workflows, moving the
data into permanent storage and writing back a file path or URL that will be
associated with the correct external identifiers. Output provisioners can
choose to handle only some kinds of output data (files, logs, data-warehouse
entries, or QC judgements) and the system administrator can choose multiple
provisioners to handle all the input types they require.
Output provisioners are run twice for each workflow: a preflight and a provision out. The preflight is run before the workflow has started and allows the plugin to validate any configuration metadata provided by the submitter (i.e., Shesmu) to check it for validity. Once the workflow is completed, the provision out step will be run with the metadata provided by the submitter and the output provided by the workflow.
The class BaseJsonOutputProvisioner
is a partial implementation that can
store crash recovery information in a JSON object of the implementer's
choosing, making recovery easier.
Runtime provisioners implement ca.on.oicr.gsi.vidarr.RuntimeProvisionerProvider
and ca.on.oicr.gsi.vidarr.RuntimeProvisioner
. These plugins are responsible for
extracting non-specific output from a workflow. While output provisioners are
fed specific data from a workflow (e.g., output file), runtime provisioners
operate on the workflow run as a whole. They can provision out information such
as performance metrics, workflow run logs, or machine statistics.
This plugin and the workflow plugins must have a mutual understanding of what a workflow engine's identifier means. That is somewhat the responsibility of the system administrator.
The class BaseJsonRuntimeProvisioner
is a partial implementation that can
store crash recovery information in a JSON object of the implementer's
choosing, making recovery easier.
Workflow engines implement ca.on.oicr.gsi.vidarr.WorkflowEngineProvider
and ca.on.oicr.gsi.vidarr.WorkflowEngine
. These plugins are responsible for
running workflows and collecting the output from the workflow. A workflow
engine can support multiple languages (see
ca.on.oicr.gsi.vidarr.WorkflowLanguage
for a complete list) and indicates
which ones are allowed via the supports
method.
The workflow engine will be given the complete input to the workflow (with real paths provided by the input provisioners) and the workflow itself. Once the workflow has completed, it must provide a JSON structure that references the output of the workflow. Vidarr will identify the output files generated by the workflow engine and they will be passed to the output provisioners.
After the output provisioners have completed, the workflow engine will be called again to cleanup any output, if this is appropriate. If the workflow engine does not support cleanup, it should gracefully succeed during the clean-up (and clean-up recovery) methods.
The class BaseJsonWorkflowEngine
is a partial implementation that can store
crash recovery information in a JSON object of the implementer's choosing,
making recovery easier.
Unload filters implement ca.on.oicr.gsi.vidarr.UnloadFilterProvider
and
ca.on.oicr.gsi.vidarr.UnloadFilter
. These plugins allow customisable
selection of workflows for unloading. For an understanding of unloading
filters, see Loading and Unloading.
The user will specify a filter as a JSON object with a "type"
property. The
classes implementing UnloadFilter
will be deserialized by Jackson. The
UnloadFilterProvider
associates the strings used in "type"
to the objects.
One provider can provide multiple filter types. The types used should be
plugin-
filter; names without dashes and names starting with vidarr-
are
reserved by Víðarr. The filter can have dashes in it if desired. If two
plugins provide duplicate type names, Víðarr will fail to load.
Once a filter is deserialised, it needs to convert the request into a query
Víðarr can apply to its database. That is, it needs to be converted to a query
made of only workflow, workflow run, and external key matches. The server will
call convert
with an UnloadFilter.Visitor
so that the filter can determine
whatever information it needs and generate an output query.
For example, if external keys are connected to Pinery, then a filter might want to filter on runs. A filter could query Pinery and get all the external identifiers associated with that run and then construct a query based on those to match workflow runs that use any of those identifiers.
The priority consumable resource takes plugins for the inputs, formulas, and
scorers. These are used by the Priority Consumable
Resource. This follows the same pattern as the
other plugins: an implementation of ca.on.oicr.gsi.vidarr.PriorityInput
,
ca.on.oicr.gsi.vidarr.PriorityFormula
, or
ca.on.oicr.gsi.vidarr.PriorityScorer
for the input, formula, and scorers,
respectively and there needs to be a corresponding implementation of
PriorityFormulaProvider
, PriorityInputProvider
, or
PriorityScorerProvider
.
In the priority consumable resource's configuration, the "type"
property will
select the appropriate input, formula, or scorer and deserialize it as a JSON
object.
Each component will be called for every pending workflow run, so the analysis
should be relatively fast. PriorityInput
implementations should cache results
from external services.
This core implementation provides several plugins independent of external systems.
Consumable resources provided in Víðarr core.
Allows overriding a consumable resource to permit workflow runs to run even if they would hit a limit.
The manual override wraps another consumable resource to allow by-passing its
logic. The "inner"
property is the configuration for the consumable resource
to wrap. It maintains an allow-list of workflow run IDs that can run even if
the resource would deny them access. The list of allowed IDs is lost on server
shutdown.
{
"inner": { "type": ...},
"type": "manual-override"
}
All the configuration parameters for the inner consumable resource are
unmodified, so this is not a workflow-run visible change. To add or remove
workflow run IDs to the allow list, send an HTTP POST
or DELETE
request to
/consumable-resource/
name/allowed/
run where name is the consumable
resource name and run is the workflow run ID. The current list can be
retrieved by making a GET
request to
/consumable-resource/
name/allowed
.
As an example, suppose you wish to have a max-in-flight, but want to run something urgent. The configuration would look like:
"global-max": {
"inner": {
"type": "max-in-flight",
"maximum": 500
},
"type": "manual-override"
}
And when that urgent deadline happens for a special workflow run:
curl -X POST http://vidarr.example.com/api/consumable-resource/global-max/allowed/cbc8ad81b733696d645b42cc08760f4e7c70228a971f4ff2ec1eb0952f18e682
Set a global maximum number of workflow runs that can be simultaneously active.
{
"maximum": 500,
"type": "max-in-flight"
}
The priority consumable resource operates by computing a number, a priority, for each workflow run and then allowing the workflow run to proceed based on that number.
The resource first takes data from the submission request and then
implementations of PriorityInput
consume this data and produce a numeric
value. Those values are then consumed by PriorityFormula
to produce a final
definitive score from all the numbers. If a default priority is provided, the
submission request can contain no information and the inputs and formula will
be skipped and the default priority will be used instead.
The priority is then scored by a PriorityScorer
which determines if the
workflow is allowed to run or not.
See the other sections for the possible inputs, formulas, and scorers.
{
"type": "priority",
"defaultPriority": null,
"inputs": {
"foo": ...,
"bar": ...
},
"formula": ...,
"scorer": ...
}
Input provisioners provided in Víðarr core.
Allows selecting multiple different input provisioners depending on a "type"
provided in the metadata.
{
"type": "oneOf",
"provisioners": {
"name1": {...},
"name2": {...},
}
}
Allows input to be provided as a string that is assumed to be a path.
{
"type": "raw",
"format": [ "FILE", "DIRECTORY" ]
}
This can be limited to a particular input type format.
Output provisioners provided in Víðarr core.
Allows selecting multiple different output provisioners depending on a "type"
provided in the metadata.
{
"type": "oneOf",
"provisioners": {
"name1": {...},
"name2": {...},
}
}
Priority inputs provided in Víðarr core.
Takes input as an index into an array and returns the value in that array. If
the index is less than zero, "underflowPriority"
is returned. If the index is
beyond the end of the array, "overflowPriority"
is used. The priorities are
stored in "file"
which must be a JSON file containing an array of integers.
{
"type": "json-array",
"file": "/path/to/list.json"
"overflowPriority": 0,
"underflowPriority": 1000
}
Takes input as a string and looks up the value of that in a dictionary. If
the input is not in the dictionary, "defaultPriority"
is used. The priorities
are stored in "file"
which must be a JSON object where all the values are
integers.
{
"type": "json-dictionary",
"defaultPriority": 0,
"file": "/path/to/obj.json"
}
Allows the submitter to select one of multiple priority inputs using a tagged union.
{
"type": "oneOf",
"defaultPriority": 0,
"inputs": {
"FOO": {...},
"BAR": {...}
}
}
The input will take a tagged union/algebraic data type with the appropriate
inputs. If the name provided by the submitter does not match one of the inputs,
"defaultPriority"
is used instead. The names of the keys of "inputs"
should
be capitalized for compatibility with Shesmu.
Reads a variable from Prometheus, filtering on the label set, and returns the current value.
{
"type": "prometheus",
"cacheRequestTimeout": 1,
"cacheTtl": 15,
"defaultPriority": 0,
"labels": ["bob"],
"query": "some_prometheus_variable",
"url": "http://prometheus.example.com:9090",
"workflowNameLabel": "workflow",
"workflowVersionLabel": null
}
The process this input provider uses is as follows:
- Execute
"query"
on the Prometheus instance at"url"
. The query can be any valid Prometheus query. If it takes longer than"cacheRequestTimeout"
minutes, then the query will be treated as a failure. The results will be cached for"cacheTtl"
minutes before being refreshed. - The submission request will be processed into a label set as described below.
- All the records that were returned by the query are scanned for a matching label set.
- If a matching label set is found, the last recorded value will be used, regardless of when Prometheus observed it.
- If no matching label set is found,
"defaultPriority"
will be used.
The label set is constructed from the submission request. For each string in
"labels"
, the submitter must provide a string value. These labels and values
will be used as the label set. For example, with the configuration "labels": ["bob"]
, the submission request could have {"bob": "eggs"}
and the filtered
label set would look like [bob=eggs]
. Additionally, special labels are
available for the workflow name and version. If "workflowNameLabel": "workflow"
and the submission request was for bcl2fastq
, then the label set
would be [workflow=bcl2fastq]
. This can be further refined with a workflow
version using "workflowVersionLabel"
, which will only be used if
"workflowNameLabel"
is not null. Both of these can be turned off by being set
to null.
Takes an optional integer from the submission request and returns it raw, or
"defaultPriority"
if not provided.
{
"type": "raw",
"defaultPriority": 0
}
Takes an arbitrary JSON value and sends it to remote HTTP endpoint for
evaluation. That endpoint must return a single number. The result will be
cached. The "schema"
is a standard Víðarr type that should be requested from
the submission request.
{
"type": "remote",
"defaultPriority": 0,
"schema": "string",
"ttl": 15,
"url": "http://foo.com/api/get-priority"
}
The "schema"
property defines a type, including an object types, that will be
required on submission. The data provided by the submission will be sent via
POST
request as the body to the URL provided. The endpoint must respond with
an integer for the priority or null to use the default priority. The result
will be cached for "ttl"
minutes before being reattempted.
This changes the type of an input provider for compatibility with Shesmu. The crux is this: Shesmu's tagged unions are more limited than Víðarr's. Shesmu requires that a tagged union have a tuple or object while Víðarr permits either of those. When using the one-of input source, this introduces the possibility of creating a type that Shesmu cannot process. This allows wrapping a priority input's type in a single element tuple, thereby making it compatible with Shesmu.
{
"type": "tuple",
"inner": {...}
}
Priority formulas provided in Víðarr core.
Returns a constant value.
{
"type": "constant",
"value": 100
}
Accesses one of the input scores. If no input score has the identifier
"name"
, the minimum integer value is used.
{
"type": "input",
"name: "foo"
}
Takes the minimum or maximum of other formulas.
{
"type": "maximum",
"components": [ ... ]
}
or
{
"type": "minimum",
"components": [ ... ]
}
Computes the product of other formulas (i.e., multiplies their scores).
{
"type": "product",
"components": [ ... ]
}
Computes the difference between two formulas; the result of "left"
minus the
result of "right"
.
{
"type": "difference",
"left": ...,
"right": ...
}
Computes the summation of other formulas (i.e., adds their scores).
{
"type": "sum",
"components": [ ... ]
}
Increases the priority as a workflow run sits around. The duration the workflow
run has been waiting is looked up in the "escalation"
object; the keys are an
ISO-8601 duration and the
values are a floating point number. The smallest matching duration is used and
the score is multiplied by the value provided. Values need to be greater than 1
to increase priority. If workflow run has been waiting less than the smallest
duration in the dictionary, the original priority is used. The original
priority is provided using the "base"
formula.
{
"type": "escalating-multiplier",
"base": ...,
"escalation": {
"PT1H": 1.2,
"PT12H": 2.0
}
}
Increases the priority as a workflow run sits around. The duration the workflow
run has been waiting is looked up in the "escalation"
object; the keys are an
ISO-8601 duration and the
values are an integer. The smallest matching duration is used and the value
provided is added to the original score. Values need to be greater than 1 to
increase priority. If workflow run has been waiting less than the smallest
duration in the dictionary, the original priority is used. The original
priority is provided using the "base"
formula.
{
"type": "escalating-offset",
"base": ...,
"escalation": {
"PT1H": 10,
"PT12H": 100
}
}
Priority scorers provided in Víðarr core.
Checks several priority scorers and allows permits the workflow run to proceed if all scorers allow it to proceed.
{
"type": "all",
"scorers": [ ... ]
}
This can be combined with the ranked max-in-flight family to allow a global limit with per-workflow limits. For example:
{
"scorers": [
{
"maxInFlight": 500,
"type": "ranked-max-in-flight"
},
{
"maxInFlight": 20,
"useCustom": true,
"type": "ranked-max-in-flight-by-workflow"
}
],
"type": "all"
}
This would let the top 500 workflow runs to execute as long as they are also among the top 20 workflow run in their respective workflow type.
Checks several priority scorers and allows permits the workflow run to proceed if any scorer would allow it to proceed.
{
"type": "any",
"scorers": [ ... ]
}
Allows the workflow run to start if the score is strictly greater than "cutoff"
.
{
"type": "cutoff",
"cutoff": 9000
}
Ranks workflow runs by score and allows the top ones to run, where the number
allowed to run is "maxInFlight"
. This workflow makes a best effort to keep
the total number running at or below that limit, but various conditions,
including server relaunch or being used in an "any"
scorer, may cause it to
exceed that bound.
This scorer comes in a few flavours:
"ranked-max-in-flight"
: the limit is applied to all workflow runs"ranked-max-in-flight-by-workflow"
: the limit is applied per workflow type"ranked-max-in-flight-by-workflow-version"
: the limit is applied per workflow type including version
The limit cannot be set individually per workflow in this configuration.
However, "ranked-max-in-flight-by-workflow"
and
"ranked-max-in-flight-by-workflow-version"
have an additional property
"useCustom"
, which will use the max-in-flight values set when a workflow is
created, as is visible through the /api/max-in-flight
endpoint. In that case
"maxInFlight"
is treated as a fallback.
{
"type": "ranked-max-in-flight",
"maxInFlight": 500
}
or
{
"type": "ranked-max-in-flight-by-workflow",
"useCustom": true,
"maxInFlight": 50
}
or
{
"type": "ranked-max-in-flight-by-workflow",
"useCustom": false,
"maxInFlight": 50
}