- analysis output - the data generated by a workflow run. Víðarr understands two kinds of output: files and URLs.
- external identifier - a reference to external (LIMS) data. It has a provider, which is the name of the data store the identifier comes from, and an id, which is the store-specific name for the data. The same id can be re-used across multiple providers.
- external key - a reference to a specific version of external (LIMS) data. This has the same data as an external identifier plus a dictionary of versions. Each version has a version key and a version value.
- label (analysis output) - additional information attached by the workflow run. This can serve as metadata for consumers of Víðarr output. For instance, it could be used to differentiate between read 1 and read 2 FASTQs or indicate the number of records in a file.
- label (workflow) - additional parameters used to differentiate workflows during matching. For instance, the same input FASTQ might be aligned to two different genome references, so a label could be used to distinguish these cases.
- metadata - information provided during launch of a workflow run to determine how the output should be treated. Specifically, it determines which external identifiers/keys are associated with which outputs.
- parameters - the input information for a workflow run.
- provider - an external data store (usually a LIMS) that holds identifiers that can be associated with workflow runs and analysis output.
- provisioner, input - a plugin that can use Víðarr paths or user-provided information to put a real file into a location usable by a workflow engine.
- provisioner, output - a plugin that can take a file created by a workflow run and put it into permanent storage and provide a path, size, and checksum.
- provisioner, runtime - a plugin that can take an identifier from a workflow engine, extract any information about the workflow itself (e.g., runtime performance, logs), and put it into permanent storage.
- target - configuration of provisioners and a workflow engine that is capable of executing workflow runs.
- workflow - a kind of analysis procedure. Each workflow can have many versions that can be executed. A workflow defines labels that must be included in order to execute any version. All versions of a workflow are considered equivalent for the purposes of matching (i.e., if a new version is available, but a workflow run from a previous version has already completed successfully, then it should not be executed again). See the OOP analogy below.
- workflow engine - an external system that can launch and monitor jobs (e.g., Cromwell, Nextflow).
- workflow run - a particular execution of a workflow version. See the OOP analogy below.
- workflow version - a script that can be executed by a workflow engine that instructs the workflow engine to run appropriate software based on the parameters in a workflow run. For instance, there might be a workflow for BWAmem alignment with multiple versions that encompass both the version of BWAmem and a WDL script used to launch it. See the OOP analogy below.
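
The difference between an external identifier and an external key can be easier to see as data. The following is a minimal sketch in Python, not Víðarr's actual data model: the class names, the field names, and the `sha256` version key are assumptions chosen for illustration. It shows that the same id under two providers refers to different data, and that two external keys can point at the same identifier while disagreeing on its version.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExternalId:
    """Hypothetical model of an external identifier."""
    provider: str  # name of the data store the identifier comes from
    id: str        # store-specific name for the data


@dataclass
class ExternalKey:
    """Hypothetical model of an external key: an identifier plus versions."""
    provider: str
    id: str
    # Maps each version key to its version value.
    versions: dict = field(default_factory=dict)

    def identifier(self) -> ExternalId:
        """Drop the version information, leaving only the identifier."""
        return ExternalId(self.provider, self.id)


# The same id re-used by two providers names two different things.
id_a = ExternalId("lims-a", "1234")
id_b = ExternalId("lims-b", "1234")

# Two keys for the same identifier that refer to different versions of it.
key_old = ExternalKey("lims-a", "1234", {"sha256": "abc"})
key_new = ExternalKey("lims-a", "1234", {"sha256": "def"})

print(id_a == id_b)                                # different providers
print(key_old.identifier() == key_new.identifier())  # same underlying data
print(key_old == key_new)                          # different versions
```

Keeping the identifier separate from its versions means a consumer can ask "is this the same LIMS record?" and "is this the same version of that record?" as two distinct questions.
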
The relationship between workflows, workflow versions, and workflow runs can be a bit confusing. They can be analogised to object-oriented programming in the following way:
- A workflow is analogous to an interface. It defines a set of objects that are interchangeable. The contents of that interface are the labels.
- A workflow version is a concrete class that implements an interface. Just like a class, it may implement multiple interfaces (e.g., there might be a QC WDL script that can run on genome and transcriptome data, but it is registered under two different workflow names because there is a reason to keep QCs logically separate by type). Also like a class, the "constructor" for this object (i.e., the parameters and metadata information) is specific to the workflow version.
- A workflow run is a particular instance, in the object-oriented sense, of a workflow version. It is a workflow version instance that has been constructed with a particular set of parameters.
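
The analogy above can be written out literally. This is only an illustrative sketch in Python: none of these classes exist in Víðarr, and the workflow names, the `reference` label, the parameters, and the `matches` helper are all hypothetical. The helper is a deliberate simplification that compares only the workflow and one label, to show why a run from an older version can satisfy a request for the same workflow.

```python
from abc import ABC, abstractmethod


class GenomeQc(ABC):
    """A workflow: an interface whose contents are its labels."""

    @property
    @abstractmethod
    def reference(self) -> str:
        """Hypothetical label that every version must supply."""


class TranscriptomeQc(ABC):
    """A second workflow with the same (hypothetical) label."""

    @property
    @abstractmethod
    def reference(self) -> str:
        """Hypothetical label that every version must supply."""


class QcScriptV1(GenomeQc, TranscriptomeQc):
    """A workflow version: one script registered under two workflows."""

    def __init__(self, fastq: str, reference: str):
        # The constructor (parameters) is specific to this version.
        self.fastq = fastq
        self._reference = reference

    @property
    def reference(self) -> str:
        return self._reference


class QcScriptV2(GenomeQc):
    """A newer version of one workflow, with a different constructor."""

    def __init__(self, fastq: str, reference: str, threads: int):
        self.fastq = fastq
        self.threads = threads
        self._reference = reference

    @property
    def reference(self) -> str:
        return self._reference


def matches(workflow: type, reference: str, completed_runs: list) -> bool:
    """Simplified matching: any version of the workflow with this label counts."""
    return any(
        isinstance(run, workflow) and run.reference == reference
        for run in completed_runs
    )


# A workflow run is an instance of a workflow version.
old_run = QcScriptV1("sample.fastq", "hg38")

# V2 now exists, but the V1 run already satisfies GenomeQc for hg38.
print(matches(GenomeQc, "hg38", [old_run]))
```

Because `matches` checks against the interface rather than the concrete class, it does not matter which version produced `old_run`; that is the sense in which all versions of a workflow are equivalent for matching.
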