Unified modules and their roles
The Checkor module checks the workflows that are in completed status in ReqMgr2.
-
completed to closed-out transition: Calculate the expected and observed statistics for the outputs of the workflow in terms of lumisections. If the observed statistics are greater than or equal to the threshold (fractionpass), then the workflow is moved to closed-out status in ReqMgr2. This means that the workflow produced satisfactory results for all of its outputs. If the observed statistics are not satisfactory, then the workflow is labeled with an assistance tag, meaning that it requires manual intervention to tackle the issues that it had. The fractionpass is 100% by default, but it can be overridden at the campaign level. For instance, most MC workflows have a fractionpass of 95%. There is also some extra logic in the module which might lower the fractionpass if certain criteria are met.
-
Assistance labeling
As mentioned above, if the workflow did not reach satisfactory results, then it stays in completed status and is labeled with assistance tags. These tags show which kind of issue the workflow has and which level of resubmission (ACDC) it is at.
-
Output lumisection size check:
Lumisections that are too small or too big are both problematic, and this module checks for both. The lower limit is set in the Unified configuration file. If the events/lumi of an output is lower than this value, then the workflow is tagged with the assistance-smalllumi label and a human checks the workflow. The upper limit is set at the campaign level. If this limit (lumi_size) is -1, then there is no upper limit. If the events/lumi is greater than the upper limit, then the workflow is tagged with assistance-biglumi and a human checks the workflow.
-
Filemismatch check:
For each output dataset, the module checks whether the number of files in DBS matches the number in Rucio. If the numbers do not match, then the workflow is tagged with the assistance-filemismatch label and a human checks the workflow. Note that there is a delay between file injection to DBS and to Rucio in WMAgent, which causes a temporary file mismatch. In this scenario, the workflow is tagged with the assistance-agentfilemismatch label, and if the mismatch is not resolved within 2 days, then the workflow is moved to assistance-filemismatch.
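The two-stage labeling above can be illustrated with a minimal Python sketch. It is not the Checkor code: the function name, its arguments, and the idea of timing the grace period from the moment the mismatch was first seen are assumptions made for illustration. Only the tag names and the 2-day window come from this page.

```python
from datetime import datetime, timedelta
from typing import Optional

# Grace period for the WMAgent injection delay ("2 days" in the text above).
AGENT_DELAY_GRACE = timedelta(days=2)


def filemismatch_tag(dbs_files: int, rucio_files: int,
                     first_seen: datetime, now: datetime) -> Optional[str]:
    """Return the assistance tag for a file-count mismatch, or None if counts agree.

    Hypothetical helper: `first_seen` is assumed to be the time at which the
    mismatch was first observed for this output dataset.
    """
    if dbs_files == rucio_files:
        return None
    if now - first_seen <= AGENT_DELAY_GRACE:
        # Possibly just the injection delay between DBS and Rucio.
        return "assistance-agentfilemismatch"
    # Still mismatched after the grace period: needs a human.
    return "assistance-filemismatch"
```

For example, a mismatch first seen five hours ago would yield assistance-agentfilemismatch, while the same mismatch three days later would yield assistance-filemismatch.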
-
[CURRENTLY DISABLED] Duplicate check:
For each output, the module queries DBS and checks for duplicate events. In case of duplicate events, it invalidates the file(s) which is/are causing the duplicate.
Since this is a very expensive and heavy operation, this feature is currently disabled.
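This page does not say how duplicates are detected. As a rough sketch only, assume that a duplicate shows up as the same (run, lumisection) pair appearing in more than one file of an output. The function name and the input layout below are hypothetical, and no real DBS call is made.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Lumi = Tuple[int, int]  # (run number, lumisection number)


def find_duplicate_lumis(file_lumis: Dict[str, List[Lumi]]) -> Dict[Lumi, List[str]]:
    """Map each lumisection that appears in more than one file to those files.

    `file_lumis` is an assumed input: file name -> lumisections it contains,
    as one might collect from a DBS query.
    """
    owners = defaultdict(set)
    for lfn, lumis in file_lumis.items():
        for lumi in lumis:
            owners[lumi].add(lfn)
    # Keep only lumisections that are claimed by two or more distinct files.
    return {lumi: sorted(files) for lumi, files in owners.items() if len(files) > 1}
```

The files listed for each duplicated lumisection would then be the candidates for invalidation. Building this map requires the lumisection list of every file in the output, which gives a sense of why the check is expensive.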
-
Invalid file(s) check:
If the number of invalid files in the output is above a threshold, then the workflow is tagged with assistance-invalidfiles and a human checks the workflow.
-
Create/Update JIRA ticket
Based on the checks done within the module, a JIRA ticket is created/updated automatically.
-
Create a lumisection summary webpage:
A webpage is created which shows the lumisections, e.g. https://cms-unified.web.cern.ch/cms-unified/datalumi/lumi.ReReco-Run2017C-JetHT-UL2017_MiniAODv1_NanoAODv2_pilot4-00001.html
-
Create notifications for the requestors:
Create a notification for the requestors about the issues that the workflow is having.
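To tie the checks together, the completed to closed-out decision and the lumisection size check described earlier on this page can be sketched in Python. This is an illustration under stated assumptions, not the Checkor implementation: the Output class, the function name, and the min_events_per_lumi parameter are hypothetical, the bare "assistance" tag stands in for whatever specific tag the module applies when statistics are low, and it is assumed that any tag keeps the workflow in completed status.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Output:
    """Hypothetical per-output summary used only for this sketch."""
    name: str
    expected_lumis: int
    observed_lumis: int
    events: int


def evaluate_workflow(outputs: List[Output], fractionpass: float = 1.0,
                      min_events_per_lumi: float = 0.0,
                      lumi_size: float = -1) -> Tuple[str, List[str]]:
    """Return (next status, assistance tags) for a workflow in completed status."""
    tags = set()
    for out in outputs:
        # Completion is measured in lumisections: observed / expected.
        completion = out.observed_lumis / out.expected_lumis if out.expected_lumis else 0.0
        if completion < fractionpass:
            tags.add("assistance")  # placeholder for the low-statistics tag
        if out.observed_lumis > 0:
            events_per_lumi = out.events / out.observed_lumis
            if events_per_lumi < min_events_per_lumi:
                tags.add("assistance-smalllumi")
            # lumi_size == -1 means there is no upper limit.
            if lumi_size != -1 and events_per_lumi > lumi_size:
                tags.add("assistance-biglumi")
    status = "completed" if tags else "closed-out"
    return status, sorted(tags)
```

With the default fractionpass of 100%, an output at 96% completion keeps the workflow in completed status. With a campaign-level fractionpass of 0.95, the same output would let the workflow move to closed-out.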