
INSTANCES: versions, REST hosts, clusters, DataBases

Stefano Belforte edited this page May 2, 2020 · 12 revisions

Executive Summary

  • CRABClient and TaskWorker will let the user indicate the REST endpoint used to access the CRAB data base in one of the two ways below:
  1. one of a small set of predefined nicknames:

     nickname   REST host                     DB instance
     prod       cmsweb.cern.ch                prod
     preprod    cmsweb-testbed.cern.ch        preprod
     k8s        cmsweb-k8s-testbed.cern.ch    preprod
     test       cmsweb-test2.cern.ch          dev
     dev        cmsweb-test.cern.ch           dev

  2. a pair of strings indicating the REST host fqdn and the DB instance [prod|preprod|dev|private]

CRAB CLIENT

  • NOTE: we need to keep compatibility with the current situation, where CRABClient users do not need to specify the CRAB service instance, in which case it defaults to "prod"
  • THEREFORE in CRABClient the restHost/dbInstance mode is activated by setting
    • config.General.instance = 'other'
    • in which case the configuration file must contain:
      • config.General.restHost = 'stefanovm.cern.ch' # or similar fqdn
      • config.General.dbInstance = one of [prod|preprod|dev]
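
The selection logic described above could be sketched as follows. This is only an illustration, not the actual CRABClient code: the function name resolve_client_instance and its arguments are hypothetical, the nickname table is copied from the Executive Summary, and the allowed dbInstance values are the ones listed in this section.

```python
# Nickname table copied from the Executive Summary:
# nickname -> (REST host, DB instance).
NICKNAMES = {
    'prod': ('cmsweb.cern.ch', 'prod'),
    'preprod': ('cmsweb-testbed.cern.ch', 'preprod'),
    'k8s': ('cmsweb-k8s-testbed.cern.ch', 'preprod'),
    'test': ('cmsweb-test2.cern.ch', 'dev'),
    'dev': ('cmsweb-test.cern.ch', 'dev'),
}
# Values listed for config.General.dbInstance in this section.
ALLOWED_DB_INSTANCES = ('prod', 'preprod', 'dev')


def resolve_client_instance(instance='prod', rest_host=None, db_instance=None):
    """Hypothetical helper: return (REST host, DB instance) for a client config."""
    if instance == 'other':
        # Explicit mode: both values must come from the configuration file.
        if not rest_host:
            raise ValueError("instance 'other' requires General.restHost")
        if db_instance not in ALLOWED_DB_INSTANCES:
            raise ValueError("General.dbInstance must be one of %s"
                             % (ALLOWED_DB_INSTANCES,))
        return rest_host, db_instance
    if instance not in NICKNAMES:
        raise ValueError("unknown instance nickname: %r" % instance)
    return NICKNAMES[instance]
```

Calling resolve_client_instance() with no arguments returns the production pair, which preserves the existing default for users who never set General.instance.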

TASK WORKER

Here we can safely remove the mode configuration key and simply specify the two values directly:

  • TaskWorker: REST host and DB instance will be indicated on two separate lines in the configuration file. E.g.:
    • config.TaskWorker.resthost = 'stefanovm2.cern.ch'
    • config.TaskWorker.dbinstance = 'dev'

Or we can keep the same scheme as in CRABClient, with the same set of pre-defined modes, including other.

Concepts and definitions

  • CRABServer service is a REST interface to CRAB Oracle Data Base
  • CRABServer runs inside CMSWEB framework, so it is part of a given CMSWEB cluster
    • Numerous CMSWEB clusters exist
      • cmsweb.cern.ch aka main production one
      • cmsweb-testbed.cern.ch aka testbed
      • cmsweb-k8s-testbed.cern.ch supposedly identical to cmsweb-testbed
      • cmsweb-test.cern.ch K8s development (Valentin's playground)
      • cmsweb-test[1-6].cern.ch test (developers') clusters for application developers
        • cmsweb-test2.cern.ch is reserved for CRAB usage
      • private VM's like stefanovm.cern.ch or stefanovm2
  • Oracle Data Base has several instances, meaning "different data bases"
    • Production on CMS Production Oracle cluster cmsr
    • Preprod on devdb11 username: cmsweb_analysis_preprod
    • Dev on devdb11 username: cmsweb_analysis_dev
    • private like Stefano's or Diego's private DB's on devdb11
  • note that Oracle DBA's usually refer to cmsr or devdb11 as "instances", which overloads the term once more
  • CRABClient allows submission to a given 'CRAB instance', which means a given Data Base instance: global (i.e. production), preprod, dev, etc.

History and evolving requirements

  • CRAB was developed at a time when it was easy to get multiple DB instances, but almost inconceivable to have more than two cmsweb clusters (cmsweb.cern.ch and cmsweb-testbed.cern.ch), therefore
    • one CRABServer REST instance is able to connect to multiple DataBases, i.e. to support multiple DB instances
  • So the DataBase instance (prod/preprod/dev) could not be part of the CRABServer REST configuration; instead it is something that the client (clients of the CRABServer REST are CRABClient and CRABTaskWorker) indicates in the URL (API) used, which is constructed as hostname/crabserver/dbinstance/API
  • in the initial design the view was: the CRABClient (i.e. whoever submits) is only interested in deciding whether to submit to the production or preproduction DataBase (or some private test instance), so the CRABClient configuration file accepts the parameter config.General.instance and "CRAB" would figure out everything else
  • in the migration to K8s we have several cmsweb clusters, i.e. multiple REST instances which may e.g. all connect to the same DB instance, and we want to be able to connect explicitly to one or another of these clusters in order to test specific REST instances.
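
To make the URL layout hostname/crabserver/dbinstance/API concrete, here is a minimal sketch. The helper name build_rest_url, the https scheme and the API name 'task' are assumptions used for illustration only; they are not taken from the CRAB code.

```python
def build_rest_url(hostname, dbinstance, api, scheme='https'):
    """Hypothetical helper: compose scheme://hostname/crabserver/dbinstance/API."""
    return '%s://%s/crabserver/%s/%s' % (scheme, hostname, dbinstance, api)


# Two REST hosts addressing the same DB instance differ only in the hostname,
# which is why a single "instance" parameter can no longer describe both.
for host in ('cmsweb-testbed.cern.ch', 'cmsweb-k8s-testbed.cern.ch'):
    print(build_rest_url(host, 'preprod', 'task'))
```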

Implementation details and configurations as of April 2020

CRABServer

import cx_Oracle as DB
import socket
fqdn = socket.getfqdn().lower()
dbconfig = {'preprod': {'.title': 'Pre-production',
                        '.order': 1,
                        '*': {'clientid': 'cmsweb-preprod@%s' %(fqdn),
                              'dsn': 'devdb11',
                              'liveness': 'select sysdate from dual',
                              'password': '*****' ,
                              'schema': 'cmsweb_analysis_preprod',
                              'timeout': 300,
                              'trace': True,
                              'type': DB,
                              'user': 'cmsweb_analysis_preprod'}},
            'prod': {'.title': 'Production',
                     '.order': 0,
                     'GET': {'clientid': 'cmsweb-prod-r@%s' %(fqdn),
                             'dsn': 'cmsr',
                             'liveness': 'select sysdate from dual',
                             'password': '*****',
                             'schema': 'cms_analysis_reqmgr_r',
                             'timeout': 300,
                             'trace': True,
                             'type': DB,
                             'user': 'cms_analysis_reqmgr_r'},
                     '*':  {'clientid': 'cmsweb-prod-w@%s' %(fqdn),
                            'dsn': 'cmsr',
                            'liveness': 'select sysdate from dual',
                            'password': '******',
                            'schema': 'cms_analysis_reqmgr_w',
                            'timeout': 300,
                            'trace': True,
                            'type': DB,
                            'user': 'cms_analysis_reqmgr_w'}}}

Since this CRABServerAuth.py file contains passwords, it is not kept in publicly available repositories.

  • the CRABServer REST API machinery detects the Data Base instance from the URL in the HTTP request and selects the appropriate Oracle connection instance.
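
The selection step can be pictured with the sketch below. It is not the actual REST framework code: the function name select_db_entry is hypothetical, and the rule "use the entry keyed by the HTTP method if present, otherwise the '*' entry" is an assumption inferred from the shape of the dbconfig dictionary shown above.

```python
def select_db_entry(dbconfig, url_path, http_method):
    """Hypothetical helper: return (instance name, connection spec) for a request."""
    parts = [p for p in url_path.split('/') if p]
    # Expected layout of the path: crabserver/<dbinstance>/<API>
    if len(parts) < 3 or parts[0] != 'crabserver':
        raise ValueError('unexpected URL path: %r' % url_path)
    instance = parts[1]
    if instance not in dbconfig:
        raise KeyError('unknown DB instance: %r' % instance)
    per_method = dbconfig[instance]
    # Assumption: a method-specific entry (e.g. 'GET') wins over the '*' entry.
    spec = per_method.get(http_method, per_method.get('*'))
    if spec is None:
        raise KeyError('no connection defined for %s on %r' % (http_method, instance))
    return instance, spec
```

With the configuration above, a GET on /crabserver/prod/... would pick the read-only account, while any other method on the same path would pick the writer account.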

CRABClient

  • the CRABClient configuration file accepts the parameter config.General.instance, which can also be passed as an option on the command line; e.g. crab submit --help lists this option:
 --instance=INSTANCE   Running instance of CRAB service. Valid values are
                        ['test1', 'test3', 'test2', 'prod', 'preprod', 'test',
                        'k8s'].

where it is apparent how in January we added some K8s clusters, overloading the parameter "instance" to indicate a particular REST instance instead of the DB instance.

  • this was justified since config.General.instance was already used to indicate a particular REST host, in order to support submission to private developer VM's via things like General.instance = 'stefanovm2.cern.ch'
  • this requires that there is always a 1:1 mapping between Data Base instance and REST host instance, so that the CRAB Client can figure out the two (both needed to build the HTTP queries) from a single parameter.
  • the code which maps the General.instance parameter into a REST hostname and a DataBase instance is in https://github.com/dmwm/CRABClient/blob/301de634b1fe16bf11696d975133487cd0094d37/src/python/CRABClient/ClientUtilities.py#L195

CRABTaskWorker

As a user of the CRAB DataBase, each TaskWorker instance needs to identify one REST host to talk to and the DB instance to use.

  • there is a set of pre-defined host/instance pairs in the code; each TW instance can pick one of those via the configuration parameter config.TaskWorker.mode in the TaskWorkerConfig.py file. Relevant code is in MasterWorker.py, where the value of this configuration parameter is looked up in a table called MODEURL:
MODEURL = {'cmsweb-dev': {'host': 'cmsweb-dev.cern.ch', 'instance': 'dev'},
           'cmsweb-test': {'host': 'cmsweb-test.cern.ch', 'instance': 'preprod'},
           'cmsweb-preprod': {'host': 'cmsweb-testbed.cern.ch', 'instance': 'preprod'},
           'cmsweb-prod': {'host': 'cmsweb.cern.ch', 'instance': 'prod'},
           'test': {'host': None, 'instance': 'preprod'},
           'private': {'host': None, 'instance': 'dev'},
          }
  • if mode is set to 'test' or 'private', then the host name for the REST needs to be specified in the TaskWorkerConfig.py configuration file via the (badly named) parameter config.TaskWorker.resturl e.g.:
config.TaskWorker.resturl = 'stefanovm.cern.ch'
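
How mode and resturl combine can be sketched as below. The MODEURL table is copied from the text above; the function resolve_taskworker_endpoint is a hypothetical helper written for illustration and is not the code in MasterWorker.py.

```python
MODEURL = {'cmsweb-dev': {'host': 'cmsweb-dev.cern.ch', 'instance': 'dev'},
           'cmsweb-test': {'host': 'cmsweb-test.cern.ch', 'instance': 'preprod'},
           'cmsweb-preprod': {'host': 'cmsweb-testbed.cern.ch', 'instance': 'preprod'},
           'cmsweb-prod': {'host': 'cmsweb.cern.ch', 'instance': 'prod'},
           'test': {'host': None, 'instance': 'preprod'},
           'private': {'host': None, 'instance': 'dev'},
          }


def resolve_taskworker_endpoint(mode, resturl=None):
    """Hypothetical helper: return (REST host, DB instance) for a TaskWorker."""
    if mode not in MODEURL:
        raise ValueError('unknown TaskWorker mode: %r' % mode)
    entry = MODEURL[mode]
    host = entry['host']
    if host is None:
        # 'test' and 'private' take the host from config.TaskWorker.resturl.
        if not resturl:
            raise ValueError("mode %r requires config.TaskWorker.resturl" % mode)
        host = resturl
    # In this sketch resturl is ignored when the mode already fixes the host.
    return host, entry['instance']
```

Note that in this scheme the DB instance is always dictated by the mode, so a private REST host can only be paired with preprod (mode 'test') or dev (mode 'private').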

Changes to better manage current situation

1. CRAB Client

Modify CRAB Client so that the submitter can select the REST host and the Data Base instance independently:

  • be backward compatible with pre-2020 use (it is OK to break compatibility for K8s clusters)
  • do not introduce a new parameter, too much work
  • one possibility: allow General.instance to have the syntax: resthost/dbinstance supporting also nicknames. E.g. the following would be valid instances, with obvious meaning:
prod
preprod
test2/preprod
cmsweb-test2.cern.ch/dev
test2/dev
stefanovm.cern.ch
k8s/preprod
  • formally instance has the format a[.b.c][/i] where a, b, c and i are alphabetic strings
    • if .b.c is not present, a is interpreted as a nickname and matched against a table of known names which will indicate for each a REST host fqdn and a Data Base instance
    • if .b.c is present, a.b.c is interpreted as the REST host fqdn
    • if /i is not present the database instance is the one indicated in the table, or dev (default for unknown hosts)
    • if /i is present, i is used as Data Base instance name
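
The parsing rules above can be sketched as follows. This is an illustration under stated assumptions, not the CRABClient implementation: the function name parse_instance is hypothetical, the nickname table is the one from the Executive Summary (so nicknames such as test2 are not covered here), and a fully qualified host that appears in the table is assumed to inherit that table's DB instance.

```python
# Nickname table from the Executive Summary: nickname -> (REST host, DB instance).
NICKNAMES = {
    'prod': ('cmsweb.cern.ch', 'prod'),
    'preprod': ('cmsweb-testbed.cern.ch', 'preprod'),
    'k8s': ('cmsweb-k8s-testbed.cern.ch', 'preprod'),
    'test': ('cmsweb-test2.cern.ch', 'dev'),
    'dev': ('cmsweb-test.cern.ch', 'dev'),
}
# Reverse view of the same table: known REST host -> its default DB instance.
HOST_DEFAULT_DB = {host: db for host, db in NICKNAMES.values()}


def parse_instance(instance):
    """Hypothetical helper: split 'a[.b.c][/i]' into (REST host, DB instance)."""
    name, slash, db = instance.partition('/')
    if not name or (slash and not db):
        raise ValueError('malformed instance: %r' % instance)
    if '.' in name:
        # a.b.c form: the name is the REST host fqdn; unknown hosts default to dev.
        host = name
        default_db = HOST_DEFAULT_DB.get(host, 'dev')
    else:
        # Bare name: must be a known nickname.
        if name not in NICKNAMES:
            raise ValueError('unknown nickname: %r' % name)
        host, default_db = NICKNAMES[name]
    # An explicit /i always overrides the default DB instance.
    return host, (db or default_db)
```

For example, parse_instance('k8s/preprod') yields the K8s testbed host with the preprod data base, while parse_instance('stefanovm.cern.ch') falls back to dev because the host is not in the table.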

2. CRAB Task Worker

Should do as for CRABClient, while taking advantage of the freedom we have here with the configuration file. Keep a smaller set of nicknames (MODEURLs) where both REST host and DB instance are hardcoded. Also support MODEURL='other', which incorporates the old test/private modes, in which case both instance and url must be specified :