INSTANCES: versions, REST hosts, clusters, DataBases
- CRABClient and TaskWorker will allow the REST endpoint used to access the CRAB database to be indicated in either of the two ways below:
- one of a small set of predefined nicknames:
nickname | REST host | DB instance |
---|---|---|
prod | cmsweb.cern.ch | prod |
preprod | cmsweb-testbed.cern.ch | preprod |
k8s | cmsweb-k8s-testbed.cern.ch | preprod |
test | cmsweb-test2.cern.ch | dev |
dev | cmsweb-test.cern.ch | dev |
- a pair of strings indicating the REST host fqdn and the DB instance, the latter being one of [prod|preprod|dev|private]
CRAB CLIENT
- NOTE: we need to keep compatibility with the current situation, where CRABClient users do not need to specify the CRAB service instance, in which case it defaults to "prod"
THEREFORE in CRABClient the restHost/dbInstance mode is activated by setting
config.General.instance = 'other'
- in which case the configuration file must contain:
config.General.restHost = 'stefanovm.cern.ch'  # or similar fqdn
config.General.dbInstance = 'dev'  # one of [prod|preprod|dev]
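A minimal sketch of how the two modes above could be resolved into a REST host and a DB instance. The function name `resolve_instance` and its arguments are hypothetical and not taken from the CRABClient code; the nickname values are copied from the table above.

```python
# Nickname table copied from the proposal above: nickname -> (REST host, DB instance).
NICKNAMES = {
    'prod':    ('cmsweb.cern.ch', 'prod'),
    'preprod': ('cmsweb-testbed.cern.ch', 'preprod'),
    'k8s':     ('cmsweb-k8s-testbed.cern.ch', 'preprod'),
    'test':    ('cmsweb-test2.cern.ch', 'dev'),
    'dev':     ('cmsweb-test.cern.ch', 'dev'),
}
VALID_DB_INSTANCES = ('prod', 'preprod', 'dev')


def resolve_instance(instance='prod', rest_host=None, db_instance=None):
    """Return (REST host, DB instance) for a General.instance value.

    Hypothetical helper: rest_host and db_instance stand in for
    config.General.restHost and config.General.dbInstance.
    """
    if instance in NICKNAMES:
        return NICKNAMES[instance]
    if instance == 'other':
        # In 'other' mode both values must come from the configuration file.
        if not rest_host or db_instance not in VALID_DB_INSTANCES:
            raise ValueError("instance='other' needs restHost and a dbInstance in %s"
                             % (VALID_DB_INSTANCES,))
        return rest_host, db_instance
    raise ValueError('unknown instance %r' % instance)
```

Calling `resolve_instance()` with no arguments keeps the backward-compatible default of "prod", while `resolve_instance('other', 'stefanovm.cern.ch', 'dev')` covers the private-VM case.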
TASK WORKER
Here we can safely remove the mode configuration key and simply:
- TaskWorker: the REST host and DB instance will be indicated on two separate lines in the configuration file. E.g.:
config.TaskWorker.resthost = 'stefanovm2.cern.ch'
config.TaskWorker.dbinstance = 'dev'
Or we can keep the same scheme as in CRABClient, with the same set of pre-defined modes, including other
- CRABServer service is a REST interface to CRAB Oracle Data Base
- CRABServer runs inside CMSWEB framework, so it is part of a given CMSWEB cluster
- Numerous CMSWEB clusters exist
- cmsweb.cern.ch aka main production one
- cmsweb-testbed.cern.ch aka testbed
- cmsweb-k8s-testbed.cern.ch supposedly identical to cmsweb-testbed
- cmsweb-test.cern.ch K8s development (Valentin's playground)
- cmsweb-test[1-6].cern.ch test (developers') clusters for application developers
- cmsweb-test2.cern.ch is reserved for CRAB usage
- private VM's like stefanovm.cern.ch or stefanovm2
- Oracle Data Base has several instances, meaning "different data bases"
- Production on CMS Production Oracle cluster
cmsr
- Preprod on
devdb11
username:cmsweb_analysis_preprod
- Dev on
devdb11
username:cmsweb_analysis_dev
- private like Stefano's or Diego's private DB's on
devdb11
- while Oracle DBA's usually refer to cmsr or devdb11 as instances (again)
- CRABClient allows submitting to a given 'CRAB instance', which means a given Data Base instance: global (i.e. production), preprod, dev, etc.
- CRAB was developed at a time when it was easy to get multiple DB instances, but almost inconceivable to have more than two cmsweb clusters (cmsweb.cern.ch and cmsweb-testbed.cern.ch), therefore
- one CRABServer REST instance can connect to multiple DataBases, i.e. support multiple DB instances
- So the DataBase instance (prod/preprod/dev) could not be part of the CRABServer REST configuration; instead it is something that the client (the clients of the CRABServer REST are CRABClient and CRABTaskWorker) indicates in the URL (API) it uses, which is constructed as hostname/crabserver/dbinstance/API
- e.g. both these URL's work: https://cmsweb.cern.ch/crabserver/prod/info and https://cmsweb.cern.ch/crabserver/preprod/info
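The URL layout described above can be sketched as a one-line helper. The function name `crabserver_url` is hypothetical; only the hostname/crabserver/dbinstance/API pattern comes from the text.

```python
def crabserver_url(host, db_instance, api):
    """Build https://<hostname>/crabserver/<dbinstance>/<API> (hypothetical helper)."""
    return 'https://%s/crabserver/%s/%s' % (host, db_instance, api)


# The same REST host serves two different DB instances:
print(crabserver_url('cmsweb.cern.ch', 'prod', 'info'))     # https://cmsweb.cern.ch/crabserver/prod/info
print(crabserver_url('cmsweb.cern.ch', 'preprod', 'info'))  # https://cmsweb.cern.ch/crabserver/preprod/info
```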
- in the initial design the view was: the CRABClient (i.e. whoever submits) is only interested in deciding whether to submit to the production or preproduction DataBase (or some private test instance), so the CRABClient configuration file accepts the parameter config.General.instance and "CRAB" would figure out everything
- in the migration to K8s we have several cmsweb clusters, i.e. multiple REST instances which may e.g. all connect to the same DB instance, and we want to be able to connect explicitly to one or another of these clusters in order to test specific REST instances.
- the possible DataBase instances a CRABServer can connect to are specified via the file /data/srv/current/auth/crabserver/CRABServerAuth.py, which is not part of the CRAB source code in this github repository but in principle is written ad hoc for every machine where CRABServer is installed (see https://twiki.cern.ch/twiki/bin/view/CMSPublic/CMSCrabRESTInterface#Authentication_with_CERN_Oracle ). E.g. the CRAB REST production instance in cmsweb.cern.ch uses this (passwords have been removed)
import cx_Oracle as DB
import socket
fqdn = socket.getfqdn().lower()
dbconfig = {'preprod': {'.title': 'Pre-production',
'.order': 1,
'*': {'clientid': 'cmsweb-preprod@%s' %(fqdn),
'dsn': 'devdb11',
'liveness': 'select sysdate from dual',
'password': '*****' ,
'schema': 'cmsweb_analysis_preprod',
'timeout': 300,
'trace': True,
'type': DB,
'user': 'cmsweb_analysis_preprod'}},
'prod': {'.title': 'Production',
'.order': 0,
'GET': {'clientid': 'cmsweb-prod-r@%s' %(fqdn),
'dsn': 'cmsr',
'liveness': 'select sysdate from dual',
'password': '*****',
'schema': 'cms_analysis_reqmgr_r',
'timeout': 300,
'trace': True,
'type': DB,
'user': 'cms_analysis_reqmgr_r'},
'*': {'clientid': 'cmsweb-prod-w@%s' %(fqdn),
'dsn': 'cmsr',
'liveness': 'select sysdate from dual',
'password': '******',
'schema': 'cms_analysis_reqmgr_w',
'timeout': 300,
'trace': True,
'type': DB,
'user': 'cms_analysis_reqmgr_w'}}}
Since this CRABServerAuth.py file contains passwords, it is not kept in publicly available repositories.
- the CRABServer REST API machinery detects the Data Base instance from the URL in the HTTP request and selects the appropriate Oracle connection instance.
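A rough sketch of that selection step, not the actual CRABServer/WMCore code. The `dbconfig` below is a trimmed stand-in for the dictionary shown above, `select_db` is a hypothetical name, and treating '*' as the fallback for HTTP methods without their own entry is an assumption.

```python
from urllib.parse import urlparse

# Trimmed-down stand-in for the dbconfig dictionary shown above (no credentials).
dbconfig = {
    'preprod': {'*': {'dsn': 'devdb11', 'user': 'cmsweb_analysis_preprod'}},
    'prod': {'GET': {'dsn': 'cmsr', 'user': 'cms_analysis_reqmgr_r'},
             '*': {'dsn': 'cmsr', 'user': 'cms_analysis_reqmgr_w'}},
}


def select_db(url, method):
    """Pick the connection settings for a request URL and HTTP method."""
    # The path is expected to look like /crabserver/<dbinstance>/<API>.
    parts = urlparse(url).path.strip('/').split('/')
    if len(parts) < 3 or parts[0] != 'crabserver':
        raise ValueError('not a crabserver URL: %s' % url)
    per_method = dbconfig[parts[1]]  # KeyError for an unconfigured DB instance
    # Assumption: '*' is the fallback for methods without their own entry.
    return per_method.get(method, per_method['*'])
```

With this stand-in, a GET on https://cmsweb.cern.ch/crabserver/prod/info would use the read-only account, while any other method on the same URL would use the writer account.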
- the CRABClient configuration file accepts the parameter config.General.instance, which can also be passed as an option on the command line; e.g. crab submit --help lists this option:
--instance=INSTANCE Running instance of CRAB service. Valid values are
['test1', 'test3', 'test2', 'prod', 'preprod', 'test',
'k8s'].
where it is apparent that in January we added some K8s clusters, overloading the parameter "instance" to indicate a particular REST instance instead of the DB instance.
- this was justified since config.General.instance was already used to indicate a particular REST host, in order to support submission to private developer VM's via things like General.instance = 'stefanovm2.cern.ch'
- this requires that there is always a 1:1 mapping between Data Base instance and REST host instance, so that the CRAB Client can figure out the two (needed to build the HTTP queries) from a single parameter.
- the code which maps the
General.instance
parameter into a REST hostname and a DataBase instance is in https://github.com/dmwm/CRABClient/blob/301de634b1fe16bf11696d975133487cd0094d37/src/python/CRABClient/ClientUtilities.py#L195
As a user of the CRAB DataBase, each TaskWorker instance needs to identify one REST host to talk to and the DB instance to use.
- there is a set of pre-defined host/instance pairs in the code; each TW instance can pick one of those via the configuration parameter config.TaskWorker.mode in the TaskWorkerConfig.py file. The relevant code is in MasterWorker.py, where the table of allowed values of this configuration parameter is called MODEURL:
MODEURL = {'cmsweb-dev': {'host': 'cmsweb-dev.cern.ch', 'instance': 'dev'},
'cmsweb-test': {'host': 'cmsweb-test.cern.ch', 'instance': 'preprod'},
'cmsweb-preprod': {'host': 'cmsweb-testbed.cern.ch', 'instance': 'preprod'},
'cmsweb-prod': {'host': 'cmsweb.cern.ch', 'instance': 'prod'},
'test' :{'host': None, 'instance': 'preprod'},
'private': {'host': None, 'instance': 'dev'},
}
- if mode is set to 'test' or 'private', then the host name for the REST needs to be specified in the TaskWorkerConfig.py configuration file via the (badly named) parameter config.TaskWorker.resturl, e.g.:
config.TaskWorker.resturl = 'stefanovm.cern.ch'
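The lookup described above can be sketched as follows. The MODEURL table is copied from the text; the function `resolve_mode` and its error messages are hypothetical and not taken from MasterWorker.py.

```python
MODEURL = {'cmsweb-dev': {'host': 'cmsweb-dev.cern.ch', 'instance': 'dev'},
           'cmsweb-test': {'host': 'cmsweb-test.cern.ch', 'instance': 'preprod'},
           'cmsweb-preprod': {'host': 'cmsweb-testbed.cern.ch', 'instance': 'preprod'},
           'cmsweb-prod': {'host': 'cmsweb.cern.ch', 'instance': 'prod'},
           'test': {'host': None, 'instance': 'preprod'},
           'private': {'host': None, 'instance': 'dev'},
           }


def resolve_mode(mode, resturl=None):
    """Return (REST host, DB instance) for a TaskWorker mode (hypothetical helper)."""
    if mode not in MODEURL:
        raise ValueError('mode must be one of %s' % sorted(MODEURL))
    host = MODEURL[mode]['host']
    if host is None:
        # 'test' and 'private' carry no host: it must come from config.TaskWorker.resturl.
        if not resturl:
            raise ValueError("mode %r requires config.TaskWorker.resturl" % mode)
        host = resturl
    return host, MODEURL[mode]['instance']
```

Note that in this scheme the DB instance is always fixed by the mode, so 'private' can only ever reach the dev DB instance.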
Modify CRAB Client so that the submitter can select REST host and Data Base instance independently
- be backward compatible with pre-2020 use (it is OK to break compatibility for K8s clusters)
- do not introduce a new parameter, too much work
- one possibility: allow General.instance to have the syntax resthost/dbinstance, supporting also nicknames. E.g. the following would be valid instances, with obvious meaning:
prod
preprod
test2/preprod
cmsweb-test2.cern.ch/dev
test2/dev
stefanovm.cern.ch
k8s/preprod
- formally instance has the format a[.b.c][/i] where a, b, c and i are alphabetic strings
- if .b.c is not present, a is interpreted as a nickname and matched against a table of known names which will indicate for each a REST host fqdn and a Data Base instance
- if .b.c is present, a.b.c is interpreted as the REST host fqdn
- if /i is not present the database instance is the one indicated in the table, or dev (default for unknown hosts)
- if /i is present, i is used as Data Base instance name
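A minimal sketch of a parser for the proposed syntax, under stated assumptions: the function name `parse_instance` and the `KNOWN` table are hypothetical, the 'test2' entry is assumed for illustration, and the sketch does not enforce the character set of a, b, c and i.

```python
# Hypothetical nickname table; the 'test2' entry is an assumption for illustration.
KNOWN = {
    'prod':    ('cmsweb.cern.ch', 'prod'),
    'preprod': ('cmsweb-testbed.cern.ch', 'preprod'),
    'k8s':     ('cmsweb-k8s-testbed.cern.ch', 'preprod'),
    'test2':   ('cmsweb-test2.cern.ch', 'dev'),
}
# Reverse view of the same table: REST host fqdn -> default DB instance.
HOST_DEFAULT_DB = {host: db for host, db in KNOWN.values()}


def parse_instance(instance):
    """Split 'a[.b.c][/i]' into (REST host fqdn, DB instance)."""
    host_part, sep, db_part = instance.partition('/')
    if not host_part or (sep and not db_part):
        raise ValueError('malformed instance: %r' % instance)
    if '.' in host_part:
        # A dotted name is taken as the REST host fqdn; unknown hosts default to 'dev'.
        host = host_part
        default_db = HOST_DEFAULT_DB.get(host, 'dev')
    else:
        # A bare name is a nickname and must be in the table.
        if host_part not in KNOWN:
            raise ValueError('unknown nickname: %r' % host_part)
        host, default_db = KNOWN[host_part]
    # An explicit /i always wins over the table default.
    return host, (db_part or default_db)
```

For example, `parse_instance('test2/preprod')` pairs the test2 host with the preprod DB instance, and `parse_instance('stefanovm.cern.ch')` falls back to dev because the host is not in the table.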
Should do as for CRABClient, while taking advantage of the fact that here we have freedom with the configuration file.
Keep a smaller set of nicknames (MODEURLs) where both REST host and DB instance are hardcoded.
Support also MODEURL='other', which incorporates the old test/private modes, in which case both instance and url must be specified:
- use this change to rename config.TaskWorker.resturl to config.TaskWorker.resthost
- introduce
config.TaskWorker.dbinstance
This should require changes to: