Describe the bug
While testing the propagation of SiteLists changes down to the local workqueue elements at the WMAgent, together with @mapellidario, we found that a workflow parameter update in WMStats, and the corresponding update of the global WorkQueue elements, triggered no change at the local WorkQueue. Our initial investigation identified several reasons for this:
A few typos in the workflow parameter names
A too narrow mask for the Local WorkQueue element statuses to be considered for update
A broken mechanism for fetching the CouchDB URL for the workflow spec
The workload object for the local workqueue was loaded from the spec instead of localcouchdb, which
The workload.specUrl() method was not agnostic to the source from which the workload object had been created, which resulted in an exception of the type shown in [1] while trying to persist the changes in local couch
Even after fixing how we instantiate the workload object and calling the saveCouch method properly, we still faced an Unauthorised error [2], because the spec URL returned by the method above had been sanitized during the LocalWorkQueue object creation, with the username and password removed from the URL
We found a few redundant calls to the workload setter methods when updating the workqueue elements. This is already done sequentially by the procedures that update the workqueue elements, as widely discussed in the issue and in the implementation of the GlobalWorkQueue elements update.
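To illustrate the two URL-related failures above, here is a minimal standalone sketch. It does not use WMCore code: `sanitize_url` and `has_http_scheme` are hypothetical helpers, and the URL, credentials, database name and file path are made-up values. It only shows why a spec URL that is not an http(s) address fails a scheme check, and why a URL whose credentials were stripped can no longer authenticate unless those credentials are kept somewhere and re-applied.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(url):
    """Hypothetical helper: strip credentials from a URL.

    Returns a tuple (url_without_credentials, username, password).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, parts.username, parts.password


def has_http_scheme(url):
    """Hypothetical check: True only for http:// or https:// addresses."""
    return urlsplit(url).scheme in ("http", "https")


# Made-up values, for illustration only.
full_url = "http://user:secret@127.0.0.1:5984/workqueue_inbox"
local_spec_path = "/data/srv/some_workflow/spec.pkl"

clean_url, user, password = sanitize_url(full_url)

# A spec location that is a plain file path has no http(s) scheme,
# so a server-address check of this kind rejects it.
print(has_http_scheme(local_spec_path))

# The sanitized URL is still a valid http address, but it carries no
# credentials; a request built from it alone would be anonymous.
print(clean_url, has_http_scheme(clean_url))
```

If the sanitized form is the only one stored on an object, any later write that needs admin rights has to obtain the username and password from another source.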
How to reproduce it
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Additional context and error message
These issues were found while validating the dynamic SiteWhitelist/SiteBlacklist change in view of the upcoming central services and WMAgent release candidates #12222 #12224. This is a follow-up to PR #12123.
[1]
In [1]: sitelistpoller.algorithm()
2025-01-28 12:50:59,981:INFO:SiteListPoller:algorithm(): Active workflows: dict_keys(['dmapelli_SC_EL8_JSON_Nvidia_test_v1_250124_095025_2857', 'dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803', 'dmapelli_TaskChain_ProdMinBias_Nvidia_test_v1_250124_095038_1012'])
2025-01-28 12:50:59,982:INFO:SiteListPoller:wmstatsDict(): Fetch site info from WMStats for condition: {'RequestStatus': 'running-closed'} and mask ['SiteWhitelist', 'SiteBlacklist']
2025-01-28 12:51:00,075:INFO:SiteListPoller:wmstatsDict(): Fetch site info from WMStats for condition: {'RequestStatus': 'running-open'} and mask ['SiteWhitelist', 'SiteBlacklist']
2025-01-28 12:51:00,162:INFO:SiteListPoller:wmstatsDict(): Fetch site info from WMStats for condition: {'RequestStatus': 'acquired'} and mask ['SiteWhitelist', 'SiteBlacklist']
2025-01-28 12:51:00,248:INFO:SiteListPoller:algorithm():
wdict: {'dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803': {'SiteBlacklist': ['T2_AT_Vienna',
'T2_BE_IIHE'],
'SiteWhitelist': ['T1_US_FNAL',
'T2_CH_CERN']},
'dmapelli_SC_EL8_JSON_Nvidia_test_v1_250124_095025_2857': {'SiteBlacklist': [],
'SiteWhitelist': ['T1_US_FNAL',
'T2_CH_CERN']},
'dmapelli_TaskChain_ProdMinBias_Nvidia_test_v1_250124_095038_1012': {'SiteBlacklist': [],
'SiteWhitelist': ['T1_US_FNAL',
'T2_CH_CERN']}}
2025-01-28 12:51:00,286:INFO:SiteListPoller:algorithm(): Updating dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803:
2025-01-28 12:51:00,286:INFO:SiteListPoller:algorithm(): siteWhitelist ['T1_US_FNAL', 'T2_CH_CERN'] => ['T1_US_FNAL', 'T2_CH_CERN']
2025-01-28 12:51:00,286:INFO:SiteListPoller:algorithm(): siteBlacklist [] => ['T2_AT_Vienna', 'T2_BE_IIHE']
2025-01-28 12:51:00,295:ERROR:SiteListPoller:algorithm(): Unexpected exception while updating elements in local workqueue Details:
You must include http(s):// in your servers address
Traceback (most recent call last):
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMComponent/WorkflowUpdater/SiteListPoller.py", line 133, in algorithm
self.localWQ.updateElementsByWorkflow(wHelper, params, status=['Available'])
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Services/WorkQueue/WorkQueue.py", line 290, in updateElementsByWorkflow
workload.saveCouchUrl(workload.specUrl())
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/WMSpec/Persistency.py", line 124, in saveCouchUrl
return self.saveCouch(couchUrl, dbname)
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/WMSpec/Persistency.py", line 84, in saveCouch
server = CouchServer(couchUrl)
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Database/CMSCouch.py", line 967, in __init__
check_server_url(dburl)
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Database/CMSCouch.py", line 46, in check_server_url
raise ValueError('You must include http(s):// in your servers address')
ValueError: You must include http(s):// in your servers address
Out[1]: (0.7039, None, 'algorithm')
[2]
In [1]: sitelistpoller.algorithm()
2025-01-28 17:04:16,958:INFO:SiteListPoller:algorithm(): Active workflows: dict_keys(['dmapelli_SC_EL8_JSON_Nvidia_test_v1_250124_095025_2857', 'dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803', 'dmapelli_TaskChain_ProdMinBias_Nvidia_test_v1_250124_095038_1012'])
2025-01-28 17:04:16,958:INFO:SiteListPoller:wmstatsDict(): Fetch site info from WMStats for condition: {'RequestStatus': 'running-closed'} and mask ['SiteWhitelist', 'SiteBlacklist']
2025-01-28 17:04:17,054:INFO:SiteListPoller:wmstatsDict(): Fetch site info from WMStats for condition: {'RequestStatus': 'running-open'} and mask ['SiteWhitelist', 'SiteBlacklist']
2025-01-28 17:04:17,142:INFO:SiteListPoller:wmstatsDict(): Fetch site info from WMStats for condition: {'RequestStatus': 'acquired'} and mask ['SiteWhitelist', 'SiteBlacklist']
2025-01-28 17:04:17,229:INFO:SiteListPoller:algorithm():
wdict: {'dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803': {'SiteBlacklist': ['T1_IT_CNAF',
'T1_RU_JINR',
'T1_UK_RAL'],
'SiteWhitelist': ['T1_DE_KIT']},
'dmapelli_SC_EL8_JSON_Nvidia_test_v1_250124_095025_2857': {'SiteBlacklist': [],
'SiteWhitelist': ['T1_US_FNAL',
'T2_CH_CERN']},
'dmapelli_TaskChain_ProdMinBias_Nvidia_test_v1_250124_095038_1012': {'SiteBlacklist': [],
'SiteWhitelist': ['T1_US_FNAL',
'T2_CH_CERN']}}
2025-01-28 17:04:17,270:INFO:SiteListPoller:algorithm(): Updating dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803:
2025-01-28 17:04:17,270:INFO:SiteListPoller:algorithm(): siteWhitelist ['T1_US_FNAL', 'T2_CH_CERN'] => ['T1_DE_KIT']
2025-01-28 17:04:17,270:INFO:SiteListPoller:algorithm(): siteBlacklist ['T2_BE_IIHE', 'T0_CH_CSCS_HPC', 'T2_AT_Vienna'] => ['T1_IT_CNAF', 'T1_RU_JINR', 'T1_UK_RAL']
2025-01-28 17:04:17,282:ERROR:SiteListPoller:algorithm(): Unexpected exception while updating elements in local workqueue Details:
Error type: CouchUnauthorisedError, Status code: 401, Reason: Unauthorized, Data: {}
Traceback (most recent call last):
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Database/CMSCouch.py", line 133, in makeRequest
result, status, reason, cached = JSONRequests.makeRequest(
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Services/Requests.py", line 185, in makeRequest
result, response = self.makeRequest_pycurl(uri, data, verb, headers)
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Services/Requests.py", line 202, in makeRequest_pycurl
response, result = self.reqmgr.request(uri, data, headers, verb=verb,
File "/data/WMAgent.venv3/srv/WMCore/src/python/Utils/PortForward.py", line 68, in portMangle
return callFunc(callObj, url, *args, **kwargs)
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Services/pycurl_manager.py", line 353, in request
raise exc
http.client.HTTPException: url=http://127.0.0.1:5984/_all_dbs, code=401, reason=Unauthorized, headers={'Cache-Control': 'must-revalidate', 'Content-Length': '64', 'Content-Type': 'application/json', 'Date': 'Tue, 28 Jan 2025 16:04:17 GMT', 'Server': 'CouchDB/3.2.2 (Erlang OTP/23)', 'X-Couch-Request-ID': '8c9c8e9307', 'X-CouchDB-Body-Time': '0'}, result=b'{"error":"unauthorized","reason":"You are not a server admin."}\n'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMComponent/WorkflowUpdater/SiteListPoller.py", line 137, in algorithm
self.localWQ.updateElementsByWorkflow(wHelper, params, status=['Available'])
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Services/WorkQueue/WorkQueue.py", line 291, in updateElementsByWorkflow
workload.saveCouchUrl(workload.specUrl())
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/WMSpec/Persistency.py", line 124, in saveCouchUrl
return self.saveCouch(couchUrl, dbname)
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/WMSpec/Persistency.py", line 85, in saveCouch
database = server.connectDatabase(couchDBName)
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Database/CMSCouch.py", line 1013, in connectDatabase
if create and dbname not in self.listDatabases():
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Database/CMSCouch.py", line 982, in listDatabases
return self.get('/_all_dbs')
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Services/Requests.py", line 146, in get
return self.makeRequest(uri, data, 'GET', incoming_headers,
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Database/CMSCouch.py", line 137, in makeRequest
self.checkForCouchError(getattr(e, "status", None),
File "/data/WMAgent.venv3/srv/WMCore/src/python/WMCore/Database/CMSCouch.py", line 153, in checkForCouchError
raise CouchUnauthorisedError(reason, data, result, status)
WMCore.Database.CMSCouch.CouchUnauthorisedError: Error type: CouchUnauthorisedError, Status code: 401, Reason: Unauthorized, Data: {}
Out[1]: (0.6922, None, 'algorithm')
Impact of the bug
WMAgent