-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace single curl instance with dynamic pool to allow setting curl options independently from request execution #610
base: master
Are you sure you want to change the base?
Conversation
…options independently from request execution
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
@yuyiguo just to give you a context. The changes Valentin is proposing here are meant to fix a curl problem that we hit every day in production. Error reported here: dmwm/WMCore#9221 Thanks, Valentin! |
Given the high impact of this change, I'd recommend against pushing it now and having only a couple of days to validate it. |
I need to do the code review and testing before cmsweb-testbed. It would be really hard to make it since production is next Tuesday. Will try, but not guarantee. |
@yuyiguo what I meant is that you should skip it from this cycle and wait for the July upgrade. It's too dangerous to insert such changes at this stage. |
@amaltaro |
BTW, this will of course happen more frequently now that we have DBS running with twice more threads. Which I can confirm by scanning one of the agents logs, it happens 10s of times per day. |
@amaltaro |
Yuyi, DBS is used in a few different places in central production and most of them are just sequential, so a single thread handling DBS requests. Problem reported is actually happening on the sequential application (WorkQueueManager); where a DBSReader (and DbsApi) instance is created: and then reused through all the datasets to be handled (in different http requests though). I don't see where the cache gets deleted as well, so the same DBS object is used among different component cycles. |
Alan, |
Yuyi, I haven't looked at the internals of pycurl. What I understood is that each http request would use a different (idle) connection, such that the same object isn't shared among different client requests. If that's correct, then I assume the connection pool should have at least the same length of the number of cherrypy threads. About reproducing this error, I unfortunately don't have any recipe. It happens a few 10s of times per day. Grepping the global workqueue logs, you can see how many times it happened over the last 3 days:
|
Alan, This is how WorkQueueManager make DBS connection. Could you pointed me where the code invoke setop? |
I think you're mixing different pool concepts. One pool can be on server side and another on client side. My changes create the later, i.e. the client who will get |
Yuyi,
I added curl pool size as an external parameter and I changed curl retrieval
from the pool to be random.
Valentin.
…On 0, Yuyi Guo ***@***.***> wrote:
@vkuznet
Could you add pool size ? And address the question: why connection pool could fix the [problem](dmwm/WMCore#9221) ? The error was from the sequential application (WorkQueueManager).
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#610 (comment)
|
Yuyi,
I think the error message is clear:
ERROR:CherryPyPeriodicTask:Periodic Thread ERROR pycurl.error cannot invoke setopt() - perform() is currently running
It means that in existing code when dbs client creates a new pycurl object there
is another one in another thread. The first object tries to apply setopt, while
object in another thread execute perform(). Since DBS code creates all the time
pycurl object the pool will ensure that this will not happen, i.e. object
will be created up to the pool size and then reused, meaning that there will be
no need to call setopt() again on them.
V.
…On 0, Yuyi Guo ***@***.***> wrote:
@vkuznet
Valentin,
Thanks for the code updating. I still did not get how this will solve the [problem](dmwm/WMCore#9221)?
Yuyi
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#610 (comment)
|
@vkuznet Yuyi |
Yuyi, the code you point is telling nothing about usage of dbsClient, while Alan explicitly said quoting: |
The traceback is avaialble in the issue opened against WMCore, for full information. And here is the place where the actual DBS request is made (a simple serverinfo call to evaluate which DBS instance we're talking to): For completeness of what I said before, these DBS calls are made in a sequential mode, but I figured that both local and global workqueue are multithreaded. My original report was based on the global workqueue, which runs mostly 2 cherrypy threads performing distinct operations, e.g.: but diving in the code, they seem to use the same DBSReader and DbsApi object, using this utilitarian function that does caching: If I got all this right, making sure that each thread has its own DBSReader/DbsApi instance would solve this problem. Did I miss anything? |
I think each thread has its own DBSApi object should solve the problem. Could you please test in one of the workqueue? |
If app is multithreaded then it explain the issue. And for that I should tweak
further the DBS RestApi to make it thread safe. So far it is not since
RestApi invokes HTTPRequest and this is where things may go through race
conditions in threads, see this function
https://github.com/dmwm/DBS/blob/master/PycurlClient/src/python/RestClient/RequestHandling/HTTPRequest.py#L49
The __call__ setup new headers for every request, that means that if we use
the same object in different threads we may end-up in situation where one
thread will call perform, while another will invoke setop for passed
curl_object.
So, in RestApi I should introduce pool which will be thread safe, e.g.
it will be a dict which hold one curl object per thread.
Then, in each thread this object will be only called sequentially.
I can do changes next week.
…On 0, Alan Malta Rodrigues ***@***.***> wrote:
The traceback is avaialble in the issue opened against WMCore, for full information.
And here is the place where the actual DBS request is made (a simple serverinfo call to evaluate which DBS instance we're talking to):
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WorkQueue/DataLocationMapper.py#L38
For completeness of what I said before, these DBS calls are made in a sequential mode, but I figured that both local and global workqueue are multithreaded. My original report was based on the global workqueue, which runs mostly 2 cherrypy threads performing distinct operations, e.g.:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/GlobalWorkQueue/CherryPyThreads/ReqMgrInteractionTask.py#L28
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/GlobalWorkQueue/CherryPyThreads/LocationUpdateTask.py#L23
but diving in the code, they seem to use the same DBSReader and DbsApi object, using this utilitarian function that does caching:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WorkQueue/WorkQueueUtils.py#L33
If I got all this right, making sure that each thread has its own DBSReader/DbsApi instance would solve this problem. Did I miss anything?
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#610 (comment)
|
I agree with Valentin. Having DbsApi thread-safe will be important for future larger activities. |
Sounds a good plan, Valentin and Alan. |
Valentin, |
Yuyi, So, it is not about requester, it is about instances of the class. A single requestor may create multiple instance of |
Thanks Valentin ! |
Yuyi, I made it configurable already, please see
the RestApi contains curl_pool_size parameter which others can use to setup whatever they feel necessary for their use case. Do you want to change the default?
|
I put 10 as default value of |
Thanks, Valentin ! |
@vkuznet Q2: Thanks, |
Yuyi, If you want to make The same applies to the The point is that a single And |
@vkuznet I understood your explanation on the thread safe issue, but how this PR will solve @amaltaro 's original problem. He was running multiple threads and inside a thread the operation was sequential. So 1. thread safe is required. 2. pool is nice to have, but it will not help him since the operation is sequential. Am I right? If curl_pool_size is none, may the unlimited pool cause memory leaks on the client side? I think if curl_pool_size is none, we make it no pool just one curl object or pool size 1. Do you think what I suggested make sense? If so, could you please update the PR? |
Yuyi, As I wrote you should discuss with Alan if thread-safe code is required. This is up to application which uses Regarding pool, I provided sensitive default 10. If you want to change code when no pool size is provided then you should modify code accordingly, I don't have time for that anymore. |
No description provided.