"Unhandled error in Deferred" when a child process exits unsuccessfully #142

Open
guidow opened this issue Sep 11, 2014 · 0 comments
guidow commented Sep 11, 2014

When an assignment fails, I get the following log output from the agent:

2014-09-11 12:43:05 DEBUG    - pf.jobtypes.process - In ProcessProtocol.processExited, reason: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessTerminated'>: A process has ended with a probable error condition: process ended with exit code 1.
]
2014-09-11 12:43:05 INFO     - pf.jobtypes.core - ProcessProtocol(uuid=68ca4657-39a0-11e4-bc55-c86000cbf5fb, pid=None) stopped (code: 1)
2014-09-11 12:43:05 ERROR    - pf.jobtypes.core - No error was defined for this failure.
2014-09-11 12:43:05 ERROR    - pf.jobtypes.core - Task {u'frame': 1.0, u'attempt': 2, u'id': 7} failed: None
2014-09-11 12:43:05 ERROR    - pf.jobtypes.core - No error was defined for this failure.
2014-09-11 12:43:05 ERROR    - pf.jobtypes.core - Task {u'frame': 2.0, u'attempt': 2, u'id': 8} failed: None
2014-09-11 12:43:05 INFO     - pf.jobtypes.log - Closed /tmp/pyfarm/agent/logs/tasks/2014-09-11_10-43-04_4_68ca465739a011e4bc55c86000cbf5fb.csv
2014-09-11 12:43:05 INFO     - pf.jobtypes.core - Uploading log file 2014-09-11_10-43-04_4_68ca465739a011e4bc55c86000cbf5fb.csv to master, URL 'http://127.0.0.1:5000/api/v1/jobs/4/tasks/7/attempts/2/logs/2014-09-11_10-43-04_4_68ca465739a011e4bc55c86000cbf5fb.csv/logfile'
2014-09-11 12:43:05 DEBUG    - twisted         - Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fbcfc08af80>
2014-09-11 12:43:05 WARNING  - pf.jobtypes.core - There was at least one failed process in the assignment RenderWithBlender(job=4, tasks=(7, 8), jobtype='Render With Blender', version=1, title='Stift Render')
2014-09-11 12:43:05 DEBUG    - pf.agent.http.assign - Assignment 68c87287-39a0-11e4-a5fd-c86000cbf5fb has stopped
2014-09-11 12:43:05 DEBUG    - twisted         - Unhandled error in Deferred:
2014-09-11 12:43:05 ERROR    - root            - Unhandled Error
Traceback (most recent call last):
Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1.
Traceback (most recent call last):
  File "/home/guido/git/pyfarm/pyfarm-agent/pyfarm/agent/entrypoints/main.py", line 460, in start
url, headers={"Content-Type": "application/json"})
  File "/var/tmp/virtualenv/pyfarm-agent-python2/lib/python2.7/site-packages/requests-2.3.0-py2.7.egg/requests/api.py", line 55, in get
return request('get', url, **kwargs)
  File "/var/tmp/virtualenv/pyfarm-agent-python2/lib/python2.7/site-packages/requests-2.3.0-py2.7.egg/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
  File "/var/tmp/virtualenv/pyfarm-agent-python2/lib/python2.7/site-packages/requests-2.3.0-py2.7.egg/requests/sessions.py", line 456, in request
resp = self.send(prep, **send_kwargs)                                                                                                  
  File "/var/tmp/virtualenv/pyfarm-agent-python2/lib/python2.7/site-packages/requests-2.3.0-py2.7.egg/requests/sessions.py", line 559, in send
r = adapter.send(request, **kwargs)
  File "/var/tmp/virtualenv/pyfarm-agent-python2/lib/python2.7/site-packages/requests-2.3.0-py2.7.egg/requests/adapters.py", line 375, in send
raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='r008.produktion.local', port=50000): Max retries exceeded with url: /api/v1/status (Caused by <class 'socket.error'>: [Errno 111] Connection refused)

I haven't figured out exactly which Deferred is causing the error, and I also haven't been able to reproduce it in a reduced test case.
The backtrace above could be completely misleading. The "Connection refused" part, for example, where the agent connects to itself, should not even be possible after the initial node startup, because no such connection is attempted after that point.
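
For context, here is a minimal sketch (not PyFarm code) of the Twisted mechanism behind the message: "Unhandled error in Deferred" is emitted when a Deferred that still holds a Failure is garbage-collected before any errback consumed it, which is also why the traceback logged alongside it can point somewhere unexpected.

```python
import gc
import sys

from twisted.internet import defer, error
from twisted.python import log

log.startLogging(sys.stdout)

d = defer.Deferred()
# Fire the Deferred with the same kind of failure the agent logs; no
# errback is ever attached, so the Failure stays stored on the Deferred.
d.errback(error.ProcessTerminated(exitCode=1))

# Once the Deferred is garbage-collected, Twisted logs
# "Unhandled error in Deferred:" followed by the stored Failure.
del d
gc.collect()
```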

This doesn't actually break anything at the moment, but it still shouldn't happen and might be indicative of a bigger problem somewhere.
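
One way to narrow this down (just a suggestion, not something the agent does today) would be to turn on Deferred debugging during a test run, which makes Twisted record where each Deferred was created and first fired and include those tracebacks in the "Unhandled error in Deferred" report:

```python
from twisted.internet import defer

# With debugging enabled, the "Unhandled error in Deferred" report also
# contains the creator and first-invoker tracebacks of the offending
# Deferred.  It is slow, so it should only stay on while investigating.
defer.setDebugging(True)
```

That should at least show which Deferred ends up carrying the ProcessTerminated failure.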
