Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove greenlet from asr1kl3 agent, use python threads instead #63

Open
wants to merge 4 commits into
base: stable/ussuri-m3
Choose a base branch
from

Conversation

notandy
Copy link
Collaborator

@notandy notandy commented Jul 21, 2021

This patch removes greenlet from asr1kl3 agent by stripping eventlet
(and thus monkey_patching), and replacing following (eventlet) based modules
with python-thread based:

neutron.Manager -> embedded into main class L3ASRAgentWithStateReport
oslo-service.looping_call -> futurist.periodics
oslo-service.service -> python Thread with graceful_exit signal handler
grenlet.greenpool -> futurist.ThreadPoolExecutor
neutron.rpc -> oslo_messaging low level api (to switch it to threading)

Why?

Removing eventlet and replacing it with python.threading has the huge benefit that
external libraries as ncclient doesn't need to be patched or prepared for eventlet.
For a single-process and single worker-queue based agent python threading performance
should be more than sufficient.

Fixes

ncclient (upstream version, no more double socket exceptions)
graceful shutdown (all threads are gracefully stopped)

Reports

Also, there is now a regular thread report that can be manually triggered via
kill -SIGUSR2 <pid>:

Currently running 8 periodic workers:
+----------------------------------------------------------------------------------------------------+--------+-------------+-----------+------+----------+-----------+----------------+-----------------+-------------------------+
|                                                Name                                                | Active | Periodicity |  Runs in  | Runs | Failures | Successes | Stop Requested | Average elapsed | Average elapsed waiting |
+----------------------------------------------------------------------------------------------------+--------+-------------+-----------+------+----------+-----------+----------------+-----------------+-------------------------+
|     asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent.periodic_requeue_routers_task     | False  |      60     |  30.6859s |  1   |    0     |     1     |     False      |     0.0002s     |         0.0007s         |
| asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent.periodic_refresh_address_scope_config | False  |      5      |  0.6895s  |  5   |    0     |     5     |     False      |     0.0621s     |         0.0002s         |
|        asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent.periodic_worker_report         | False  |     600     | 570.6783s |  0   |    0     |     0     |     False      |        .        |            .            |
|     asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgentWithStateReport._report_state      | False  |     30.0    |  0.6798s  |  0   |    0     |     0     |     False      |        .        |            .            |
|      asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent._periodic_sync_routers_task      | False  |     120     |  91.3323s |  0   |    0     |     0     |     False      |        .        |            .            |
|        asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent._periodic_scavenge_task        | False  |     120     |  91.3325s |  0   |    0     |     0     |     False      |        .        |            .            |
|         asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent._check_devices_alive          | False  |     60.0    |  31.3325s |  0   |    0     |     0     |     False      |        .        |            .            |
|             asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent._clean_device             | False  |     600     | 571.7885s |  0   |    0     |     0     |     False      |        .        |            .            |
+----------------------------------------------------------------------------------------------------+--------+-------------+-----------+------+----------+-----------+----------------+-----------------+-------------------------+
Currently running 11 threads: ['MainThread', 'pydevd.Writer', 'pydevd.Reader', 'pydevd.CommandThread', 'pydevd.CheckAliveThread', 'Thread-7', 'ASR1k Worker Thread', 'Thread-8', 'Rabbit-heartbeat', 'Thread-10', 'Thread-11']

This patch removes greenlet from asr1kl3 agent by stripping eventlet
(and thus monkey_patching), and replacing following (eventlet) based modules
with python-thread based:

neutron.Manager -> embedded into main class L3ASRAgentWithStateReport
oslo-service.looping_call -> futurist.periodics
oslo-service.service -> python Thread with graceful_exit signal handler
grenlet.greenpool -> futurist.ThreadPoolExecutor
neutron.rpc -> oslo_messaging low level api (to switch it to threading)

Why?
----
Removing eventlet and replacing it with python.threading has the huge benefit that
external libraries as ncclient doesn't need to be patched or prepared for eventlet.
For a single-process and single worker-queue based agent python threading performance
should be more than sufficient.

Fixes
-----
ncclient (upstream version, no more double socket exceptions)
graceful shutdown (all threads are gracefully stopped)

Reports
-------
Also, there is now a regular thread report that can be manually triggered via
`kill -SIGUSR2 <pid>`:

```
Currently running 8 periodic workers:
+----------------------------------------------------------------------------------------------------+--------+-------------+-----------+------+----------+-----------+----------------+-----------------+-------------------------+
|                                                Name                                                | Active | Periodicity |  Runs in  | Runs | Failures | Successes | Stop Requested | Average elapsed | Average elapsed waiting |
+----------------------------------------------------------------------------------------------------+--------+-------------+-----------+------+----------+-----------+----------------+-----------------+-------------------------+
|     asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent.periodic_requeue_routers_task     | False  |      60     |  30.6859s |  1   |    0     |     1     |     False      |     0.0002s     |         0.0007s         |
| asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent.periodic_refresh_address_scope_config | False  |      5      |  0.6895s  |  5   |    0     |     5     |     False      |     0.0621s     |         0.0002s         |
|        asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent.periodic_worker_report         | False  |     600     | 570.6783s |  0   |    0     |     0     |     False      |        .        |            .            |
|     asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgentWithStateReport._report_state      | False  |     30.0    |  0.6798s  |  0   |    0     |     0     |     False      |        .        |            .            |
|      asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent._periodic_sync_routers_task      | False  |     120     |  91.3323s |  0   |    0     |     0     |     False      |        .        |            .            |
|        asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent._periodic_scavenge_task        | False  |     120     |  91.3325s |  0   |    0     |     0     |     False      |        .        |            .            |
|         asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent._check_devices_alive          | False  |     60.0    |  31.3325s |  0   |    0     |     0     |     False      |        .        |            .            |
|             asr1k_neutron_l3.plugins.l3.agents.asr1k_l3_agent.L3ASRAgent._clean_device             | False  |     600     | 571.7885s |  0   |    0     |     0     |     False      |        .        |            .            |
+----------------------------------------------------------------------------------------------------+--------+-------------+-----------+------+----------+-----------+----------------+-----------------+-------------------------+
Currently running 11 threads: ['MainThread', 'pydevd.Writer', 'pydevd.Reader', 'pydevd.CommandThread', 'pydevd.CheckAliveThread', 'Thread-7', 'ASR1k Worker Thread', 'Thread-8', 'Rabbit-heartbeat', 'Thread-10', 'Thread-11']
```
@joker-at-work
Copy link

Will we install gdb with python extension by default, so we can get some kind of backdoor or is there a console-thread started, too?

@notandy
Copy link
Collaborator Author

notandy commented Jul 22, 2021

Will we install gdb with python extension by default, so we can get some kind of backdoor or is there a console-thread started, too?

Good question, I don't think python provides something similar by default, maybe there is a similar library out there? I will look into installing gdb-python via loci pipeline, which also make sense for all non-eventlet openstack projects.

@notandy
Copy link
Collaborator Author

notandy commented Jul 22, 2021

https://pypi.org/project/manhole/ looks promising

@notandy
Copy link
Collaborator Author

notandy commented Jul 22, 2021

Just evaluated backdoor in qa-de-2:

> hammer k backdoor neutron-asr1k-agent-01-5b68cb88fc-qsm65
Select a container:

  ▶ neutron-asr1k
    neutron-asr1k-ml2
/var/lib/neutron/eventlet_backdoor-1.socket
Calling  /usr/bin/socat -d -d TCP-LISTEN:33442,bind=localhost unix-connect:/var/lib/neutron/eventlet_backdoor-1.socket
Executing rlwrap --histsize 2147483647 socat stdio TCP:localhost:54985

######### ProcessID=1, ThreadID=139641907050240 #########
... (manhole prints a stacktrace of all threads on startup)


Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(ManholeConsole)
>>>

Copy link

@velp velp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good generally, but there are a few notes on the readability of the code.

Copy link

@velp velp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -338,63 +327,70 @@ def __init__(self, host, conf=None):
# continue
# break
self.monitor = self._initialize_monitor()
self.worker = periodics.PeriodicWorker([

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use PeriodicWorker.create() here, but instantiate the worker directly?

Wouldn't it make sense to create a worker with the ThreadPoolExecutor instead of the default SynchronousExecutor? As I understand it, the previous periodic-tasks would be able to run in parallel so this would better mimic previous behavior.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking further into it, neutron.service.Service starts 2 loopingcalls in start(): one for reporting state and one for running the periodic tasks synchronous. This makes sure that we always report state, even if some periodic tasks take some time. Therefore, the default SynchronousExecutor might be fine here, but we cannot have the self._report_state() handled by self.worker, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants