
Long-lived AMQP connection #107

Open
SwooshyCueb opened this issue Sep 26, 2022 · 20 comments

Comments

@SwooshyCueb
Member

At present, we make a new connection for every AMQP message we send. This is not ideal.

From the RabbitMQ documentation (emphasis theirs):

Before an application can use RabbitMQ, it has to open a connection to a RabbitMQ node. The connection then will be used to perform all subsequent operations. Connections are meant to be long-lived. Opening a connection for every operation (e.g. publishing a message) would be very inefficient and is highly discouraged.

This comes from AMQP 0-9-1-centric and .NET-centric documentation, but it still applies for us.

@trel
Member

trel commented Sep 26, 2022

Very interesting ... and kind of obvious now that it's linked :)

Suggests the server has to hold a connection longer-lived than an individual Agent.

Hmm.....

@SwooshyCueb
Member Author

I've been pondering this for a bit, actually. Was something I thought about working on as part of #105, but there's already so much stuff in there that I think I'd rather get a bow on it and do further refactoring of our AMQP/Proton usage as part of a separate effort.

@korydraughn
Contributor

I agree. Handling that in a separate PR feels like the right thing to do.

@korydraughn
Contributor

Potential Solution

Instead of opening a new RabbitMQ connection for every AMQP message, we can store the messages and send them at a later time. This enables the iRODS server to batch messages. It also means the server is less likely to lose messages once they have been written down successfully.

Plugin Configuration

The audit plugin would grow two new configuration options:

  • shared_memory_size_in_bytes: The size of the shared memory buffer used to hold audit information
    • This would be allocated on plugin start and shared between all agents
    • Does it make sense to abstract the storage space used to hold audit information?
  • flush_interval_in_seconds: The amount of time before an agent attempts to send the messages stored in the shared memory buffer

High-Level Algorithm

The audit plugin would be changed to do the following:

  1. Attempt to write the audit information into the pre-allocated shared memory buffer.
  2. If inserting the audit information would exceed shared_memory_size_in_bytes, flush the messages first.
  3. If the message was inserted without error, check the flush_interval_in_seconds.
  4. If the flush_interval_in_seconds has been satisfied, flush the messages.
  5. If the flush_interval_in_seconds has not been satisfied, do nothing.

"Flush the messages" means opening a single RabbitMQ connection and using it to send all messages stored in shared memory. Only one agent is allowed to do this.

This solution assumes access to the shared memory is synchronized across agents running on the same machine. This solution does not require zone-wide synchronization.
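A hedged sketch of this buffer-and-flush scheme, modeled in-process. `AuditBuffer`, `record`, and `send` are illustrative names, and a plain list stands in for the shared memory buffer; a real implementation would need an interprocess lock so that only one agent flushes at a time:

```python
# Hypothetical sketch of the buffered-flush algorithm described above.
# The shared-memory buffer is modeled as an in-process list; in the real
# plugin it would live in shared memory guarded by an interprocess lock.
import time


class AuditBuffer:
    def __init__(self, shared_memory_size_in_bytes, flush_interval_in_seconds, send):
        self.capacity = shared_memory_size_in_bytes
        self.interval = flush_interval_in_seconds
        self.send = send              # callable: opens ONE connection, sends all
        self.messages = []
        self.used = 0
        self.last_flush = time.monotonic()

    def flush(self):
        if self.messages:
            self.send(self.messages)  # single connection for the whole batch
            self.messages.clear()
            self.used = 0
        self.last_flush = time.monotonic()

    def record(self, message: bytes):
        # Step 2: flush first if this message would overflow the buffer.
        if self.used + len(message) > self.capacity:
            self.flush()
        # Step 1: write the audit information down.
        self.messages.append(message)
        self.used += len(message)
        # Steps 3-5: flush only when the interval has elapsed.
        if time.monotonic() - self.last_flush >= self.interval:
            self.flush()
```

Note that the new message is always written down, even in the full-buffer case: the flush makes room before the insert.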

@alanking
Contributor

alanking commented Sep 27, 2022

Would the messages also flush when the plugin's stop operation is invoked? Or is this being managed outside of the agents? "The server" terminology is tripping me up, I think.

@korydraughn
Contributor

Would the messages also flush when the plugin's stop operation is invoked?

That's a possibility. We'd need to investigate and see what the pros/cons are around the stop operation.

Or is this being managed outside of the agents?

It is all happening inside the agent. The good thing about this design is that it's flexible. The audit plugin could simply write things down and let another tool handle sending the messages. The design presented just makes it so that no other tool is necessary.

@trel
Member

trel commented Sep 27, 2022

Have to make sure we write down the new messages in the full buffer case... so...

flush_messages {
    send_all_messages_in_single_connection()
    set_last_flush_time()
}

# check the flush_interval_in_seconds
if enough seconds have passed:
    flush_messages()

# save new message, flushing first if the buffer is full
message_saved = false
while not message_saved:
    message_saved = save_message_to_buffer()
    if not message_saved:
        flush_messages()

@alanking
Contributor

Oh, I see. The flushing is actually managed in the shared memory in addition to the messages to flush. Mental model was missing some screws, as per usual.

@SwooshyCueb
Member Author

I am personally not a fan of batching messages in this way. To me it feels like a janky, overcomplicated workaround for our ownership model.
Instead, I propose we have a single connection per agent, opened in start(), reused in exec_rule(), and closed in stop().
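That per-agent lifecycle could be sketched roughly as follows. `Connection` is a stand-in for a real AMQP (Qpid Proton) connection, and all names here are illustrative rather than the plugin's actual API:

```python
# Hedged sketch of one long-lived connection per agent: opened in start(),
# reused by every exec_rule() call, closed in stop().

class Connection:
    """Stand-in for a real AMQP connection."""
    def __init__(self):
        self.open = True
        self.sent = []

    def send(self, msg):
        assert self.open, "connection must be open to send"
        self.sent.append(msg)

    def close(self):
        self.open = False


class AuditPlugin:
    def __init__(self):
        self.conn = None

    def start(self):
        self.conn = Connection()   # opened once per agent

    def exec_rule(self, audit_msg):
        self.conn.send(audit_msg)  # reuse the long-lived connection

    def stop(self):
        self.conn.close()          # closed when the agent exits
```

Since an agent may fire hundreds of PEPs, all of those sends reuse the one connection instead of opening a new one each time.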

@trel
Member

trel commented Oct 4, 2022

That seems reasonable and good.

And we'll still see a performance gain since an Agent can send hundreds of messages if all PEPs are configured to fire.

Will be fun to measure and compare.

@wijnanjo

We adapted the irods plugin to send messages to kafka instead of over AMQP. Now we see a pretty high rate of new connections to kafka (coming from the plugin) which decreases the performance of our kafka broker.

Is it safe to create a singleton kafka client in the plugin (so a single client per irods process instead of per agent)? And if yes, how exactly? We're lacking C++ experience, so any help is very welcome.

Using a long-lived kafka client is a best practice, just like AMQP connections and the kafka client already takes care of message housekeeping like buffering and flushing.

@korydraughn
Contributor

If I'm understanding correctly, you're asking whether there can be a single kafka client per iRODS server rather than one per agent. Does that sound correct?

If yes, then iRODS doesn't provide any mechanism for doing that, yet. There are ways around that limitation though. Without going into detail, here is one way to deal with it.

  • Add more indirection: iRODS agent -> proxy -> kafka server
    • Add a proxy server that maintains a kafka client. Perhaps written in python.
    • Modify the audit plugin to push messages to the proxy server.
    • Proxy server just accepts/sends messages to Kafka server.
  • Or, implement the solution at Long-lived AMQP connection #107 (comment).
    • While each agent would still get its own kafka client, performance should improve due to the agents reusing their existing kafka client.

With that said, both of those solutions could prove challenging depending on your C++ experience.
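To illustrate the first option, here is a hedged sketch of the proxy idea: many short-lived agents hand messages to one long-lived proxy, which owns the single broker client. The broker client is faked with a callback, and all names are hypothetical; a real proxy would wrap an actual kafka producer and listen on a local socket:

```python
# Hypothetical sketch: agents -> proxy -> broker, with ONE long-lived client.
import queue
import threading


class Proxy:
    def __init__(self, producer_send):
        self.q = queue.Queue()
        self.producer_send = producer_send   # the one long-lived broker client
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def submit(self, msg):
        # Called by each agent; cheap and local, no broker connection needed.
        self.q.put(msg)

    def _drain(self):
        # Single background loop forwards everything through one client.
        while True:
            msg = self.q.get()
            if msg is None:          # shutdown sentinel
                break
            self.producer_send(msg)

    def shutdown(self):
        self.q.put(None)
        self.worker.join()
```

The agents never touch the broker; only the proxy's single client does, so the broker sees one connection regardless of how many agents spawn.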

@wijnanjo

If I'm understanding correctly, you're asking whether there can be a single kafka client per iRODS server rather than one per agent. Does that sound correct?

That's correct, and it's pretty important since the irods server spawns a lot of agents (mainly due to Globus transfers creating a connection for each file to be transferred, if I'm not mistaken). We already implemented your second suggestion (that is, creating the kafka client in plugin.start and destroying it in plugin.stop). But that only partly solves our issue.

We'll go for a proxy solution.

@trel
Member

trel commented Jun 20, 2024

Hi @wijnanjo, would love to know where you are doing this work. Please consider making contact at [email protected] if mentioning here is not desired. Thanks.

@trel
Member

trel commented Jun 26, 2024

new 5.0 process model

  • main irodsServer only has children with jobs, one would be a buffer/proxy for this purpose
  • would launch a long-running proxy process
    • holds series of messages - perhaps message-type agnostic?
    • IPC between agent and long-running process - so no copying when it's time to 'send'
    • kafka can't just be a series of bytes - has to deserialize the json to see the key/pid-host-zone 'group' information for grouping AND ordering
      • suggests different flavors of 'proxy' based on message technology / protocol
  • would need 5.0 first

or

a completely separate binary/service (k8s?)


very interested in any performance / throughput numbers with today's various solutions

@korydraughn
Contributor

@wijnanjo Any updates on the proxy solution?

@wijnanjo

Hi @korydraughn, we haven't changed our current solution, apart from some configuration tweaks. Just last week I noticed in our logs that we are losing a significant number of kafka audit messages, and this typically happens in the audit plugin's stop (timeouts when we flush and close the kafka producer). The root cause is yet unknown (maybe too much load on kafka or the network?). Those errors occur only now and then, but it adds up over time. So a solution as outlined above by @trel is still urgent.
And so I'm trying the following:

  • define an API in protobuf to export audit records. I call this the collector API (I stole the idea from the OpenTelemetry project, where they basically solved a similar issue)
  • run a gRPC server that implements this collector API. In my case the collector will export the audit records to kafka. The official kafka client does batching and retries out of the box, and now it will be a long-lived instance, so we don't have to make that dreaded 'flush' call anymore.
  • the audit plugin becomes a gRPC client, sending records to the collector.

This design offers flexibility:

  • gRPC is fast and can use tcp or unix domain sockets
  • we can deploy the collector server as a separate (load-balanced) service/container (gRPC over tcp) or as a single long-running irods process (in 5.0, with gRPC over a socket).
  • users can create their own collector in their language of choice and send the data to whatever they like.
  • a minimal solution could even work without a gRPC server. In this case the audit plugin would call a 'local' collector: simply a class that implements the collector API and sends directly over AMQP, much like the current solution.

The downside is (slightly) increased complexity and less resilience (another component that may fail). A specific concern for us (related to the ordering guarantee of audit messages in kafka) is that we must send all messages from a particular irods process to the same kafka producer (i.e. collector), as this preserves the ordering of messages in kafka.

I can do the gRPC server in Go but I might ask your help for the C++ part. We'll see how it works out.
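For illustration, a collector API along these lines might look like the following protobuf sketch, loosely modeled on OpenTelemetry's collector style. All service, message, and field names here are hypothetical, not taken from any existing iRODS or OpenTelemetry interface:

```proto
// Hypothetical collector API sketch; names are illustrative only.
syntax = "proto3";

package irods.audit.v1;

message AuditRecord {
  string pep_name = 1;              // which PEP fired
  string payload_json = 2;          // the audit message body
  int64 timestamp_ms = 3;
  string pid_host_zone = 4;         // ordering/grouping key per irods process
}

message ExportRequest {
  repeated AuditRecord records = 1; // batching is built into the request
}

message ExportResponse {}

service AuditCollector {
  // Agents export audit records; the collector forwards them to kafka,
  // AMQP, or any other backend.
  rpc Export(ExportRequest) returns (ExportResponse);
}
```

A per-process `pid_host_zone` key would let the collector route all records from one irods process to the same producer, preserving ordering as described above.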

@trel
Member

trel commented Oct 21, 2024

So we're leaning towards a 5.0 server with a long-running child acting as proxy-broker / passthrough / collector. ( #107 (comment) ). Thank you @wijnanjo for thinking through the scenarios.

We need to consider local memory usage when the 'real' broker is not online / connected. We have to decide whether to drop on the ground any outgoing messages that do not fit in the buffer... or provide some kind of backlog and persistence (to disk?)... and then what happens on restart of the iRODS server and/or the proxy process? If there is a memory buffer and the proxy crashes, those buffered in-memory messages will be lost. What guarantees are we claiming to meet?

The current setup requires an admin to immediately fix the broker and there is no buffer to manage. No messages go out because having a live connection is a prerequisite of the iRODS server doing any work. This was a design decision early on so that managing / losing / holding messages could not be the responsibility / fault of the iRODS server and its AMQP plugin.

Going in this new direction would shift that responsibility to the iRODS server and its new proxy collector / service.

I think that's okay and good - just want to make sure we're enumerating the differences and the impacts on expectations.

@wijnanjo

No messages go out because having a live connection is a prerequisite of the iRODS server doing any work

That's a valid concern. A simple collector could still send the payload immediately to the broker and ack on each request; that would give the same behavior as the current solution.

Our "kafka flavour" of the collector would initially not provide these guarantees (instead, it's expected to be very performant). But we could enhance it by persisting every message that is 'on its way to kafka' (i.e. either in a memory queue of the kafka client or in flight) to disk, then ack'ing, and later removing the message from disk once we have an ack from kafka. Or, in case the broker is down, recovering the messages from disk and retrying.

@korydraughn
Contributor

I can do the gRPC server in Go but I might ask your help for the C++ part. We'll see how it works out.

@wijnanjo Sounds good. Happy to assist with the C++ part.

What I'm about to say is similar to what's been said already. Consider the following:

The audit plugin can write generic messages to an abstract storage device. The abstract storage device could represent a memory buffer, a disk, a database, or anything else you can think of. The important thing to keep in mind is:

  • The audit plugin doesn't need to know about kafka or amqp
  • The audit plugin just writes the messages down as fast as possible
    • No network required (unless the abstract storage device uses one internally)

From there, a process hanging off of the iRODS server or an external application could pull messages from the abstract storage device, convert the messages to the target format, and send them to their final destination (e.g. kafka).

We'd include an abstract storage device implementation and binary for AMQP with the audit plugin. The two components would serve as the reference for other implementations.
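One way to picture that abstract storage device (names here are illustrative, not an actual iRODS interface): the plugin writes through a narrow interface with no knowledge of kafka or AMQP, and a separate sender drains it:

```python
# Hypothetical sketch of the abstract storage device described above.
from abc import ABC, abstractmethod


class StorageDevice(ABC):
    """Could be backed by a memory buffer, disk, database, etc."""

    @abstractmethod
    def write(self, message: bytes) -> None: ...   # fast, no network required

    @abstractmethod
    def drain(self) -> list[bytes]: ...            # consumed by a separate sender


class MemoryDevice(StorageDevice):
    def __init__(self):
        self._messages = []

    def write(self, message: bytes) -> None:
        self._messages.append(message)

    def drain(self) -> list[bytes]:
        out, self._messages = self._messages, []
        return out


def forward(device: StorageDevice, send) -> int:
    # A process hanging off the server (or an external tool) pulls messages,
    # converts them if needed, and ships them to the final destination.
    batch = device.drain()
    for msg in batch:
        send(msg)
    return len(batch)
```

Swapping `MemoryDevice` for a disk- or database-backed implementation, or `send` for a kafka/AMQP producer, requires no change to the audit plugin itself.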

Questions for @wijnanjo:

  • How frequently is the audit information accessed?
  • When accessing the audit information, how up-to-date should it be?
