Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

amqp reconnect on sending failure #429

Merged
merged 1 commit into from
Jan 23, 2025
Merged

Conversation

dgafka
Copy link
Member

@dgafka dgafka commented Jan 23, 2025

Why is this change proposed?

In relation to #428. To allow Amqp to recover, before retrying.

Description of Changes

Pull Request Contribution Terms

  • I have read and agree to the contribution terms outlined in CONTRIBUTING.

@dgafka dgafka merged commit e11cf91 into main Jan 23, 2025
67 checks passed
@dgafka dgafka deleted the fix/reconnect-on-sending-failure branch January 23, 2025 19:50
@lifinsky
Copy link
Contributor

@dgafka I will test it again and add new logs

@lifinsky
Copy link
Contributor

@dgafka Lost messages unfortunately remain after the update.

Message 1 (success publishing)

23:00:42 INFO      [ecotone] Collecting message with id: 052f0048-3be1-45e3-b685-e8d1d4c2feaa
[
  "message_id" => "052f0048-3be1-45e3-b685-e8d1d4c2feaa",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:00:42 INFO      [ecotone] Message was stored in deduplication table.
[
  "message_id" => "052f0048-3be1-45e3-b685-e8d1d4c2feaa",
  "consumer_endpoint_id" => "backoffice_db",
  "routing_slip" => "backoffice_internal,ticket.wasPrepared.target.execute"
]
23:00:42 INFO      [ecotone] Flushed and cleared doctrine object managers
[
  "message_id" => "052f0048-3be1-45e3-b685-e8d1d4c2feaa",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:00:42 INFO      [ecotone] Releasing collected 1 message(s) to send them to Message Channels
[
  "message_id" => "052f0048-3be1-45e3-b685-e8d1d4c2feaa",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:00:42 INFO      [ecotone] Database Transaction committed
[
  "message_id" => "052f0048-3be1-45e3-b685-e8d1d4c2feaa",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:00:42 INFO      [ecotone] Message with id `052f0048-3be1-45e3-b685-e8d1d4c2feaa` was acknowledged in Message Channel `backoffice_db`
[
  "message_id" => "052f0048-3be1-45e3-b685-e8d1d4c2feaa",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]

Message 2 (lost) - after heartbeat missing

23:04:32 INFO      [ecotone] Message with id `5245eaeb-4c9e-49c7-b573-45da532ed0b6` received from Message Channel `backoffice_db`
[
  "message_id" => "5245eaeb-4c9e-49c7-b573-45da532ed0b6",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:04:32 INFO      [ecotone] Database Transaction started
[
  "message_id" => "5245eaeb-4c9e-49c7-b573-45da532ed0b6",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:04:32 INFO      [ecotone] Collecting message with id: 5245eaeb-4c9e-49c7-b573-45da532ed0b6
[
  "message_id" => "5245eaeb-4c9e-49c7-b573-45da532ed0b6",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:04:32 INFO      [ecotone] Message was stored in deduplication table.
[
  "message_id" => "5245eaeb-4c9e-49c7-b573-45da532ed0b6",
  "consumer_endpoint_id" => "backoffice_db",
  "routing_slip" => "backoffice_internal,ticket.wasPrepared.target.execute"
]
23:04:32 INFO      [ecotone] Flushed and cleared doctrine object managers
[
  "message_id" => "5245eaeb-4c9e-49c7-b573-45da532ed0b6",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:04:32 INFO      [ecotone] Releasing collected 1 message(s) to send them to Message Channels
[
  "message_id" => "5245eaeb-4c9e-49c7-b573-45da532ed0b6",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:04:32 INFO      [ecotone] Database Transaction committed
[
  "message_id" => "5245eaeb-4c9e-49c7-b573-45da532ed0b6",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:04:32 INFO      [ecotone] Message with id `5245eaeb-4c9e-49c7-b573-45da532ed0b6` was acknowledged in Message Channel `backoffice_db`
[
  "message_id" => "5245eaeb-4c9e-49c7-b573-45da532ed0b6",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]

Message 3 (success publishing)

23:06:14 INFO      [ecotone] Message with id `e2a9e086-12a7-4e14-ad1b-df8ba5de9815` received from Message Channel `backoffice_db`
[
  "message_id" => "e2a9e086-12a7-4e14-ad1b-df8ba5de9815",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:06:14 INFO      [ecotone] Database Transaction started
[
  "message_id" => "e2a9e086-12a7-4e14-ad1b-df8ba5de9815",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:06:14 INFO      [ecotone] Collecting message with id: e2a9e086-12a7-4e14-ad1b-df8ba5de9815
[
  "message_id" => "e2a9e086-12a7-4e14-ad1b-df8ba5de9815",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:06:14 INFO      [ecotone] Message was stored in deduplication table.
[
  "message_id" => "e2a9e086-12a7-4e14-ad1b-df8ba5de9815",
  "consumer_endpoint_id" => "backoffice_db",
  "routing_slip" => "backoffice_internal,ticket.wasPrepared.target.execute"
]
23:06:14 INFO      [ecotone] Flushed and cleared doctrine object managers
[
  "message_id" => "e2a9e086-12a7-4e14-ad1b-df8ba5de9815",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:06:14 INFO      [ecotone] Releasing collected 1 message(s) to send them to Message Channels
[
  "message_id" => "e2a9e086-12a7-4e14-ad1b-df8ba5de9815",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:06:14 INFO      [ecotone] Message was not sent to backoffice_internal due to exception. Trying to self-heal by doing retry attempt: 1/2
[
  "exception" => "connection closed unexpectedly",
  "relatedChannel" => "backoffice_internal"
]
23:06:14 INFO      [ecotone] Database Transaction committed
[
  "message_id" => "e2a9e086-12a7-4e14-ad1b-df8ba5de9815",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]
23:06:14 INFO      [ecotone] Message with id `e2a9e086-12a7-4e14-ad1b-df8ba5de9815` was acknowledged in Message Channel `backoffice_db`
[
  "message_id" => "e2a9e086-12a7-4e14-ad1b-df8ba5de9815",
  "correlation_id" => "8b9657cf-2f1f-4381-a4f7-a044e4a7bd05",
  "parent_id" => "23ffbb47-b21c-4ccf-81f2-71598f82d550"
]

@lifinsky
Copy link
Contributor

2025-01-23 23:35:17.307 [error] <0.936.0> closing AMQP connection <0.936.0> (172.21.0.9:36392 -> 172.21.0.7:5672):
missed heartbeats from client, timeout: 10s
2025-01-23 23:35:17.308 [info] <0.957.0> Closing all channels from connection '172.21.0.9:36392 -> 172.21.0.7:5672' because it has been closed

RabbitMQ does what Amqproxy and most likely AmazonMQ do not: Closing all channels from connection because it has been closed

@dgafka
Copy link
Member Author

dgafka commented Jan 24, 2025

Message 2 (lost) - after heartbeat missing

In this scenario no exception has been thrown at all.
I don't think then the PR in Symfony has solved it anyhow symfony/symfony#54167, it rather added retry policy which we already had in Ecotone for normal AMQP failure situations.

We need a way to check if connection isDisconnected properly before Message is sent. If so then kick off connection retry. Can you explore that?

@lifinsky
Copy link
Contributor

Message 2 (lost) - after heartbeat missing

In this scenario no exception has been thrown at all. I don't think then the PR in Symfony has solved it anyhow symfony/symfony#54167, it rather added retry policy which we already had in Ecotone for normal AMQP failure situations.

We need a way to check if connection isDisconnected properly before Message is sent. If so then kick off connection retry. Can you explore that?

Yes, I’m just now starting a deep study of the problem and will write test scripts with direct work with php amqp extension.

@lifinsky
Copy link
Contributor

#428 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants