Can't Recover From NotLeaderForPartitionError #1018

Open
oferda4 opened this issue Jun 26, 2024 · 0 comments

Describe the bug
Hi all,
We use Aiven's Kafka servers, with aiokafka on the client side.

When the producer gets a NotLeaderForPartitionError (documented here), the error keeps being raised, even though the error class is defined with invalid_metadata = True and the client does re-request the metadata, as it should.
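
To make the expected flow concrete, here is a minimal standalone model of "refresh the metadata once, then succeed". It is a sketch, not aiokafka's code: `NotLeaderError`, `FakeCluster` and `send_with_refresh` are hypothetical names, and only the `invalid_metadata = True` flag mirrors what the real error class defines.

```python
class NotLeaderError(Exception):
    """Stand-in for NotLeaderForPartitionError (hypothetical, not aiokafka's class)."""

    invalid_metadata = True  # same flag the real error class defines


class FakeCluster:
    """Toy cluster: the leader moved to broker 2, the client cache still says 1."""

    def __init__(self):
        self.actual_leader = 2
        self.cached_leader = 1
        self.metadata_requests = 0

    def refresh_metadata(self):
        # One metadata round trip brings the cache up to date.
        self.metadata_requests += 1
        self.cached_leader = self.actual_leader

    def produce(self, broker, message):
        if broker != self.actual_leader:
            raise NotLeaderError(f"broker {broker} is not the leader")
        return f"stored {message!r} on broker {broker}"


def send_with_refresh(cluster, message, max_retries=3):
    """Send to the cached leader; on a stale-metadata error, refresh and retry."""
    for _ in range(max_retries + 1):
        try:
            return cluster.produce(cluster.cached_leader, message)
        except NotLeaderError as exc:
            if not getattr(exc, "invalid_metadata", False):
                raise
            cluster.refresh_metadata()
    raise RuntimeError("leader still unknown after retries")
```

In this model a single refresh is enough for the retry to succeed, which is the behaviour we expected from the real client.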

We also tried stopping the current producer and creating a new one every time this exception is raised. The new producer seems to work properly, but the error keeps being logged: Got error produce response on topic-partition TopicPartition(topic='XXXXX', partition=X), retrying. Error: <class 'aiokafka.errors.NotLeaderForPartitionError'>. Network usage also shows that metadata requests keep being sent.
The error keeps being logged until the process is shut down, which can take days.
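
The "stop and recreate" workaround looks roughly like the sketch below. It is an illustration rather than our exact code: `RecreatingProducer` and `factory` are hypothetical names, and the wrapper relies only on the `start()`, `stop()` and `send_and_wait()` methods of the producer it is given.

```python
import asyncio


class RecreatingProducer:
    """Wraps a producer and replaces it after a failed send (workaround sketch).

    `factory` is a hypothetical zero-argument callable returning an object with
    async start(), stop() and send_and_wait(topic, value) methods.
    """

    def __init__(self, factory, retriable=(Exception,)):
        self._factory = factory
        self._retriable = retriable
        self._producer = None

    async def start(self):
        self._producer = self._factory()
        await self._producer.start()

    async def stop(self):
        if self._producer is not None:
            await self._producer.stop()
            self._producer = None

    async def send(self, topic, value):
        try:
            return await self._producer.send_and_wait(topic, value)
        except self._retriable:
            # Drop the failing producer and retry once on a fresh one.
            await self.stop()
            await self.start()
            return await self._producer.send_and_wait(topic, value)
```

With aiokafka, `factory` would build an `AIOKafkaProducer` and `retriable` would be `(aiokafka.errors.NotLeaderForPartitionError,)`. The send on the fresh producer goes through, yet the retry messages attributed to the stopped producer continue.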

According to the server's logs, the broker behaves properly: the leader changes only once in a while (which is when these errors start), not continuously.

Expected behaviour

  • The producer should recover from this error after requesting the metadata once.
  • Even if it does not recover, a producer that has been stopped shouldn't leak any failing send attempts.
  • Even if attempts do leak, the failing batches should expire after the timeout.
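
The last two expectations can be stated as a toy retry loop. This is a model of the behaviour we would expect, assuming a per-batch deadline and a `stop()` that cancels outstanding retries; `ToySender` and `BatchExpired` are hypothetical names and say nothing about how aiokafka's sender is implemented.

```python
import asyncio
import time


class BatchExpired(Exception):
    """Hypothetical error for a batch that outlived its delivery deadline."""


class ToySender:
    """Toy retry loop illustrating expiry and clean shutdown; not aiokafka's sender."""

    def __init__(self, timeout_s, backoff_s=0.01):
        self._timeout_s = timeout_s
        self._backoff_s = backoff_s
        self._tasks = set()

    def submit(self, attempt):
        """Retry the `attempt` coroutine function until success or expiry."""
        task = asyncio.create_task(self._retry(attempt))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def _retry(self, attempt):
        deadline = time.monotonic() + self._timeout_s
        while True:
            try:
                return await attempt()
            except ConnectionError as exc:
                # Retrying is bounded: once the deadline passes, give up.
                if time.monotonic() >= deadline:
                    raise BatchExpired("batch exceeded its timeout") from exc
                await asyncio.sleep(self._backoff_s)

    async def stop(self):
        # Stopping cancels every outstanding retry so nothing leaks.
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

In this model an attempt that always fails ends with `BatchExpired` once the timeout passes, and a retry that is still pending when `stop()` is called ends up cancelled instead of running on.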

Environment (please complete the following information):

  • aiokafka version: 0.10.0
  • Kafka Broker version: 3.6.2

Reproducible example
The producer is created as follows:

import aiokafka
import humanfriendly

# Options are passed as keyword arguments.
producer = aiokafka.AIOKafkaProducer(
    bootstrap_servers="my_server:9092",
    request_timeout_ms=60000,
    linger_ms=0,
    compression_type=None,
    max_batch_size=16000,
    max_request_size=humanfriendly.parse_size("20MiB"),
    acks=1,
)

Reproducing this is difficult: you need a setup with Aiven, then you need to trigger a leader change, and even that does not always cause the issue.
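
Since a direct reproduction is hard, one way to collect evidence from a long-running process is to count the retry log lines and record when they were first and last seen. The handler below is a small sketch using only the standard `logging` module; `RetryLogCounter` is a hypothetical name, and attaching it to the `"aiokafka"` logger assumes the library logs under that namespace.

```python
import logging


class RetryLogCounter(logging.Handler):
    """Counts log records mentioning an error name and tracks first/last time seen."""

    def __init__(self, needle="NotLeaderForPartitionError"):
        super().__init__()
        self.needle = needle
        self.count = 0
        self.first_seen = None
        self.last_seen = None

    def emit(self, record):
        # getMessage() applies the %-style arguments before we search the text.
        if self.needle in record.getMessage():
            self.count += 1
            if self.first_seen is None:
                self.first_seen = record.created
            self.last_seen = record.created


counter = RetryLogCounter()
# Assumption: aiokafka's loggers live under the "aiokafka" namespace.
logging.getLogger("aiokafka").addHandler(counter)
```

Comparing `counter.last_seen - counter.first_seen` with the time of the leader change in the broker logs would show how long the retries persist after the cluster has settled.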
