
Configuring Producer Retries and Idempotence

Objectives

After completing this section, you should be able to configure producer retries and idempotence.

Defining Idempotence

Idempotence is the property of an algorithm that generates the same result for identical calls, regardless of how many times the algorithm is called.

When an operation is not idempotent, multiple identical calls might generate different results. For example, consider a function that adds two numbers, addNumbers(int number1, int number2). This function is idempotent because the same input numbers always return the same result.
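To illustrate the difference, the following Java sketch contrasts the idempotent addNumbers function with a hypothetical non-idempotent counterpart:

    public class IdempotenceExample {
        private int counter = 0;

        // Idempotent: identical calls always return the same result.
        public int addNumbers(int number1, int number2) {
            return number1 + number2;
        }

        // Not idempotent: each call mutates internal state, so identical
        // calls return different results.
        public int addToCounter(int number) {
            counter += number;
            return counter;
        }
    }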

Understanding Retries

In the context of EDAs, retrying means resending an event because you have no confirmation of its delivery. This can happen for several reasons, such as networking issues or an incorrect configuration. Retries are a common technique to deal with errors in resilient applications.

In Apache Kafka, retries are performed by producers, which are responsible for sending records. You can retry sending the same record several times, depending on the producer configuration and the business logic of your application. In versions prior to Kafka 2.1, the default value of retries is 0, and therefore no retries are performed. From Kafka 2.1 onward, retries defaults to 2147483647 (Integer.MAX_VALUE).

You can change the number of retries for every record by setting the retries property of the producer. You can specify the time between retries by setting the retry.backoff.ms property. For example, if retries = 3 and retry.backoff.ms = 5000, then every failed record is retried up to three times, with five seconds of waiting time between retries.
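The following minimal sketch configures a producer with these values. The broker address and serializers are placeholder assumptions for the sketch:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class RetryingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker address for this sketch.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

            // Retry each failed send up to three times...
            props.put(ProducerConfig.RETRIES_CONFIG, 3);
            // ...waiting five seconds between retries.
            props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 5000);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Send records here.
            }
        }
    }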

The producer has several configuration properties that enable you to achieve different record delivery scenarios, depending on your application needs. The following image illustrates the different phases of the producer, and where every property is applied.

Figure 6.1: Producer phases
  • linger.ms (Batching phase): The amount of time, in milliseconds, that the producer waits before sending a batch of records to the broker. Records are sent by a producer in batches. When linger.ms = 0, the records are sent as soon as possible. In high-traffic environments, you might choose to wait for the batch to contain more records before it is sent. This reduces the number of requests to the broker, and every request contains more records.

  • retry.backoff.ms (Retrying phase): The waiting time, in milliseconds, between every retry.

  • request.timeout.ms (Broker connection phase): The maximum amount of time, in milliseconds, that the client waits for a response from the broker.

  • delivery.timeout.ms (Batching, Waiting for broker readiness, Retrying, and Broker connection phases): The maximum amount of time, in milliseconds, that the client waits to perform the batching, waiting for broker readiness, retrying, and broker connection phases.
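Continuing the props object from the previous sketch, these properties could be set as follows. The values are illustrative assumptions, not recommendations:

    // Wait up to 10 ms for a batch to fill before sending it.
    props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
    // Wait up to 30 seconds for a response from the broker.
    props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
    // Allow up to 2 minutes for the whole send, covering the batching,
    // waiting for broker readiness, retrying, and broker connection phases.
    props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000);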

In the Waiting for broker readiness phase, records wait for the broker to be ready.

At the same time, you should also consider that retrying might lead to events being received in a different order than they were sent. For example, consider a producer that fails to send an event, M1, and sends the event again after 10 seconds. If another record, M2, is sent during that period of time, then M2 is stored before M1. In Kafka, if retries are enabled and the max.in.flight.requests.per.connection property is greater than 1, then records might lose their original order.
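If preserving order matters more than throughput, one common approach, sketched here against the same props object, is to allow only one in-flight request per connection:

    // With a single in-flight request, a retried record cannot be
    // overtaken by a later record, at the cost of throughput.
    props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);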

Understanding the Producer Delivery Semantics

The message delivery semantics are the guarantees that Kafka provides for the delivery of a record. There are three message semantics:

  • At most once: guarantees that a record is never duplicated, but might be lost.

  • At least once: guarantees that a record is never lost, but it might be duplicated.

  • Exactly once: guarantees that a record is never lost, and it is never duplicated.

Depending on the use case of your application, you might choose one over the others. For example, if losing records is acceptable for your application, then you might choose at most once. Most applications, however, cannot allow lost or duplicated records, and should implement exactly once delivery semantics.
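As a rough sketch, continuing the props object from the earlier sketches, the first two semantics map to producer settings as follows. Exactly once additionally relies on idempotence, which is described later in this section:

    // At most once: do not wait for acknowledgments and do not retry.
    // Records might be lost, but are never duplicated.
    props.put(ProducerConfig.ACKS_CONFIG, "0");
    props.put(ProducerConfig.RETRIES_CONFIG, 0);

    // At least once: wait for acknowledgments and retry on failure.
    // Records are not lost, but might be duplicated.
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);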

Understanding Duplication Scenarios

In EDAs, and in any other application that uses Kafka, resending an event might duplicate that event. For example, consider an event UserCreated that contains information about the creation of a user. If an event for the same user is sent twice, then Kafka stores a duplicate event.

Duplicates in Producer Acknowledgments

Producers can use the acks configuration property to receive acknowledgments from the Kafka broker. This property can be set to three values:

  • 0: the producer does not wait for an acknowledgment. The record is always considered successfully sent.

  • 1: the producer waits for the acknowledgment of the partition leader, but does not wait for the message to be saved across the partition replicas.

  • all: the producer waits for the message to be saved across all partition replicas before it is considered successfully sent.

If you enable acknowledgments (that is, you set the acks property to 1 or all), then an error while receiving the acknowledgment, such as a network issue, can lead to a duplicate event.
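The producer surfaces acknowledgment results through the send callback. The following sketch, which assumes the producer from the earlier sketches and a placeholder topic named reviews, shows where a missing acknowledgment becomes visible:

    // Requires org.apache.kafka.clients.producer.ProducerRecord.
    producer.send(new ProducerRecord<>("reviews", "review-1"), (metadata, exception) -> {
        if (exception == null) {
            // The broker acknowledged the record according to the acks setting.
            System.out.println("Record stored at offset " + metadata.offset());
        } else {
            // No acknowledgment arrived in time. With retries enabled, the
            // producer has already resent the record, which can duplicate it
            // if the broker stored the first copy.
            exception.printStackTrace();
        }
    });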

For example, consider the following diagram, where the producer sends two records. The broker stores the first one, M1, in the corresponding partition, and the producer receives an acknowledgment response. The broker stores the second one, M2, in the corresponding partition, but as a result of a network issue, the producer does not receive the acknowledgment response. Therefore, the producer retries sending M2 and the record is duplicated.

Figure 6.2: Record duplication as a result of a missing ACK response

Business-logic Duplication

The business logic of your own application can also lead to duplicated events. For example, a lack of validation might produce duplicated events when a user clicks the same button or component several times.

For example, consider a restaurant application where users write reviews. There is a bug in the UI that shows success messages from the back-end as errors. If the user receives an error message after sending the form, then the user tries to send it again. The back-end processed the form successfully the first time, but as a result of the bug, the user sends it again. If you do not perform extra verifications in the back-end (for example, ensuring that every user can only have one review for each restaurant), then the review is duplicated.

Preventing Duplication with Kafka

Kafka includes a mechanism to prevent duplicates caused by producer retries. You can enable this feature by setting the enable.idempotence property to true. With this property enabled, if you send the same record ten times, then only the first record is stored in Kafka; the broker discards any duplicated records. It also preserves the record order within every partition.

Every producer is assigned a producer ID, and every record is assigned an incremental sequence number. The producer keeps a different sequence for every partition that it sends records to. On the broker side, the largest sequence number for every partition is kept, and any record with a lower or equal sequence number is discarded. If you enable this property, the following configurations are also required:

  • max.in.flight.requests.per.connection must be less than or equal to 5 (max.in.flight.requests.per.connection <= 5)

  • retries must be greater than 0 (retries > 0)

  • acks must be set to all (acks = all)

Otherwise, an exception is thrown.
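Putting this together, a minimal sketch of an idempotent producer configuration, continuing the props object from the earlier sketches:

    // Enable broker-side deduplication of retried records.
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

    // Required by idempotence. In recent Kafka versions these are
    // already the defaults; they are shown here for clarity.
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
    props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);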

Consider the example from the Duplicates in Producer Acknowledgments section, where the producer sends two messages (M1 and M2) to the same partition. With enable.idempotence set to true, the broker discards the retry of M2. The following diagram illustrates this behavior.

Figure 6.3: Record is not duplicated because idempotency is enabled
  1. M1 receives the sequence number 1, and the broker stores M1 in the partition. The largest sequence number for that partition is 1.

  2. M2 receives the sequence number 2, and the broker stores M2 in the partition; however, the acknowledgment message is lost. The largest sequence number for that partition is 2.

  3. The producer retries sending M2 with the same sequence number, and the broker discards it because its sequence number is lower than or equal to the largest sequence number for that partition. The broker sends an ACK response to avoid more retries of the message.

Kafka solves the problem of duplicated events as a result of a retry, but it does not provide a built-in solution to prevent duplicates caused by the business logic of your application.
