Chapter 2. Introducing Kafka and AMQ Streams Concepts

Abstract

Goal: Build applications with basic read and write messaging capabilities.
Objectives
  • Describe the history and use cases of Kafka and AMQ Streams.

  • Describe the architecture of Kafka and AMQ Streams.

  • Create a topic with Kafka.

  • Send data with producers to topics.

  • Consume data from topics.

  • Define data contracts and integrate schema registries.

Sections
  • Describing Kafka and AMQ Streams (and Quiz)

  • Describing the Kafka Ecosystem and Architecture (and Quiz)

  • Creating Topics (and Guided Exercise)

  • Sending Data with Producers (and Guided Exercise)

  • Receiving Data with Consumers (and Guided Exercise)

  • Defining Data Formats and Structures (and Guided Exercise)

Lab

  • Introducing Kafka and AMQ Streams Concepts

Describing Kafka and AMQ Streams

Objectives

After completing this section, you should be able to describe the history and use cases of Kafka and AMQ Streams.

Defining Apache Kafka

Apache Kafka is an open source distributed system composed of servers and clients that communicate over TCP. Kafka was created as a high-performance messaging system, and is often described as a distributed commit log or as a distributed streaming platform.

Kafka was initially developed at LinkedIn to rebuild the user activity tracking pipeline as a set of real-time publish/subscribe communication channels. It was designed as a high-performance messaging platform that handles the real-time data feeds of a large company. Kafka powers some of the largest data pipelines in the world and is used by more than 80% of all Fortune 100 companies. It was released as an open source project in late 2010, was proposed and accepted as an Apache Software Foundation incubator project in July 2011, and graduated from the incubator in October 2012.

Defining Red Hat AMQ Streams

The Red Hat AMQ Streams component is a scalable, distributed, and high-performance data streaming platform based on the Apache Kafka project. You can use AMQ Streams on Red Hat OpenShift or on Red Hat Enterprise Linux.

At the core of AMQ Streams, Kafka provides:

  • A publish/subscribe messaging model, similar to a traditional enterprise messaging system.

  • A distributed system optimized for high message throughput and low latency.

  • Durable, distributed, and fault-tolerant data storage.

  • The ability to replay streams of events.

  • The ability to scale horizontally as the data streams grow.

All of these capabilities make AMQ Streams suitable for event-driven and event sourcing architectures.
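For example, the following minimal sketch shows the publish/subscribe model with the Kafka Java client. The broker address (localhost:9092), the topic name (events), and the group id are assumptions for illustration; setting auto.offset.reset to earliest is what lets a new consumer group replay the stream from the oldest retained event.

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class PubSubSketch {
    public static void main(String[] args) {
        // Publish one event to the "events" topic (topic and broker are assumed).
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("events", "user-42", "page-view"));
        }

        // Subscribe to the same topic. "earliest" makes a new consumer group
        // replay the stream from the oldest retained event.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "reporting-service");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("events"));
            // A single poll keeps the sketch short; real consumers poll in a loop.
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("key=%s value=%s offset=%d%n",
                        record.key(), record.value(), record.offset());
            }
        }
    }
}

Because the topic decouples the producer from the consumer, either side can scale, restart, or replay the data independently.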

Use Cases of AMQ Streams and Apache Kafka

AMQ Streams provides a streaming platform for exchanging data with high throughput and low latency. These capabilities benefit many common use cases:

Messaging

Replaces traditional messaging systems such as Apache ActiveMQ or RabbitMQ. AMQ Streams was designed for fault tolerance, and therefore provides strong durability and replication with better throughput than traditional messaging systems. For example, you can use AMQ Streams as the communication layer in your event-driven application.

Stream Processing

Responds to real-time events by storing, aggregating, enriching, and processing data streams. For example, you can capture the interactions of the users of your website, analyze their behavior, and then build customer profiles that help you to increase sales.
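As a sketch of this idea, the following Kafka Streams application counts page views per user and publishes the running counts to an output topic. The application id and the topic names (page-views, view-counts) are assumptions for illustration.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ViewCountSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "view-count-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read the stream of page views, keyed by user id.
        KStream<String, String> views = builder.stream("page-views");

        // Group the events by user, keep a running count per user,
        // and publish the counts to an output topic.
        views.groupByKey()
             .count()
             .toStream()
             .to("view-counts", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}

A downstream service can then consume view-counts to keep customer profiles up to date as user behavior changes.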

Data Integration

Captures streams of events or data changes, and generates feeds that other data systems can consume. For example, you can capture database changes in a monolithic application, and send events based on those changes without changing the application code.
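One common way to implement this pattern, although the text does not prescribe it, is change data capture with a Kafka Connect connector such as Debezium. A hypothetical connector configuration for capturing changes from a MySQL database might look like the following; all hostnames, credentials, and table names are placeholders, and exact property names vary by connector and version.

# Hypothetical Debezium MySQL connector configuration (all values are placeholders)
name=inventory-connector
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=mysql.example.com
database.port=3306
database.user=debezium
database.password=changeme
database.server.id=184054
database.server.name=inventory
table.include.list=inventory.orders

With a setup along these lines, each committed change to the watched table becomes an event on a Kafka topic, without any modification to the monolithic application itself.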

Metrics

Aggregates statistics from the components of your distributed applications to produce centralized feeds of operational data. For example, you can send real-time metrics from your company's vehicles to AMQ Streams, and then build an application that consumes all of that data and generates a real-time status report.

Log Aggregation

Abstracts away the details of log files and transforms logs into a stream of data. This makes it easier to support multiple data sources and to distribute data consumption. For example, you can send the access logs of your Apache and NGINX servers to AMQ Streams, and then build a monthly access report.
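As a minimal illustration, Kafka ships with a file source connector for Kafka Connect that turns each appended line of a file into a record on a topic. The file path and topic name below are assumptions, and production log pipelines typically use more capable collectors.

# Hypothetical standalone Kafka Connect file source (paths and names are placeholders)
name=access-log-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/var/log/httpd/access_log
topic=access-logs

A reporting consumer can then aggregate the access-logs topic into the monthly report, independently of how many servers feed it.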
