Apache Kafka is a distributed streaming platform for publishing and subscribing to streams of records; it can also serve as an enterprise messaging system. It is fast, horizontally scalable, and fault-tolerant. Kafka has four core APIs:
- Producer API: allows clients to connect to the brokers in the cluster and publish streams of records to one or more Kafka topics (see the producer sketch after this list).
- Consumer API: allows clients to connect to the brokers in the cluster and consume streams of records from one or more Kafka topics (a consumer sketch also follows the list).
- Streams API: allows clients to act as stream processors, consuming streams from one or more input topics and producing streams to one or more output topics, transforming the input streams into output streams along the way (a small Streams example follows the list).
- Connector API: allows writing reusable producer and consumer code. For example, to read data from an RDBMS and publish it to a topic, or to consume data from a topic and write it back to an RDBMS, we can create reusable source and sink connector components for various data sources.
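To make the Producer API concrete, here is a minimal Java sketch; the topic name `orders` and the `localhost:9092` broker address are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The bootstrap server address is an assumption; point it at your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one record to a hypothetical "orders" topic.
            producer.send(new ProducerRecord<>("orders", "order-1", "created"));
            producer.flush();
        }
    }
}
```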
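And a matching Consumer API sketch, assuming the same topic and broker address plus a hypothetical consumer group `orders-consumers`:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed address
        props.put("group.id", "orders-consumers");        // hypothetical group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                // Poll the brokers for new records on the subscribed topics.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}
```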
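Finally, a minimal Kafka Streams sketch that consumes from one topic, transforms each value, and produces to another output topic; the application ID and both topic names are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app"); // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume from an input topic, transform each value, produce to an output topic.
        KStream<String, String> input = builder.stream("orders");
        input.mapValues(value -> value.toUpperCase()).to("orders-uppercased");

        new KafkaStreams(builder.build(), props).start();
    }
}
```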
Use cases
Kafka is commonly used for the following use cases:
- Messaging System: Kafka is used as an enterprise messaging system to decouple source and target systems while exchanging data. Compared to JMS, Kafka provides higher throughput through partitioning and fault tolerance through replication.
- Web Activity Tracking: tracking user journey events on a website for analytics and offline data processing.
- Log Aggregation: collecting logs from various systems, especially in distributed environments such as microservices architectures where services are deployed on many hosts, and making them available in a central place for analysis.
- Metrics Collection: Kafka is used to collect metrics from various systems and networks for operations monitoring. Kafka metrics reporters are available for monitoring tools such as Ganglia and Graphite.
Broker
An instance in a Kafka cluster is called a broker.
If you connect to any single broker in a Kafka cluster, you can access the entire cluster. The broker instance that we connect to in order to reach the cluster is known as a bootstrap server.
Each broker is identified by a numeric ID in the cluster.
Three brokers is a good starting point for a Kafka cluster, but there are production clusters with hundreds of brokers.
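As a minimal sketch of this bootstrap behavior (the `localhost:9092` address is an assumption), connecting to a single broker with the AdminClient is enough to discover every broker in the cluster along with its numeric ID:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class DescribeCluster {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // One reachable broker acts as the bootstrap server for the whole cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            for (Node node : admin.describeCluster().nodes().get()) {
                // Each broker is identified by a numeric ID in the cluster.
                System.out.printf("broker id=%d host=%s:%d%n", node.id(), node.host(), node.port());
            }
        }
    }
}
```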
Topic
A topic is a logical name to which records are published. Internally, a topic is divided into partitions, and that is where the data is actually written. These partitions are distributed across the brokers in the cluster. For example, if a topic has three partitions and the cluster has three brokers, each broker holds one partition. Data published to a partition is append-only, with the offset incremented for each new record.
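As a sketch, the three-partition topic from the example above could be created with the AdminClient; the topic name `orders`, the replication factor, and the broker address are assumptions:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Three partitions spread across the brokers, each replicated three times.
            NewTopic topic = new NewTopic("orders", 3, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```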
Below are some points to remember when working with partitions:
- Topics are identified by name. We can have many named topics in a cluster.
- The order of messages is maintained at the partition level, not across the partitions of a topic.
- Once data is written to a partition, it is never overwritten. This is called immutability.
- Messages in partitions are stored with keys, values, and timestamps. For a given key, Kafka guarantees that messages are published to the same partition (see the sketch after this list).
- Each partition has a leader in the cluster that handles the read/write operations for that partition.
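To make the key-to-partition guarantee concrete, here is a minimal sketch (the topic name, key, and broker address are assumptions): sending several records with the same key and printing the returned metadata shows them landing on the same partition with increasing offsets.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 3; i++) {
                // Same key => same partition, so per-key ordering is preserved.
                RecordMetadata meta = producer
                        .send(new ProducerRecord<>("orders", "customer-42", "event-" + i))
                        .get();
                System.out.printf("partition=%d offset=%d%n", meta.partition(), meta.offset());
            }
        }
    }
}
```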