What Is Kafka? An Overview of Kafka Architecture

What is Kafka?

Kafka is an open-source, distributed message streaming platform that uses a publish-subscribe mechanism to stream records. It is currently used by many companies, such as LinkedIn, Netflix, Uber, and Walmart.

It was originally developed by LinkedIn and later donated to the Apache Software Foundation.

Always keep the following in mind during setup:-

Data format

Connection type (HTTP, TCP, JDBC)

Schema… etc…

Kafka Architecture

Producer:- A producer is an application that writes (publishes) data to topics within a cluster using the Producer API. A producer can write at the topic level, leaving the choice of partition to the partitioner, or at the partition level, naming the target partition itself.
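To make the difference between a topic-level and a partition-level write concrete, here is a minimal sketch in plain Python. It is a toy model, not Kafka's actual partitioner: the function name choose_partition is hypothetical, and CRC32 is only an assumed stand-in for whatever hash the real client uses.

```python
import zlib


def choose_partition(key, num_partitions, explicit_partition=None):
    """Pick a partition for a record (toy model, not Kafka's real partitioner)."""
    if explicit_partition is not None:
        # Partition-level write: the producer names the partition itself.
        if not 0 <= explicit_partition < num_partitions:
            raise ValueError("partition out of range")
        return explicit_partition
    # Topic-level write: derive the partition from the record key.
    # Assumption: CRC32 stands in for the hash the real client would use.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


p1 = choose_partition("user-42", 4)
p2 = choose_partition("user-42", 4)
p3 = choose_partition("user-42", 4, explicit_partition=3)
print(p1 == p2, p3)  # the same key always maps to the same partition
```

Because the partition is derived from the key, all records with the same key land in the same partition and therefore keep their relative order.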

Consumer:- A consumer is an application that reads (subscribes to) data from topics. Each consumer belongs to exactly one consumer group.

Consumer Group:- A consumer group is a set of related consumers that work together on a task. The partitions of a topic are divided among the members of the group.

Broker:- A broker is a single Kafka server (node) in the cluster. For example, a cluster with five instances or nodes has five brokers.

Topics:- A topic is a stream of messages belonging to a particular category. It is the logical feed name to which records are published.

Partitions:- Topics are split into partitions; for example, one topic may be split into 4 partitions. All messages within a partition are ordered and immutable. Each message has an ID that is unique within its partition, known as the offset.

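There are three types of offsets:-

  • Log end offset (the offset the next message published by the producer will receive, i.e. one past the last message written)
  • Current offset (the consumer's read position, i.e. how far it has consumed)
  • Committed offset (the last position the consumer has acknowledged)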
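The relationship between the three offsets is easiest to see with numbers. The sketch below is illustrative only: the class PartitionOffsets and its method names are hypothetical, and it assumes the log end offset is the offset the next message will receive.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    log_end_offset: int    # offset the next produced message will get
    current_offset: int    # consumer's read position
    committed_offset: int  # last position the consumer acknowledged

    def lag(self):
        """Messages written to the partition but not yet committed."""
        return self.log_end_offset - self.committed_offset

    def uncommitted(self):
        """Messages already read but not yet acknowledged."""
        return self.current_offset - self.committed_offset


offsets = PartitionOffsets(log_end_offset=100, current_offset=80, committed_offset=75)
print(offsets.lag(), offsets.uncommitted())  # 25 5
```

If the consumer crashed at this point, it would resume from the committed offset, so the 5 messages that were read but not committed would be read again.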

Kafka Features:-

  • Scalable:- Horizontal scaling is done by adding new brokers to an existing cluster.
  • Fault tolerance:- A Kafka cluster can handle broker failures because of its distributed nature.
  • Durable:- Kafka uses a “distributed commit log”, which means messages are persisted to disk as quickly as possible.
  • Performance:- Kafka has high throughput for both publishing and subscribing to messages.
  • No data loss:- Kafka can avoid data loss if it is configured properly.
  • Zero downtime:- Kafka can run with no downtime as long as the required number of brokers is present in the cluster.
  • Reliability:- Kafka is reliable because it provides the features above.

Kafka APIs:-

  • Producer API
  • Consumer API (consumer.poll() and consumer.commit())
  • Streams API
  • Connect API
  • Admin API
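The poll-and-commit pattern of the Consumer API can be sketched without a running cluster. The class InMemoryConsumer below is a hypothetical stand-in written for this post, not a real Kafka client; it only mimics the idea of reading a batch with poll() and acknowledging it with commit().

```python
class InMemoryConsumer:
    """Hypothetical stand-in for a consumer on one partition (not a real client)."""

    def __init__(self, log):
        self.log = log       # list of messages; the list index plays the role of the offset
        self.position = 0    # next offset to read
        self.committed = 0   # last committed position

    def poll(self, max_records=2):
        """Return the next batch of messages and advance the read position."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        """Acknowledge everything read so far."""
        self.committed = self.position


consumer = InMemoryConsumer(["a", "b", "c", "d", "e"])
processed = []
while True:
    records = consumer.poll()
    if not records:
        break  # a real consumer would normally keep polling instead of stopping
    processed.extend(r.upper() for r in records)
    consumer.commit()  # acknowledge only after the batch was processed

print(processed, consumer.committed)
```

Committing after processing, rather than before, means a crash can cause a batch to be processed twice but never skipped.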

Consumer Group Rebalancing

The process of redistributing partitions among the consumers within the same consumer group is known as consumer group rebalancing.

Rebalancing of a consumer group happens in the following cases:

  1. A consumer joins the group.
  2. A consumer leaves the group.
  3. A partition is added to a topic.
  4. A partition goes offline.

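How does consumer group rebalancing work?

Consumer group rebalancing is handled by two entities: the group coordinator and the group leader. The group coordinator is a broker that tracks the members of the group, and the group leader is one of the consumers, which computes the partition assignment.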
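What a rebalance produces can be illustrated with a small round-robin assignment. This is only a sketch of one possible strategy, assuming partitions are dealt out in turn to the sorted group members; the function assign_round_robin and the consumer names are hypothetical, and Kafka's own assignors are more involved.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over consumers in turn (illustrative strategy only)."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
before = assign_round_robin(partitions, ["c1", "c2"])
after_join = assign_round_robin(partitions, ["c1", "c2", "c3"])  # a consumer joins
after_leave = assign_round_robin(partitions, ["c1"])             # consumers leave

print(before)       # {'c1': [0, 2], 'c2': [1, 3]}
print(after_join)   # {'c1': [0, 3], 'c2': [1], 'c3': [2]}
print(after_leave)  # {'c1': [0, 1, 2, 3]}
```

Each change in membership triggers a fresh assignment, and every partition always ends up with exactly one consumer in the group.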

What is ZooKeeper?

ZooKeeper is used to monitor the Kafka cluster and coordinates with each broker to check its health status. It keeps all the metadata related to the Kafka cluster in the form of key-value pairs.

Metadata includes:-

  • Configuration information
  • Health status of broker.

It is also used for controller election within the Kafka cluster.

A set of ZooKeeper nodes working together to manage other distributed systems is known as a ZooKeeper cluster or “ZooKeeper ensemble”.
