Kafka works well as a replacement for a more traditional message broker. Message brokers are used for various reasons, such as decoupling processing from data producers, buffering unprocessed messages, etc. Compared to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, making it a good solution for large-scale message processing applications. Messaging uses are often comparatively low throughput but may require low end-to-end latency and often depend on Kafka's strong durability guarantees. In this domain, Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.
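As a minimal sketch of the messaging case, the Java snippet below sends a single message with the durability-oriented settings this paragraph alludes to (acks=all waits for all in-sync replicas before acknowledging). The broker address and the "orders" topic are placeholder assumptions, not anything prescribed by Kafka itself.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("acks", "all");                           // wait for all in-sync replicas (durability)
        props.put("enable.idempotence", "true");            // avoid duplicates on producer retry
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key determines the partition, so messages for one order stay ordered.
            producer.send(new ProducerRecord<>("orders", "order-42", "order created"));
        }
    }
}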
The original use case for Kafka was to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. Site activities like page views, searches, or other user actions get published to central topics with one topic per activity type. These feeds are available for subscription for various use cases, including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting. Activity tracking is often very high in volume, as many activity messages are generated for each page view.
Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This capability allows for lower-latency processing and easy support for multiple data sources and distributed data consumption. Compared to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.
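To make the consumption side concrete, here is a minimal Java consumer that reads log events as a stream of messages. The "app_logs" topic and the group id are illustrative assumptions, and the processing step is just a print stub standing in for indexing or forwarding.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class LogConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "log-indexer");               // consumers in one group share partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("app_logs"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s p%d@%d: %s%n",
                            record.topic(), record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}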
It is common to use Kafka in data pipelines consisting of multiple processing stages. Raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic. Further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic. A final processing stage might attempt to recommend this content to users. Such processing pipelines create graphs of real-time data flows based on particular topics.
Starting in version 0.10.0.0, a lightweight but powerful stream processing library called Kafka Streams is available in Apache Kafka to perform such data processing as described above.
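A sketch of one stage of the news-article pipeline above, written with Kafka Streams (using the current StreamsBuilder API rather than the original 0.10.0.0 one): it reads the "articles" topic, applies a stand-in normalization, and writes to a downstream topic. The topic names and the normalization logic are assumptions for illustration.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class ArticleNormalizer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-normalizer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> articles = builder.stream("articles");
        articles.mapValues(body -> body.trim().toLowerCase()) // stand-in for real cleansing/dedup
                .to("normalized_articles");                   // the next stage consumes this topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}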
Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for storing very large logs makes it an excellent backend for an application built in this style.
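The distinctive move in event sourcing is rebuilding state by replaying the log from the beginning. A minimal sketch, assuming string-encoded events on a hypothetical single-partition "account_events" topic keyed by account id:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class StateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, String> state = new HashMap<>();        // latest state per account
        TopicPartition tp = new TopicPartition("account_events", 0);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp)); // manual assignment, no consumer group
            consumer.seekToBeginning(Collections.singletonList(tp));
            long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
            while (consumer.position(tp) < end) {           // replay until caught up
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(200))) {
                    state.put(r.key(), r.value());          // apply each state change in log order
                }
            }
        }
        System.out.println("rebuilt state for " + state.size() + " accounts");
    }
}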
Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage.
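A compacted topic retains at least the latest record for each key, which is what the commit-log (and event-sourcing) use needs: a failed node can restore current state without replaying every historical write. Below is a sketch that creates such a topic with the Java Admin client; the topic name, partition count, and replication factor are arbitrary placeholders.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("node-state", 3, (short) 3) // name/sizing are assumptions
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT)); // keep latest value per key
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}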