
Apache Kafka Jobs in Lagos, Nigeria (Page 2)

View jobs that require Apache Kafka skills on TechTalentZone
  • Senior Software Architect, Commercial Systems

    Canonical · Lagos, Nigeria · 04 January

    Canonical - We deliver open source to the world faster, more securely, and more cost-effectively than any other company. We develop Ubuntu, the world's most ...

    Remote
  • Data Engineer

    Flutterwave · Lagos, Nigeria · 20 December, 2023

    Our mission is to power a new wave of prosperity across Africa. By enabling global digital payments on a continent that’s been largely cut off from the ...

    Onsite
  • Data Engineer

    Flutterwave · Lagos, Nigeria · 14 November, 2023

    Our mission is to power a new wave of prosperity across Africa. By enabling global digital payments on a continent that’s been largely cut off from the ...

    Onsite

What is Apache Kafka?

Apache Kafka is a distributed event-streaming platform optimized for ingesting and processing streaming data in real time. Streaming data is data that is continuously generated by thousands of data sources, which typically send the data records simultaneously. A streaming platform needs to handle this constant influx of data and process it sequentially and incrementally.

Kafka provides three main functions to its users:

  • Publish and subscribe to streams of records

  • Durably store streams of records in the order in which they were generated

  • Process streams of records in real-time

Kafka is primarily used to build real-time streaming data pipelines and applications that adapt to the data streams. It combines messaging, storage, and stream processing to allow storage and analysis of both historical and real-time data.

Developers can leverage these Kafka capabilities through four APIs:

  1. Producer API: This enables an application to publish a stream of records to a Kafka topic. A topic is a named log that stores the records in the order they occurred relative to one another. After a record is written to a topic, it can’t be altered or deleted; instead, it remains in the topic for a preconfigured retention period—for example, two days—or until storage space runs out. (A minimal producer sketch follows this list.)

  2. Consumer API: This enables an application to subscribe to one or more topics and to ingest and process the stream stored in the topic. It can work with records in the topic in real time, or it can ingest and process past records. (A matching consumer sketch follows this list.)

  3. Streams API: This builds on the Producer and Consumer APIs and adds complex processing capabilities that enable an application to perform continuous, front-to-back stream processing—specifically, to consume records from one or more topics, to analyze, aggregate, or transform them as required, and to publish the resulting streams to the same topics or to other topics. While the Producer and Consumer APIs can be used for simple stream processing, it’s the Streams API that enables the development of more sophisticated data- and event-streaming applications. (A short Streams sketch also follows this list.)

  4. Connector API: This lets developers build connectors, which are reusable producers or consumers that simplify and automate the integration of a data source into a Kafka cluster.
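To make item 1 concrete, here is a minimal producer sketch in Java. The broker address (localhost:9092), the topic name ("payments"), and the record contents are illustrative assumptions, not details from this page:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PaymentsProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Append one record to the hypothetical "payments" topic; the key
            // decides which partition of the log the record is written to.
            producer.send(new ProducerRecord<>("payments", "order-42", "charged"));
        } // close() flushes any records still buffered by the asynchronous send()
    }
}
```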
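A matching consumer sketch for item 2, under the same assumptions (the group id "payments-audit" is also made up). Setting auto.offset.reset to earliest is what lets a new subscriber process past records as well as live ones:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PaymentsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "payments-audit");
        props.put("auto.offset.reset", "earliest"); // start from the oldest retained record
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("payments"));
            while (true) {
                // poll() returns whatever records arrived since the last call
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```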
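And a Streams API sketch for item 3: consume from one topic, transform each record, and publish the result to another. It assumes the kafka-streams library is on the classpath; the application id and topic names are again hypothetical:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PaymentsNormalizer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payments-normalizer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Continuous pipeline: read "payments", transform each value,
        // write the resulting stream to "payments-normalized".
        KStream<String, String> payments = builder.stream("payments");
        payments.mapValues(value -> value.toUpperCase())
                .to("payments-normalized");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```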

How does Kafka work?

Kafka combines two messaging models, queuing and publish-subscribe, to provide the key benefits of each to consumers. Queuing allows data processing to be distributed across many consumer instances, making it highly scalable; however, traditional queues aren’t multi-subscriber. The publish-subscribe approach is multi-subscriber, but because every message goes to every subscriber, it cannot be used to distribute work across multiple worker processes.

Kafka uses a partitioned log model to stitch these two solutions together. A log is an ordered sequence of records, and each topic’s log is broken up into partitions that can be assigned to different subscribers. This means there can be multiple subscribers to the same topic, each assigned a subset of its partitions, which allows for higher scalability. Finally, Kafka’s model provides replayability: because records remain in the log after they are read, multiple independent applications can read the same stream and each work at its own rate. The sketch below shows both ideas, a consumer group splitting a topic’s partitions and a consumer rewinding the log to replay it.
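A minimal sketch of those semantics, under the same assumed broker and topic as earlier. Run two copies with the same group.id and Kafka splits the partitions between them; seekToBeginning rewinds the assigned partitions so the retained history is replayed:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReplayingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Consumers sharing this group.id divide the topic's partitions among
        // themselves (queuing); a different group.id gets its own full copy of
        // the stream (publish-subscribe).
        props.put("group.id", "payments-replay");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("payments"));
            consumer.poll(Duration.ofSeconds(1));            // first poll joins the group and gets partitions
            consumer.seekToBeginning(consumer.assignment()); // rewind to replay every retained record
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            System.out.println("replayed " + records.count() + " records");
        }
    }
}
```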

What are the benefits of Kafka's approach?

  1. Scalable: Kafka’s partitioned log model allows data to be distributed across multiple servers, making it scalable beyond what would fit on a single server.

  2. Fast: Kafka decouples producers from consumers, so neither side waits on the other, and records move through the log with very low latency.

  3. Durable: Partitions are distributed and replicated across many servers, and the data is all written to disk. This helps protect against server failure, making the data very fault-tolerant and durable. Both the partition count and the replication factor are chosen when a topic is created, as sketched below.
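A minimal sketch of where those two settings live, using Kafka's Java AdminClient. The broker address is the same assumption as before, and the specific numbers (6 partitions, replication factor 3) are illustrative choices, not recommendations:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreatePaymentsTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions let writes and reads spread across brokers and
            // consumers (scalable); replication factor 3 keeps two extra copies
            // of every partition on other brokers (durable).
            NewTopic topic = new NewTopic("payments", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```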