Uber | System Design Round | L5 | Design Kafka
Summary
I recently interviewed at Uber for an L5 System Design role, where I was asked to design a distributed message broker similar to Apache Kafka. Although I felt the discussion went very well, covering deep Kafka internals, I received frustrating feedback, suggesting I might only be eligible for an L4 position due to a perceived lack of clarifying questions.
Full Experience
Recently went through a system design round at Uber where the prompt was: "Design a distributed message broker similar to Apache Kafka." The requirements focused on topic-based pub/sub, partitioned ordered storage, durability, consumer groups with parallel consumption, and at-least-once delivery. I thought the discussion went really well—covered a ton of depth, including real Kafka internals and evolutions—but ended up with some frustrating feedback.
Requirements Clarification
Functional: Topics, publish/subscribe, ordered messages per partition, consumer groups for parallel processing, at-least-once guarantees via consumer acks.
Non-functional: High throughput/low latency, durability (persistence to disk), scalability, fault tolerance.
Probed on push vs. pull model → settled on pull-based (consumer polls) for better consumer pacing and backpressure handling.
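To make the pull model concrete, here is a minimal sketch of a consumer-driven fetch loop (the interface and method names are hypothetical, not the actual Kafka protocol): the consumer decides when to fetch, from which offset, and how much data it can handle, so backpressure falls out naturally.

```java
import java.util.List;

// Hypothetical broker-facing fetch API: the consumer decides when to fetch,
// from which offset, and how much data it can handle at once.
interface PartitionReader {
    List<byte[]> fetch(String topic, int partition, long offset, int maxBytes);
}

class PullingConsumer {
    private final PartitionReader broker;
    private long offset;                       // consumer-owned read position

    PullingConsumer(PartitionReader broker, long startOffset) {
        this.broker = broker;
        this.offset = startOffset;
    }

    void run(String topic, int partition) throws InterruptedException {
        while (true) {
            // Backpressure is implicit: we only ask for as much as we can process.
            List<byte[]> batch = broker.fetch(topic, partition, offset, 1 << 20);
            for (byte[] record : batch) {
                process(record);
                offset++;                      // advance only after processing
            }
            if (batch.isEmpty()) Thread.sleep(100);  // nothing new; back off
        }
    }

    private void process(byte[] record) { /* application logic */ }
}
```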
High-Level Architecture
Core Components:
Brokers clustered for scalability.
Topics → Partitions → Replicas (primary + secondaries for fault tolerance).
Producers publish to topics (key-based partitioning for ordering).
Consumers organized into groups; within a group, each partition is consumed by exactly one consumer (and one consumer can own many partitions), which is what enables parallelism.
Coordination: Initially Zookeeper for metadata, leader election, and consumer offsets—but explicitly discussed evolution to KRaft (quorum-based controller, no external dependency) as a more modern, ops-friendly direction.
Optional Frontend Layer: Introduced a lightweight proxy layer for "dumb" clients (handles routing, auth, rate-limiting). Smart clients bypass it and talk directly to brokers after fetching cluster metadata.
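As a rough sketch of the smart-client path (types and names here are hypothetical): the client fetches cluster metadata once, caches the partition-to-leader mapping, and routes requests straight to the leader broker, while a dumb client would send the same request to the proxy and let it do this lookup.

```java
import java.util.Map;

// Hypothetical cluster metadata: which broker leads each partition of a topic.
record BrokerAddress(String host, int port) {}
record TopicMetadata(Map<Integer, BrokerAddress> partitionLeaders) {}

class SmartClient {
    private final Map<String, TopicMetadata> metadataCache;

    SmartClient(Map<String, TopicMetadata> bootstrapMetadata) {
        this.metadataCache = bootstrapMetadata;   // fetched from any broker / the proxy at startup
    }

    // Route a produce or fetch request straight to the leader of the target
    // partition, skipping the proxy layer entirely.
    BrokerAddress leaderFor(String topic, int partition) {
        TopicMetadata md = metadataCache.get(topic);
        if (md == null || !md.partitionLeaders().containsKey(partition)) {
            throw new IllegalStateException("metadata stale; re-fetch from the cluster");
        }
        return md.partitionLeaders().get(partition);
    }
}
```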
Deep Dives & Trade-offs
This is where I went deep:
Storage & Durability:
Write-ahead log style: messages appended to partition segment files on disk (a minimal append sketch follows this list).
Page cache leverage for fast reads.
In-sync replicas (ISR) concept: Leader waits for ack from ISR before committing.
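A minimal sketch of the append path, assuming one hypothetical SegmentLog per partition segment (a real broker would also maintain index files, roll segments by size/time, and batch fsyncs):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

// Hypothetical append-only log for one partition segment.
// Records are laid out as [4-byte length][payload] and only ever appended.
class SegmentLog {
    private final RandomAccessFile file;
    private long nextOffset;                     // logical offset of the next record

    SegmentLog(String path, long baseOffset) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.file.seek(this.file.length());      // always write at the end
        this.nextOffset = baseOffset;
    }

    // Appends one record and returns the offset it was assigned.
    synchronized long append(byte[] payload) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4).putInt(payload.length);
        file.write(header.array());
        file.write(payload);
        return nextOffset++;
    }

    // Durability knob: callers decide how often to force data to disk.
    synchronized void flush() throws IOException {
        file.getFD().sync();
    }
}
```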
Replication & Failure Handling:
Primary host per partition, secondaries for redundancy.
Mix of sync (for durability) and async (for latency) replication; see the commit-rule sketch after this list.
Leader election via ZAB (Zookeeper Atomic Broadcast) for strong consistency and quorum handling during network partitions or broker failures.
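A rough sketch of that commit rule, assuming a hypothetical leader that tracks each in-sync replica's acknowledged offset and only advances the committed (high-watermark) offset once every ISR member has caught up; out-of-sync followers replicate asynchronously and never block the commit:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical leader-side view of one partition's replication state.
class PartitionLeader {
    private final Set<String> isr;                       // replica ids currently in sync
    private final Map<String, Long> ackedOffsets = new HashMap<>();
    private long highWatermark = -1;                     // last committed offset

    PartitionLeader(Set<String> isr) {
        this.isr = isr;
        for (String replica : isr) ackedOffsets.put(replica, -1L);
    }

    // Follower fetch responses double as acks for everything up to `offset`.
    synchronized void onReplicaAck(String replicaId, long offset) {
        if (!isr.contains(replicaId)) return;            // async / out-of-sync follower: ignored for commit
        ackedOffsets.merge(replicaId, offset, Math::max);
        // Commit rule: a record is committed once every ISR member has it.
        long minAcked = isr.stream()
                .mapToLong(r -> ackedOffsets.getOrDefault(r, -1L))
                .min().orElse(-1L);
        highWatermark = Math.max(highWatermark, minAcked);
    }

    synchronized long highWatermark() { return highWatermark; }
}
```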
Producer Side:
Serialized operations at partition level for ordering.
Key-based partitioning (hash of the message key picks the partition; see the producer sketch after this list).
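For reference, this is what key-based partitioning looks like with the stock Kafka Java producer client; a from-scratch client would do the equivalent hash(key) mod numPartitions routing itself (the topic and key names below are made up):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");   // wait for the full ISR, matching the durability discussion above

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // All events for ride "ride-42" share a key, so they hash to the same
            // partition and are therefore consumed in order.
            for (int i = 0; i < 3; i++) {
                producer.send(new ProducerRecord<>("ride-events", "ride-42", "event-" + i));
            }
        }
    }
}
```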
Consumer Side:
Poll + explicit ack (offset commit) for at-least-once guarantees; see the poll-loop sketch after this list.
Offset tracking per consumer group/partition.
Parallel consumption within groups.
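The poll-plus-ack pattern maps to a process-then-commit loop. Here it is with the stock Kafka Java consumer client for illustration; a client for a custom broker would follow the same shape, committing offsets only after processing so a crash causes redelivery rather than loss (topic and group names are made up):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "ride-event-processors");      // consumer group for parallel consumption
        props.put("enable.auto.commit", "false");             // we ack explicitly
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("ride-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.key(), record.value());    // side effects happen first...
                }
                consumer.commitSync();                         // ...then the offset is committed: at-least-once
            }
        }
    }

    private static void process(String key, String value) { /* application logic */ }
}
```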
Rebalancing & Assignment:
Partition assignment: Round-robin or resource-aware, ensuring replicas of the same partition are not co-located (see the assignment sketch after this list).
Coordination: Used a flag (e.g., in Redis or the metadata store) to pause consumers during a rebalance, which is simple and safe and avoids message loss/duplication. Discussed how this evolves toward Zookeeper-based rebalancing in mature systems.
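A minimal sketch of the round-robin assignment step a group coordinator could run during a rebalance (hypothetical class, with no resource-awareness or stickiness):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical group-coordinator logic: assign the partitions of one topic
// round-robin across the consumers currently in the group.
class RoundRobinAssignor {
    Map<String, List<Integer>> assign(List<String> consumerIds, int numPartitions) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        for (String consumer : consumerIds) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumerIds.get(partition % consumerIds.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

For example, five partitions across consumers [c1, c2] come out as c1 → [0, 2, 4] and c2 → [1, 3]; consumers stay paused (via the flag above) until the new assignment is published.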
Scalability Topics:
Adding/removing brokers: Reassign partitions via the controller.
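One possible sketch of that reassignment step (hypothetical controller-side code): keep a partition where it is if its broker is still alive and under its fair share, and move only the rest, which limits data migration when a broker joins or leaves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical controller step: when brokers join or leave, move as few
// partitions as possible while keeping the per-broker count balanced.
class PartitionReassigner {
    Map<Integer, String> reassign(Map<Integer, String> current, List<String> liveBrokers) {
        int target = (int) Math.ceil((double) current.size() / liveBrokers.size());
        Map<String, Integer> load = new HashMap<>();
        List<Integer> orphaned = new ArrayList<>();
        Map<Integer, String> next = new HashMap<>();

        // Keep partitions on their current broker if it is alive and not overloaded.
        for (Map.Entry<Integer, String> e : current.entrySet()) {
            String broker = e.getValue();
            if (liveBrokers.contains(broker) && load.getOrDefault(broker, 0) < target) {
                next.put(e.getKey(), broker);
                load.merge(broker, 1, Integer::sum);
            } else {
                orphaned.add(e.getKey());       // broker gone, or already at capacity
            }
        }
        // Place the remaining partitions on the least-loaded brokers.
        for (int partition : orphaned) {
            String best = liveBrokers.get(0);
            for (String b : liveBrokers) {
                if (load.getOrDefault(b, 0) < load.getOrDefault(best, 0)) best = b;
            }
            next.put(partition, best);
            load.merge(best, 1, Integer::sum);
        }
        return next;   // the controller publishes this plan and triggers data migration
    }
}
```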
Other Advanced Points
Explicitly highlighted Kafka's real evolution: from a heavy Zookeeper dependency → KRaft's self-managed quorum controller, with no external coordination dependency.
Overall, I felt the round went quite well and was expecting at least a Hire from it. Considering the other rounds were also positive, I figured I had a better than 50% chance of being selected. However, to my horror, I was told I might only be eligible for L4 because of callouts about not asking enough clarifying questions. Since the LLD, DSA, and managerial rounds went well, and the problem itself was not very vague, I can't figure out what went wrong. My guess is that there are simply too many candidates, so they end up finding odd reasons to reject people. To top it all off, they rescheduled my interviews 5-6 times, and I had to keep brushing up my concepts.
Interview Questions (1)
Design a Distributed Message Broker (Kafka-like)
Design a distributed message broker similar to Apache Kafka. The requirements focused on topic-based pub/sub, partitioned ordered storage, durability, consumer groups with parallel consumption, and at-least-once delivery.