System Design: Data Engineering
#01 Data Engineering

Build a Podcast Transcription and Chapter Detection Pipeline (Premium)

Design a pipeline that transcribes uploaded audio in near real-time, identifies natural topic boundaries to auto-generate chapter markers, and makes the full transcript searchable within minutes of upload.

#02 Scalability

Build a QR Code Generation and Analytics Service (Free)

Design a system that generates dynamic QR codes at scale, tracks every scan with device and location metadata, and lets users update the destination URL without regenerating the QR code.

#03 Data Engineering

Build a RAG Pipeline for an Enterprise Knowledge Base (Premium)

Design a Retrieval-Augmented Generation pipeline that ingests internal documents, chunks and embeds them, retrieves the most relevant context for each query, and re-ranks results before passing them to the LLM.

#04 Data Engineering

Build a Real-Time Analytics Dashboard at 100B Events (Premium)

Design a product analytics system that ingests billions of events per day, supports sub-second aggregate queries over arbitrary time windows, and powers live dashboards without precomputing every possible metric.

#05 Observability

Build a Real-Time Log Aggregation Pipeline (Premium)

Design a log ingestion and querying system that handles 1 million events per second, supports full-text search with sub-second latency, and retains 90 days of logs without breaking the budget.

#06 Distributed Systems

Build a Schema Registry for Event-Driven Systems (Premium)

Design a schema registry that enforces backward and forward compatibility for Kafka topics, prevents breaking changes from reaching production, and lets consumers evolve independently of producers.

#07 Databases

Build a Semantic Search Engine with Hybrid Ranking (Premium)

Design a search engine that combines keyword-based BM25 scoring with dense vector similarity to handle queries where exact keyword matches are absent but semantic intent is clear.

#08 Data Engineering

Build Spotify's Song Radio Feature (Premium)

Design a real-time playlist generator that uses a seed song to produce an infinite radio stream of acoustically and contextually similar tracks, personalizes the mix per listener, and avoids repetition across sessions.

#09 Databases

Build a Time-Series Metrics Store (Premium)

Design a metrics storage engine that compresses billions of data points efficiently, supports fast range queries for dashboard rendering, and retains high-resolution data for 15 days before downsampling.

#10 Scalability

Build Twitter's Trending Topics Pipeline (Premium)

Design a system that detects emerging viral signals from a stream of 500,000 tweets per second, computes trending topics per region, and refreshes trends every 60 seconds without reprocessing the full history.
