
Designing Data-Intensive Applications by Martin Kleppmann Review

2 min read · By Editorial Team

4.9 / 5

Overall Rating

DDIA has become the system-design interview reference. Is it worth the 550-page commitment for developers who aren't actively prepping?


Martin Kleppmann's Designing Data-Intensive Applications has become, in under a decade, the default reference for distributed-systems engineering interviews and the backbone of many senior-engineer reading lists. It has earned that status.

What The Book Covers

  • Part 1: Foundations of Data Systems — storage engines, data models, encoding formats
  • Part 2: Distributed Data — replication, partitioning, transactions, consistency
  • Part 3: Derived Data — batch processing, stream processing, building data pipelines

Each chapter is 30-50 pages, densely packed, with concrete examples drawn from real systems (PostgreSQL, Cassandra, Kafka, Spark, Riak, etcd, ZooKeeper).
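The storage-engine material in Part 1 builds up from the simplest possible database: an append-only log with an in-memory hash index pointing at the latest record for each key (the Bitcask-style design). A minimal sketch of that idea — class and method names are mine, not the book's:

```python
import os

class HashIndexLog:
    """Append-only log file plus an in-memory hash index mapping each
    key to the byte offset of its most recent record. Writes never
    overwrite old data; reads seek straight to the latest record."""

    def __init__(self, path):
        self.path = path
        self.index = {}            # key -> byte offset of latest record
        open(path, "a").close()    # ensure the log file exists

    def set(self, key, value):
        record = f"{key},{value}\n".encode()
        with open(self.path, "ab") as f:   # append mode: position at EOF
            offset = f.tell()
            f.write(record)
        self.index[key] = offset           # latest write wins in the index

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)                 # jump directly to the record
            line = f.readline().decode().rstrip("\n")
            _, _, value = line.partition(",")
            return value
```

Real engines layer compaction, crash recovery, and (for LSM-trees) sorted segment files on top, but this is the kernel of the design the book starts from.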

Strongest Chapters

Chapter 5 (Replication). The clearest single-chapter treatment of single-leader, multi-leader, and leaderless (Dynamo-style) replication, and the consistency trade-offs that underlie each. Every backend engineer should read this chapter at least once.
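The leaderless model's key invariant is the quorum condition: with n replicas, w write acknowledgments, and r read responses, w + r > n guarantees that every read quorum overlaps every write quorum. An illustrative toy (not code from the book; names and the version-number scheme are mine):

```python
class LeaderlessKV:
    """Toy Dynamo-style quorum reads/writes over n in-memory replicas.
    With w + r > n, any read quorum shares at least one replica with
    the most recent successful write quorum."""

    def __init__(self, n=5, w=3, r=3):
        assert w + r > n, "quorum condition violated"
        self.replicas = [{} for _ in range(n)]
        self.n, self.w, self.r = n, w, r
        self.version = 0                   # simplistic global version counter

    def write(self, key, value, reachable):
        """Store (value, version) on reachable replicas; succeed on >= w acks."""
        self.version += 1
        acks = 0
        for i in reachable:
            self.replicas[i][key] = (value, self.version)
            acks += 1
        return acks >= self.w              # a real system would also repair

    def read(self, key, reachable):
        """Query r reachable replicas and return the highest-version value."""
        responses = [self.replicas[i].get(key) for i in reachable[: self.r]]
        responses = [vv for vv in responses if vv is not None]
        if not responses:
            return None
        return max(responses, key=lambda vv: vv[1])[0]
```

For example, a write acknowledged by replicas {0, 1, 2} is still visible to a read that reaches only {2, 3, 4}, because the two quorums overlap at replica 2.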

Chapter 7 (Transactions). Isolation levels (read committed, snapshot isolation, serializable), weak isolation anomalies (dirty writes, lost updates, write skew, phantoms), and what each database actually does. After this chapter, you'll finally understand why your database's default isolation level isn't serializable.
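The lost-update anomaly the chapter covers is easy to reproduce in miniature: two read-modify-write cycles interleave, both read the same starting value, and the second write silently clobbers the first. A self-contained sketch of the interleaving (my own illustration, not the book's example):

```python
def lost_update_demo():
    """Two interleaved read-modify-write increments of a shared counter.
    Both transactions read the same snapshot, so one increment is lost:
    the final value is 1, not the expected 2."""
    counter = {"n": 0}
    a = counter["n"]        # txn A reads 0
    b = counter["n"]        # txn B also reads 0, before A commits
    counter["n"] = a + 1    # A writes 1
    counter["n"] = b + 1    # B overwrites with 1 -- A's update is lost
    return counter["n"]
```

Databases prevent this with atomic operations (`UPDATE counters SET n = n + 1`), explicit row locks, or by detecting the conflict under snapshot isolation and aborting one transaction.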

Chapter 9 (Consistency and Consensus). Linearizability, causal consistency, ordering guarantees, and consensus (Paxos, Raft). Accessible treatment of material that is usually inscrutably academic.
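A load-bearing fact in the consensus material is that any two majority quorums of the same cluster must share at least one node, which is what prevents two conflicting decisions in the same round. That property can be checked exhaustively for small clusters — a quick sketch (my own illustration of the argument, not code from the book):

```python
from itertools import combinations

def majorities_always_intersect(n):
    """Exhaustively verify that every pair of majority quorums
    (size n // 2 + 1) of an n-node cluster shares at least one node --
    the overlap Raft- and Paxos-style protocols rely on for safety."""
    nodes = range(n)
    m = n // 2 + 1                       # minimum majority size
    return all(set(q1) & set(q2)         # non-empty intersection for all pairs
               for q1 in combinations(nodes, m)
               for q2 in combinations(nodes, m))
```

With sub-majority quorums the guarantee vanishes: in a 4-node cluster, the size-2 quorums {0, 1} and {2, 3} are disjoint, so two halves could decide differently.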

Chapter 11 (Stream Processing). Modern streaming (Kafka Streams, Flink, Samza), event sourcing, and exactly-once semantics. The book predates some newer streaming patterns, but the fundamentals remain correct.
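The event-sourcing idea at the heart of this chapter is that current state is just a fold over an append-only event log, so any state can be rebuilt by replaying events from the start. A minimal sketch, using a hypothetical bank-account event schema of my own invention:

```python
def replay(events):
    """Rebuild account balances by folding over an append-only event log.
    The log is the source of truth; 'balances' is derived state that can
    be recomputed from scratch at any time."""
    balances = {}
    for event in events:
        account = event["account"]
        if event["type"] == "deposit":
            balances[account] = balances.get(account, 0) + event["amount"]
        elif event["type"] == "withdraw":
            balances[account] = balances.get(account, 0) - event["amount"]
    return balances
```

Because the derived state is disposable, you can change the fold logic and replay the log to get a corrected view — the same move the book generalizes to materialized views and stream processors.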

What It's Not

A tool-specific manual. Don't expect PostgreSQL query-tuning specifics or Redis cluster config details. The book is about how to think about data systems, not how to use any specific tool.

A cloud-native guide. The book predates Snowflake, ClickHouse, DuckDB, and much of the modern data-warehouse era. The principles still apply, but the example tools are 2010s-era.

Who Should Read

Every backend engineer within their first 3 years. Architects designing distributed systems. Anyone interviewing for senior engineer roles at systems-heavy companies.

Who Should Skip

Front-end engineers who don't touch the backend at all. Complete beginners — the book assumes CS fundamentals.

Reading Strategy

550 pages is a commitment. Recommended approach:

  1. Read chapters 5, 7, 9 first — they're the interview/production hits
  2. Come back for chapters 2, 3, 4 for storage and data model depth
  3. Chapters 10-12 on batch/streaming for data engineers specifically

Verdict

Essential for backend and systems engineers. The second edition (if ever released) would be even better, but the current book remains 95% relevant and 100% worth the read.


Our Verdict

The single best book on distributed systems for working software engineers. Rigorous, clear, and spans from fundamentals (B-trees, LSM trees) to current practice (streaming, exactly-once semantics). Required reading for backend engineers.

Affiliate Disclosure

This article may contain affiliate links. If you make a purchase through these links, we may earn a commission at no additional cost to you.
