Chapter 1. Reliable, Scalable, Maintainable Systems

  1. What makes an application “data-intensive”?

    Its primary challenge is the data itself (its volume, complexity, or rate of change) rather than raw CPU cycles, so storage layout, indexing, and replication dominate its design.

  2. Name DDIA’s three evergreen system goals.

    Reliability (it keeps working correctly even when faults occur), scalability (it has reasonable ways to cope as load grows), and maintainability (it stays simple enough for people to operate and evolve).

  3. Fault vs. failure?

    A fault is one component deviating from its specification; a failure is the system as a whole no longer providing the required service to users, triggered by one or more faults that were not tolerated.

  4. Give two standard load parameters.

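    Requests per second to a web server and the ratio of reads to writes in a database; others include the number of simultaneously active users and the cache hit rate. CPU utilization, memory usage, disk I/O, and network bandwidth measure how the system copes with the load, not the load itself.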
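
    As a minimal sketch, assuming a hypothetical access log where each entry is a (timestamp_seconds, operation) pair, the two headline parameters can be computed directly from the log; the log format and the load_parameters name are illustrative, not from any real tool.

```python
from collections import Counter

def load_parameters(log):
    """Compute requests/second and the read:write ratio from a hypothetical
    access log of (timestamp_seconds, op) pairs, where op is "read" or "write"."""
    if not log:
        return 0.0, 0.0
    timestamps = [ts for ts, _ in log]
    # Treat the observation window as at least one second to avoid dividing by zero.
    window = max(max(timestamps) - min(timestamps), 1.0)
    ops = Counter(op for _, op in log)
    rps = len(log) / window
    # Reads per write; infinite if the window contained no writes.
    ratio = ops["read"] / ops["write"] if ops["write"] else float("inf")
    return rps, ratio

log = [(0.0, "read"), (0.5, "read"), (1.0, "write"), (1.5, "read"),
       (2.0, "read"), (2.0, "read"), (3.0, "read"), (4.0, "write")]
rps, ratio = load_parameters(log)
print(f"{rps:.1f} req/s, read:write = {ratio:.1f}")  # 2.0 req/s, read:write = 3.0
```

    Real systems would compute these over sliding windows and per endpoint, but the definitions stay the same.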

  5. Which two axes can you scale on?

    Vertical (scaling up: a more powerful machine) and horizontal (scaling out: more machines, with the workload partitioned or replicated across them).
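
    Horizontal scaling by partitioning can be sketched as hashing each key to one of N machines. This minimal illustration assumes hypothetical node names and the simplest placement rule, hash mod N.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical machine names

def node_for(key, nodes=NODES):
    """Pick the node that owns `key` using hash-mod-N partitioning."""
    # Use a stable hash: Python's built-in hash() of a str varies between processes.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

placement = {k: node_for(k) for k in ["user:1", "user:2", "user:3", "user:4"]}
print(placement)
```

    Mod-N placement is the easiest to explain, but it relocates most keys whenever a node is added or removed, which is why production systems usually prefer rebalancing schemes that move less data.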

  6. Why is tail latency tracked with percentiles?

    The mean hides outliers; p95 or p99 gives the latency at or above which the slowest 5% or 1% of requests fall, which is what the unluckiest users actually experience.
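
    A minimal nearest-rank percentile sketch over made-up response times (in milliseconds) shows how a small slow tail barely moves the mean but dominates p99.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up response times in milliseconds: 90 fast, 8 slower, 2 very slow.
latencies_ms = [20] * 90 + [100] * 8 + [1500] * 2
mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.1f} p50={percentile(latencies_ms, 50)} "
      f"p95={percentile(latencies_ms, 95)} p99={percentile(latencies_ms, 99)}")
# mean=56.0 p50=20 p95=100 p99=1500
```

    Nearest-rank is only one of several percentile definitions; monitoring systems often approximate percentiles with histograms instead of sorting every sample.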

  7. Maintainability rests on what three properties?

    Operability (easy to run), simplicity (easy to reason about), evolvability (easy to change).

  8. How does redundancy increase reliability?

    Extra replicas turn single-component faults into non-events for users, because traffic is routed around the faulty component.
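
    A minimal client-side failover sketch, assuming hypothetical in-memory replicas that raise ConnectionError when faulty: the client tries each replica in turn, so one faulty replica never surfaces as a failure.

```python
class Replica:
    """Hypothetical in-memory replica; healthy=False simulates a fault."""
    def __init__(self, name, data, healthy=True):
        self.name, self.data, self.healthy = name, data, healthy

    def get(self, key):
        if not self.healthy:
            raise ConnectionError(f"{self.name} unreachable")
        return self.data[key]

def read_with_failover(replicas, key):
    """Try replicas in order, routing around any that are faulty."""
    errors = []
    for replica in replicas:
        try:
            return replica.get(key)
        except ConnectionError as exc:
            errors.append(str(exc))  # record the fault, then try the next replica
    # Only when every replica is faulty does the fault become a failure.
    raise RuntimeError("all replicas failed: " + "; ".join(errors))

data = {"user:42": "Ada"}
replicas = [Replica("r1", data, healthy=False), Replica("r2", data), Replica("r3", data)]
print(read_with_failover(replicas, "user:42"))  # Ada
```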

  9. Why do graceful degradation strategies pay off?

    Serving partial or slightly stale answers beats a total outage, e.g., returning stale cache reads while the database is struggling.
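
    The stale-cache fallback can be sketched as a reader that remembers the last good value per key. The DegradingReader class and the fetch_from_db stand-in are hypothetical; a real system would also bound how stale a served value may be.

```python
class DegradingReader:
    """Serve fresh data when the primary source answers; fall back to the
    last cached copy when it fails. `fetch` is a hypothetical callable."""
    def __init__(self, fetch):
        self.fetch = fetch
        self.cache = {}  # key -> last successfully fetched value

    def get(self, key):
        try:
            value = self.fetch(key)
        except (ConnectionError, TimeoutError):
            if key in self.cache:
                return self.cache[key], "stale"  # degraded, but still an answer
            raise  # nothing cached: the fault surfaces as a failure
        self.cache[key] = value
        return value, "fresh"

db = {"price:widget": 10}
db_up = True

def fetch_from_db(key):
    """Hypothetical database call that times out while the database struggles."""
    if not db_up:
        raise TimeoutError("database overloaded")
    return db[key]

reader = DegradingReader(fetch_from_db)
print(reader.get("price:widget"))  # (10, 'fresh')
db_up = False
print(reader.get("price:widget"))  # (10, 'stale')
```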

  10. Which metrics describe system performance, not load?

    Latency (often reported as p50, p95, or p99), throughput, and how much additional resource is needed to keep them steady as load rises.


Chapter 2. Data Models & Query Languages

  1. List the three mainstream data models and a sample engine for each.

    Relational / PostgreSQL, document / MongoDB, graph / Neo4j.

  2. What is normalization in one line?

    Store each fact once and reference it by ID, so duplicates cannot drift out of sync; joins reassemble the data at read time.
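
    A minimal sketch with Python's built-in sqlite3 module, using illustrative table names and data: each region name lives in one row, users refer to it by ID, and a join reassembles the combined view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    -- Each region name is stored once; users reference it by ID.
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES regions(id)
    );
    INSERT INTO regions (id, name) VALUES (1, 'Greater Seattle Area');
    INSERT INTO users (id, name, region_id) VALUES (1, 'Ada', 1), (2, 'Grace', 1);
""")

# Renaming the region is a single-row update; no user row holds a copy.
conn.execute("UPDATE regions SET name = 'Seattle Metro' WHERE id = 1")

# A join reassembles each user with the current region name at read time.
rows = conn.execute("""
    SELECT users.name, regions.name
    FROM users JOIN regions ON users.region_id = regions.id
    ORDER BY users.id
""").fetchall()
print(rows)  # [('Ada', 'Seattle Metro'), ('Grace', 'Seattle Metro')]
```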

  3. When does denormalization help?

    Read-heavy, write-light workloads, where pre-joined documents cut response time and disk seeks at the cost of rewriting every duplicated copy when the shared data changes.
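
    The trade-off can be sketched with plain dictionaries standing in for a hypothetical document store: reads are a single lookup, but a change to shared data must touch every embedded copy.

```python
# Denormalized: each profile document embeds its own copy of the region name.
profiles = {
    1: {"name": "Ada", "region": "Greater Seattle Area"},
    2: {"name": "Grace", "region": "Greater Seattle Area"},
}

def profile(user_id):
    """One lookup returns the already 'pre-joined' document."""
    return profiles[user_id]

def rename_region(old, new):
    """The cost on write: every embedded copy must be found and rewritten."""
    touched = 0
    for doc in profiles.values():
        if doc["region"] == old:
            doc["region"] = new
            touched += 1
    return touched

print(profile(1)["region"])                                      # Greater Seattle Area
print(rename_region("Greater Seattle Area", "Seattle Metro"))    # 2
```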

  4. Schema-on-write vs. schema-on-read, core trade-off?

    Enforcing the schema at write time guarantees that stored data conforms to it; deferring interpretation to read time maximizes flexibility but shifts validation and format handling to every consumer.
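
    Schema-on-read can be sketched as reader code that reconciles old and new document shapes. The field names are illustrative assumptions: older documents hold a single name, newer ones hold first_name and last_name.

```python
def read_user(doc):
    """Schema-on-read: the reader, not the database, reconciles formats."""
    if "first_name" in doc:
        return doc  # already in the new format
    # Old format: derive the new fields from the legacy `name` at read time.
    upgraded = dict(doc)
    full_name = upgraded.pop("name", "")
    first, _, last = full_name.partition(" ")
    upgraded.update(first_name=first, last_name=last)
    return upgraded

docs = [{"id": 1, "name": "Ada Lovelace"},
        {"id": 2, "first_name": "Grace", "last_name": "Hopper"}]
for d in docs:
    print(read_user(d))
```

    Under schema-on-write the same change would instead be a one-time migration of stored rows; here every consumer that reads these documents must carry equivalent logic, which is the cost named above.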

  5. How can the “impedance mismatch” bite ORMs?

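    Objects with nested collections and optional fields map awkwardly onto flat relational tables; lazily loading each nested collection then produces the N+1 query pattern (one query for the parent rows, plus one more per parent for its children).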
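
    This sketch uses raw sqlite3, not any ORM API, to imitate the query pattern a lazy-loading ORM typically emits, and counts the round trips against a single join. The schema and data are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE positions (user_id INTEGER REFERENCES users(id), title TEXT);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO positions VALUES (1, 'Analyst'), (2, 'Admiral'), (2, 'Professor'), (3, 'Professor');
""")

stats = {"queries": 0}

def run(sql, params=()):
    """Execute one query and count it, so the round trips are visible."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

# N+1: one query for the users, then one per user for its positions.
profiles = {}
for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
    profiles[name] = [title for (title,) in run(
        "SELECT title FROM positions WHERE user_id = ? ORDER BY rowid", (user_id,))]
n_plus_one = stats["queries"]

# Alternative: one join fetches the same data in a single query
# (an inner join would drop users without positions; none exist here).
stats["queries"] = 0
joined = {}
for name, title in run("""
        SELECT users.name, positions.title
        FROM users JOIN positions ON positions.user_id = users.id
        ORDER BY users.id, positions.rowid"""):
    joined.setdefault(name, []).append(title)

print(n_plus_one, stats["queries"], profiles == joined)  # 4 1 True
```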

  6. Why would a team mix data models in one product?

    No single model serves every access pattern well: JSON documents for user profiles, relational tables for billing, and a graph for social links each fit a different query pattern.