Partitioning of Key-Value Data
Partitioning and Secondary Indexes
Two ways of partitioning the database with secondary indexes
Document-based → In this approach, each partition is a separate entity. Each partition maintains its own secondary index, covering only the documents within that partition (referred to as the local index).
The problem is that not all stuff with the same index would be on the same node (e.g, all red cars)
So we need to send a query to all participants and combine results (scatter/gather)
MongoDB, Riak, and Cassandra use it.
Most DB vendors recommend that secondary index queries can be served from a single partition.
Term-based → Construct a global index that covers data in all partitions. Then this index is partitioned,
Rebalancing Partitions → If query throughput increases or the dataset size increases, we need to rebalance the data on the nodes.
Request Routing
My note: This book is not about distributed systems; it is 80% about building and understanding database concepts and 20% about distributed systems.