6. Partitioning

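For large datasets, we need to break the data up into partitions → also called sharding (each partition is then a shard).
The goal is scalability: different partitions can be placed on different nodes, so both the data and the query load are spread across many machines.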
Partitioning and Replication
- They are usually combined: each record belongs to exactly one partition, but copies of each partition are stored on multiple nodes for fault tolerance.
- A node may store more than one partition.
- Everything discussed in the previous chapter (5) about replicating databases applies equally to replicating partitions.
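To make the combination concrete, here is a minimal sketch assuming a hypothetical round-robin placement (the names `place_replicas` and `partitions_per_node` are illustrative, not taken from any database). Each partition gets one leader and R-1 followers on distinct nodes, and every node ends up storing several partitions.

```python
from collections import defaultdict

def place_replicas(num_partitions, nodes, replication_factor):
    """Hypothetical round-robin placement: partition p's leader goes on
    nodes[p % n] and its followers on the next nodes in the list."""
    n = len(nodes)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        p: [nodes[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }

def partitions_per_node(placement):
    """Invert the placement: which partitions (as leader or follower) each node stores."""
    hosted = defaultdict(list)
    for p, replicas in placement.items():
        for node in replicas:
            hosted[node].append(p)
    return dict(hosted)

placement = place_replicas(4, ["node0", "node1", "node2"], replication_factor=2)
for p, replicas in placement.items():
    print(f"partition {p}: leader={replicas[0]}, followers={replicas[1:]}")
print(partitions_per_node(placement))
# {'node0': [0, 2, 3], 'node1': [0, 1, 3], 'node2': [1, 2]}
```

Real systems choose placement more carefully (e.g., spreading replicas across racks), but the shape is the same: a node is a leader for some partitions and a follower for others.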

Partitioning of Key-Value Data
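- Partitioning by key range → Assign a continuous range of keys (from some minimum to some maximum) to each partition, like the volumes of a paper encyclopedia. The ranges are not necessarily evenly spaced, because the data may not be evenly distributed (e.g., far more words may start with one letter than with another). Partition boundaries can be chosen manually by an admin or automatically by the database. Keys can be kept sorted within each partition, which makes range scans efficient, but some access patterns create hot spots (e.g., with timestamp keys, all of today's writes go to the same partition).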
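- Partitioning by hash of key → Use a hash function that turns skewed keys into uniformly distributed values. The output looks random but is deterministic: the same key must always produce the same hash. Each partition is assigned a range of hashes, and every key whose hash falls within a partition's range is stored in that partition. This spreads keys evenly, at the cost of efficient range queries, since adjacent keys end up scattered across partitions.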
  - Consistent Hashing is a way of evenly distributing load across an internet-wide system of caches such as CDNs.
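The two key-value schemes above can be sketched side by side. This is a minimal illustration under stated assumptions: the boundaries in `RANGE_BOUNDARIES`, the partition count, and the choice of the first 4 bytes of MD5 as a 32-bit hash are all hypothetical, and the function names are made up for the example.

```python
import bisect
import hashlib

# --- Key-range partitioning ----------------------------------------------
# Hypothetical boundaries, like the spines of encyclopedia volumes.
# Partition 0: key < "g", 1: "g" <= key < "n", 2: "n" <= key < "t", 3: key >= "t"
RANGE_BOUNDARIES = ["g", "n", "t"]

def range_partition(key: str) -> int:
    # bisect_right counts how many boundaries are <= key
    return bisect.bisect_right(RANGE_BOUNDARIES, key)

def partitions_for_scan(start: str, end: str) -> list[int]:
    # Ranges are sorted and contiguous, so a scan over [start, end]
    # touches only a run of adjacent partitions
    return list(range(range_partition(start), range_partition(end) + 1))

# --- Hash partitioning -----------------------------------------------------
NUM_PARTITIONS = 4
HASH_SPACE = 2**32

def key_hash(key: str) -> int:
    # Deterministic 32-bit hash taken from MD5. Python's built-in hash() is
    # randomized per process for strings, so it is unsuitable here.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

def hash_partition(key: str) -> int:
    # Each partition owns an equal, contiguous slice of the hash space
    return key_hash(key) * NUM_PARTITIONS // HASH_SPACE

for key in ["apple", "banana", "melon", "peach", "zebra"]:
    print(f"{key:>7}: range partition {range_partition(key)}, hash partition {hash_partition(key)}")
print(partitions_for_scan("a", "f"))  # [0]
```

With hash partitioning, the same scan from "a" to "f" would have to query every partition, because neighboring keys land on unrelated partitions.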
Partitioning and Secondary Indexes
- There are two main ways to partition a database that has secondary indexes:
  - Document-based → Each partition is completely separate and maintains its own secondary indexes, covering only the documents within that partition (hence also called a local index).
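    - The problem is that documents with the same indexed value are not necessarily on the same partition (e.g., red cars may be spread across every partition).
    - So a secondary-index query must be sent to all partitions and the results combined (scatter/gather), which makes reads expensive and prone to tail-latency amplification.
    - MongoDB, Riak, and Cassandra use this approach.
    - Most DB vendors recommend structuring the partitioning scheme so that secondary index queries can be served from a single partition, but that is not always possible.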
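  - Term-based → Construct a global index that covers data in all partitions. The global index must itself be partitioned (otherwise it would become a bottleneck), but by the indexed term rather than by the primary key (e.g., colors starting with a to r in one index partition, s to z in another).
    - The advantage is more efficient reads: instead of scatter/gather, the client only needs to make a request to the partition containing the term it wants.
    - The downside is slower and more complicated writes, because a write to a single document may affect multiple partitions of the index (every term in the document may live on a different partition).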
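The trade-off between the two index layouts can be sketched with a toy car database. Everything here is hypothetical: the class names, the two-way split of documents by `doc_id % 2`, and the index split at the letter "s" are illustrative choices, not how any particular database lays out its indexes.

```python
from collections import defaultdict

# ---- Document-partitioned (local) secondary index ------------------------
class LocalIndexedPartition:
    """One partition that indexes only its own documents by `color`."""

    def __init__(self):
        self.docs = {}                         # doc_id -> document
        self.color_index = defaultdict(set)    # color -> doc_ids in THIS partition

    def put(self, doc_id, doc):
        old = self.docs.get(doc_id)
        if old is not None:                    # keep the index consistent on update
            self.color_index[old["color"]].discard(doc_id)
        self.docs[doc_id] = doc
        self.color_index[doc["color"]].add(doc_id)

    def find_by_color(self, color):
        return set(self.color_index.get(color, ()))

def scatter_gather(partitions, color):
    # Red cars may live on any partition, so every partition must be asked
    result = set()
    for part in partitions:
        result |= part.find_by_color(color)
    return result

# ---- Term-partitioned (global) secondary index ---------------------------
def term_partition(value: str) -> int:
    # Hypothetical split by term: values before "s" on index partition 0, the rest on 1
    return 0 if value < "s" else 1

class TermPartitionedIndex:
    def __init__(self):
        self.parts = [defaultdict(set), defaultdict(set)]  # (field, value) -> doc_ids

    def index_document(self, doc_id, doc):
        # A single document write may touch several index partitions
        touched = set()
        for field in ("color", "make"):
            value = doc[field]
            p = term_partition(value)
            self.parts[p][(field, value)].add(doc_id)
            touched.add(p)
        return touched

    def lookup(self, field, value):
        # A read goes to exactly one index partition: the one owning the term
        return set(self.parts[term_partition(value)].get((field, value), ()))

cars = {
    1: {"color": "red", "make": "honda"},
    2: {"color": "red", "make": "toyota"},
    3: {"color": "silver", "make": "audi"},
    4: {"color": "black", "make": "toyota"},
}
NUM_PARTITIONS = 2
local_parts = [LocalIndexedPartition() for _ in range(NUM_PARTITIONS)]
global_index = TermPartitionedIndex()
for doc_id, car in cars.items():
    local_parts[doc_id % NUM_PARTITIONS].put(doc_id, car)  # primary key decides the partition
    global_index.index_document(doc_id, car)

print(sorted(scatter_gather(local_parts, "red")))      # [1, 2]
print(sorted(global_index.lookup("color", "red")))     # [1, 2]
print(sorted(global_index.lookup("make", "toyota")))   # [2, 4]
```

Here the two red cars sit on different document partitions, so the local-index query has to visit both, while the global index answers from one index partition. In exchange, writing car 2 updated both index partitions (red on one, toyota on the other).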
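Rebalancing Partitions → Over time, query throughput increases, the dataset grows, or machines fail, so nodes are added or replaced and data and requests must be moved from one node to another. This process is called rebalancing.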
- Strategies:
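  - hash mod N: avoid it, because when the number of nodes N changes, most keys have to move from one node to another (e.g., hash 123456 is on node 6 with 10 nodes but on node 3 with 11 nodes), which makes rebalancing excessively expensive.
  - Fixed number of partitions (good solution): create many more partitions than there are nodes (e.g., 1,000 partitions on 10 nodes) and assign several partitions to each node. When a node is added, it steals a few whole partitions from the existing nodes; only entire partitions are moved between nodes. The number of partitions and the key-to-partition mapping never change, only the assignment of partitions to nodes. Riak, Elasticsearch, Couchbase, and Voldemort use this approach.
  - Dynamic partitioning: for databases that use key-range partitioning, a fixed number of partitions with fixed boundaries is a limitation, since badly chosen boundaries could leave most of the data in one partition. Key-range-partitioned databases such as HBase and RethinkDB therefore create partitions dynamically: when a partition grows beyond a configured size, it is split into two; when it shrinks below a threshold, it can be merged with an adjacent partition. Each partition is assigned to one node, and each node can handle multiple partitions.
  - Partitioning proportionally to nodes: Cassandra and Ketama make the number of partitions proportional to the number of nodes, i.e., a fixed number of partitions per node. The size of each partition grows proportionally to the dataset size while the number of nodes remains unchanged; when nodes are added, partitions become smaller again, because a new node randomly chooses a fixed number of existing partitions to split and takes ownership of half of each.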
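To see why hash mod N is discouraged while a fixed number of partitions works well, here is a small simulation going from 4 to 5 nodes. It is a sketch under stated assumptions: 64 partitions, nodes numbered 0..N-1, an MD5-based key hash, and a hypothetical `add_node` policy that steals whole partitions from the most-loaded node one at a time.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 64  # fixed up front, many more than the number of nodes

def key_hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

# Strategy 1: hash mod N maps keys directly to nodes
def mod_n_node(key: str, num_nodes: int) -> int:
    return key_hash(key) % num_nodes

# Strategy 2: fixed number of partitions; the key -> partition mapping never changes
def partition_of(key: str) -> int:
    return key_hash(key) % NUM_PARTITIONS

def initial_assignment(num_nodes: int) -> dict[int, int]:
    return {p: p % num_nodes for p in range(NUM_PARTITIONS)}

def add_node(assignment: dict[int, int]) -> dict[int, int]:
    """Hypothetical policy: the new node (numbered max + 1) steals whole partitions,
    one at a time from the most loaded node, until it holds its fair share."""
    new_assignment = dict(assignment)
    new_node = max(assignment.values()) + 1
    fair_share = NUM_PARTITIONS // (new_node + 1)
    for _ in range(fair_share):
        load = Counter(new_assignment.values())
        donor = max((n for n in load if n != new_node), key=lambda n: load[n])
        victim = next(p for p, n in new_assignment.items() if n == donor)
        new_assignment[victim] = new_node
    return new_assignment

keys = [f"key{i}" for i in range(10_000)]
moved_mod_n = sum(mod_n_node(k, 4) != mod_n_node(k, 5) for k in keys)

before = initial_assignment(4)
after = add_node(before)
moved_fixed = sum(before[partition_of(k)] != after[partition_of(k)] for k in keys)

print(f"hash mod N, 4 -> 5 nodes:       {moved_mod_n / len(keys):.0%} of keys moved")
print(f"fixed partitions, 4 -> 5 nodes: {moved_fixed / len(keys):.0%} of keys moved")
```

A key stays put under hash mod N only when h mod 4 equals h mod 5 (4 of every 20 residues), so roughly 80% of keys should move, many of them between old nodes. With fixed partitions, only the keys in the 12 reassigned partitions move (roughly a fifth), and all of them go to the new node.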
Request Routing
- How does the client determine which node to connect to when there are partitions?
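- This is an instance of a more general problem called service discovery.
- Many systems rely on a separate coordination service such as ZooKeeper to keep track of the cluster metadata (which partition lives on which node); the routing tier or partition-aware clients subscribe to this information and are notified when it changes.
- Cassandra instead uses a gossip protocol among the nodes to disseminate changes in cluster state, so a request can be sent to any node, which forwards it to the node that owns the relevant partition.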
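The coordination-service approach can be sketched in a few lines. Note that `MetadataStore` is a hypothetical in-memory stand-in that only mimics the idea of an authoritative partition-to-node map with change notifications; it is not the ZooKeeper API. `Router` plays the role of a routing tier or partition-aware client that caches the map.

```python
import hashlib
from typing import Callable, Dict, List

NUM_PARTITIONS = 8

def partition_of(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big") % NUM_PARTITIONS

class MetadataStore:
    """Hypothetical stand-in for a coordination service (NOT the ZooKeeper API):
    holds the authoritative partition -> node map and notifies subscribers."""

    def __init__(self, assignment: Dict[int, str]):
        self._assignment = dict(assignment)
        self._watchers: List[Callable[[Dict[int, str]], None]] = []

    def subscribe(self, callback: Callable[[Dict[int, str]], None]) -> None:
        self._watchers.append(callback)
        callback(dict(self._assignment))  # deliver the current state immediately

    def reassign(self, partition: int, node: str) -> None:
        self._assignment[partition] = node
        for callback in self._watchers:   # push the change to every subscriber
            callback(dict(self._assignment))

class Router:
    """Routing tier (or partition-aware client) that caches the partition map."""

    def __init__(self, store: MetadataStore):
        self._routes: Dict[int, str] = {}
        store.subscribe(self._on_change)

    def _on_change(self, assignment: Dict[int, str]) -> None:
        self._routes = assignment

    def node_for(self, key: str) -> str:
        return self._routes[partition_of(key)]

store = MetadataStore({p: f"node{p % 3}" for p in range(NUM_PARTITIONS)})
router = Router(store)
key = "user:42"
p = partition_of(key)
print(key, "-> partition", p, "on", router.node_for(key))
store.reassign(p, "node3")  # e.g. rebalancing moved this partition to a new node
print(key, "-> partition", p, "on", router.node_for(key))  # now node3
```

In a real deployment the notification is asynchronous, so a router can briefly send a request to the old owner; systems have to tolerate or redirect such stale routes.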

My note: This book is not primarily about distributed systems; it is roughly 80% about building and understanding database concepts and 20% about distributed systems.