Partioning Vs Sharding
vidyabhandary committed May 25, 2024
1 parent 93828f3 commit 20859eb
Showing 1 changed file with 32 additions and 0 deletions.
32 changes: 32 additions & 0 deletions misc/PartioningVsSharding.md
# Partitioning Vs Sharding

Both sharding and partitioning are techniques used to manage large databases, but they differ in how they distribute the data:

**Sharding**

- **Distribution:** Sharding splits the data horizontally across **multiple servers or nodes**. Each shard is an independent database that holds a subset of the rows and has its own copy of the table schema.
- **Scalability:** Sharding excels at horizontal scaling. As your data grows, you can add more servers to spread the data and the load, although existing data usually has to be rebalanced onto the new shards.
- **Complexity:** Sharding introduces the complexity of managing a distributed system. Each query must be routed to the shard (or shards) holding the relevant data, and queries or transactions that span several shards make data consistency harder to guarantee.
- **Example:** Imagine a social media platform with sharded user data. Users from North America might be stored on one shard, while users from Europe reside on another.
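The routing step described above can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch assuming a region-based shard key as in the social media example; `Shard`, `RegionShardRouter`, and the region names are hypothetical, and each "shard" is an in-memory dict standing in for a separate database server.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Shard:
    """One independent node; simulated here as an in-memory dict of users."""
    name: str
    users: dict[int, dict] = field(default_factory=dict)


class RegionShardRouter:
    """Hypothetical router that uses the user's region as the shard key."""

    def __init__(self, shards: list[Shard], region_to_shard: dict[str, str]):
        self.shards = {s.name: s for s in shards}
        # Routing table: shard key (region) -> name of the shard that owns it.
        self.region_to_shard = region_to_shard

    def shard_for(self, region: str) -> Shard:
        if region not in self.region_to_shard:
            raise ValueError(f"no shard configured for region {region!r}")
        return self.shards[self.region_to_shard[region]]

    def insert_user(self, user_id: int, region: str, name: str) -> None:
        # Writes go to exactly one shard, chosen by the shard key.
        self.shard_for(region).users[user_id] = {"region": region, "name": name}

    def get_user(self, user_id: int, region: str) -> dict | None:
        # Reads that include the shard key also touch exactly one shard.
        return self.shard_for(region).users.get(user_id)

    def find_by_name(self, name: str) -> list[dict]:
        # No shard key in the query, so it must fan out to every shard.
        return [
            user
            for shard in self.shards.values()
            for user in shard.users.values()
            if user["name"] == name
        ]


if __name__ == "__main__":
    na, eu = Shard("shard-na"), Shard("shard-eu")
    router = RegionShardRouter(
        [na, eu], {"north_america": "shard-na", "europe": "shard-eu"}
    )
    router.insert_user(1, "north_america", "Ana")
    router.insert_user(2, "europe", "Ben")
    print(router.get_user(2, "europe"))  # {'region': 'europe', 'name': 'Ben'}
    print(len(na.users), len(eu.users))  # 1 1
```

A lookup that includes the shard key touches a single shard, while `find_by_name` has to visit every shard. That fan-out, multiplied across real network calls, is part of the extra complexity sharding brings.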

**Partitioning**

- **Distribution:** Partitioning divides a **single table** horizontally within the same database server. Partitions are essentially sub-tables, each holding the subset of rows selected by a chosen criterion (the partition key), such as a date range.
- **Performance:** Partitioning can improve query performance by narrowing down where the relevant data lives. When a query filters on the partition key, the database can skip partitions that cannot contain matching rows, reducing the amount of data scanned.
- **Management:** Partitioning is easier to manage than sharding because everything remains within a single server.
- **Example:** An e-commerce website might partition its order table by year. Queries for past orders can then be directed to the appropriate year partition.
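The order-table example can be illustrated with a small Python sketch of partition pruning. Real database engines do this inside the query planner; this sketch only imitates the idea, assuming orders are partitioned by the year of a hypothetical `order_date` field. `Order` and `YearPartitionedOrders` are illustrative names, not part of any library.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    order_id: int
    order_date: date
    amount: float


class YearPartitionedOrders:
    """Hypothetical single-server table whose rows live in per-year partitions."""

    def __init__(self) -> None:
        # Partition key (year) -> rows stored in that partition.
        self.partitions: dict[int, list[Order]] = defaultdict(list)

    def insert(self, order: Order) -> None:
        # The partition is chosen from the row's own partition key.
        self.partitions[order.order_date.year].append(order)

    def query_range(self, start: date, end: date) -> tuple[list[Order], list[int]]:
        """Return orders with start <= order_date <= end, plus the years scanned."""
        # Pruning: only years overlapping [start, end] can hold matching rows.
        scanned = sorted(y for y in self.partitions if start.year <= y <= end.year)
        rows = [
            order
            for year in scanned
            for order in self.partitions[year]
            if start <= order.order_date <= end
        ]
        return rows, scanned


if __name__ == "__main__":
    orders = YearPartitionedOrders()
    orders.insert(Order(1, date(2022, 3, 1), 10.0))
    orders.insert(Order(2, date(2023, 6, 15), 20.0))
    orders.insert(Order(3, date(2024, 1, 5), 30.0))
    rows, scanned = orders.query_range(date(2023, 1, 1), date(2023, 12, 31))
    print(scanned, [o.order_id for o in rows])  # [2023] [2]
```

A query for 2023 reads only the 2023 partition; the 2022 and 2024 rows are never scanned. A query that does not filter on the partition key would still have to read every partition.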

**Here's a table summarizing the key differences:**

| Feature | Sharding | Partitioning |
| ------------ | -------------------------------------------------------- | ---------------------------------------- |
| Distribution | Across multiple servers | Within a single server |
| Scalability | Excellent horizontal scaling | Limited by server capacity |
| Complexity | More complex (distributed system management) | Simpler management |
| Performance | Improved due to parallel processing | Improved for focused queries |
| Consistency | Maintaining consistency across shards can be challenging | Consistency is generally straightforward |

**In conclusion:**

- Use sharding for massive datasets requiring horizontal scalability and potentially high write volume.
- Use partitioning for improved query performance on large tables within a single server, especially when queries target specific subsets of data.
