
What: A data structure and storage pattern that combines in-memory tables (Memtables) with immutable sorted string tables (SSTables) on disk. SSTable is a file containing sorted key-value pairs, while Memtable is its in-memory counterpart. Together, they form the foundation of Log-Structured Merge Trees (LSM Trees) used in modern databases.
A SSTable then is exactly what it sounds like, it is a file which contains a set of arbitrary, sorted key-value pairs inside. It’s a perfect candidate for streaming processing: read stream from disk, process key by key, and write output back to disk. Combining it with a static index that could be loaded into memory, and we’re one disk seek away from the key. So SSTable solves the problem of fast reads from disks by key.
We could have a similar structure (possibly a different one) but in memory to solve for fast random writes. This in-memory table we will call Memtable.
The following describes the interactions between those components (kudos to this article):
On-disk SSTable indexes are always loaded into memory
All writes go directly to the MemTable index
Reads check the MemTable first and then the SSTable indexes
Periodically, the MemTable is flushed to disk as an SSTable
Periodically, on-disk SSTables are merged with a merge sort
Use cases:
Processing large sequential read/write workloads
Database engines (LevelDB, RocksDB)
Storage engines in distributed databases (Cassandra, HBase)
Embedded storage systems (IndexDB in WebKit)
Processing gigabyte to terabyte-sized datasets
Reliability:
Immutable SSTables ensure data integrity
Periodic flushing of Memtable prevents data loss
Sequential writes reduce chances of corruption
Tombstone records handle deletes safely
Scalability:
Handles datasets from megabytes to petabytes
Efficient merging of multiple SSTables
Support for range-based sharding
Compact storage through optional compression
Maintainability:
Simple key-value interface
Clear separation of hot/cold data
Built-in garbage collection through compaction
Performance:
From RocksDB benchmarks:
150k random read operations per second
10^6 write operations per second
Users:
Google (BigTable)
Meta (RocksDB)
Apache Cassandra
Apache HBase
LevelDB
Resources:
A large portion of this article is borrowed from https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
Tags: #kv #storage
Disclaimer: Found a mistake or have a different perspective? Please drop a comment! I aim to keep this content accurate and valuable for everyone.