Why are keys overlap in SST files at Level 0 in LevelDB system?

As is known, keys may be overlap in SST files of level 0 in LevelDB. I am wondering why it needs to be overlap?

asked Dec 14, 2015 by harryxiyou (5,830 points)

1 Answer

An sstable is read-only after being written to the disk from a memtable. Accumulated K/Vs in the log up to a certain size are organized as a memtable and written to the disk as an sstable in level 0.

K/Vs coming in to the log file are not sorted. Hence, a later written sstable in level 0 may have "smaller" keys than keys in previous sstables in level 0.

Example (keys):

write/updates:        2, 9, 1,   3, 9, 8
sstables in level 0: {1, 2, 9}, {3, 8, 9}

This makes sstables in level 0 written to the disk be "like" a log (append only) which is good for write performance to disks. From level 1, overlappings are avoided through the process of merging sstables.

answered Dec 15, 2015 by Eric Z Ma (44,280 points)

What do you mean by "Accumulated K/Vs in the log up to a certain size are organized as a memtable and written to the disk as an sstable in level 0."?

When one KV record is coming for writing, LevelDB would firstly append commit log to Log File (not sorted) and then insert this KV record to Memtable (SkipList Insert Operation). When Memtable (In sorted orders) achieves the threshold value, it would become the Immutable Table (sorted and read only). At last, Immutable Table would be dumped into disk as SSTables. Therefore, KV records in Level0 SSTables are coming from Immutable Table.

It seems that you mean SSTables in level0 are coming from the Log File. I am not very clear about your point. Would you please clarify this.

commented Dec 15, 2015 by harryxiyou (5,830 points)

It seems that there are some overlaps in Immutable Memtable so it would be flushed into SSTables files in level 0 (Level 0 is not made by compaction). However, other levels (except level 0) would be compacted and merged so there are no overlaps in other levels.

commented Dec 15, 2015 by harryxiyou (5,830 points)

The "accumulated K/Vs in the log" to be written to a sstable is the same set of K/Vs as in the corresponding memtable (turned to be the immutable table).

commented Dec 15, 2015 by Eric Z Ma (44,280 points)

However, the minor compaction is from Immutable Memtable (records have already been sorted) to SSTables in Level0. There is no merge or delete in this process (Immutable Memtable to SSTable files in Level 0).

It seems like this.

1, write/updates: 2, 9, 1, 3, 9, 8
2, Immutable Memtable: {1, 2, 3, 8, 9, 9}
3, sstable file 1 in level 0: {1, 2, 3, 8, 9, 9}

1, write/updates: 2, 90, 1, 30, 90, 8
2, Immutable Memtable: {1, 2, 8, 30, 90, 90}
3, sstable file 2 in level 0: {1, 2, 8, 30, 90, 90}

commented Dec 15, 2015 by harryxiyou (5,830 points)

The immutable memtable should be identical to the sstable, or sstable is just a dump of an immutable memtable to disk.

Merges are done from level 0 to level 1:

sstable file 1 in level 0: {1, 2, 3, 8, 9, 9}
sstable file 2 in level 0: {1, 2, 8, 30, 90, 90}
sstable file 3 in level 1: {1, 90}

will be merged to (assume no other sstables to be merged)

sstable file 4 in level 1: {1, 2, 3, 8, 9, 30, 90} (assume no deletion). In level 1, there are no other sstables covering key range [1, 90].

commented Dec 15, 2015 by Eric Z Ma (44,280 points)

ACK. for your last comments. Thank you Zhiqiang ;-)

commented Dec 15, 2015 by harryxiyou (5,830 points)

Please log in or register to answer this question.

Copyright © SysTutorials. User contributions licensed under cc-wiki with attribution required.
Hosted on Dreamhost

...