This repository has been archived by the owner on Oct 28, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 163
nessDB v2.0(in Chinese)
shuttler edited this page Feb 19, 2013
·
2 revisions
2.0版本使用Small-Splittable Tree(SST)作为核心存储结构,随机写/读性能又进一步提高。
索引文件使用SST结构,数据文件是AOF(但删除的slot会被重用,请参考compact.c文件)
当Value长度超过NESSDB_COMPRESS_LIMIT(默认1KB),则进行QuickLZ压缩。
1)Tree的高度最大是H,满了就进行自我细胞分裂(份数可配置,默认是4份),目的是避免底层Level过大
2)Lazy式插入,每个Level-N都是Level-(N+1)的缓冲区,目的是提高System buffer/cache利用率
3)Fractional cascading加速
4)是Cache-oblivious write-optimized structure,Bε-tree
+------------+
| Header |
+------------+
| L0(random) |
+------------+----------+
| L1(sorted) |
+------------+----------+----------+----------+
| L2(sorted) |
+------------+----------+----------+----------+
说明:
1)Level(N)的大小 = BASE^N
2)Header里包含:当前SST的记录数,Bloom filter信息
3)L0是AOF,其它Level数据均有序
4)每次数据先AOF到L0的槽位,当L0满了就推到L1,L1满了就推到L2,依次下去(中间会有数据Merge过程)
5)所有Level均满了,就进行自我分裂,数据均分
+-------------------------------+
------------------------------->| write to TOWER(SST) |
+-------------------------------+
|
| if active is full(one TOWER becomes immutable)
v
+-------------------------------+
| create one thread to do merge |
+-------------------------------+
|
|
v
+-------------------------------+
| exit the merging thread |
+-------------------------------+
+----------------------------+
+----->|lookup from active TOWER(SST)|
^ +-----------+----------------+
| |
| exists | not exists
| +------+------------------------+
| | |
| v |
| +------------------------+ |
| | read data from db file | |
| +------------------------+ |
| v
|<-------------------------------------------+------------------+
| |
| +---------------v--------------------+
| | get index file info from meta list |
| +---------------+--------------------+
| |
| |
| +---------------v--------------------+
| | a)read data offset from index file |
| +---------------+--------------------+
| |
| |
| +---------------v--------------------+
| | b)read data length from index file |
| +---------------+--------------------+
| |
| |
| +---------------v--------------------+
| | c)read data from db file |
| +---------------+--------------------+
| |
\---------------------------------------------------------------/
@2012.12