Skip to content
This repository has been archived by the owner on Oct 28, 2023. It is now read-only.

nessDB v2.0(in Chinese)

shuttler edited this page Feb 19, 2013 · 2 revisions

nessDB 2.0

2.0版本使用Small-Splittable Tree(SST)作为核心存储结构,随机写/读性能又进一步提高。
索引文件使用SST结构,数据文件是AOF(但删除的slot会被重用,请参考compact.c文件)
当Value长度超过NESSDB_COMPRESS_LIMIT(默认1KB),则进行QuickLZ压缩。

SST特点

1)Tree的高度最大是H,满了就进行自我细胞分裂(份数可配置,默认是4份),目的是避免底层Level过大
2)Lazy式插入,每个Level-N都是Level-(N+1)的缓冲区,目的是提高System buffer/cache利用率
3)Fractional cascading加速
4)是Cache-oblivious write-optimized structure,Bε-tree

SST结构

+------------+
|   Header   |
+------------+
| L0(random) |
+------------+----------+
| L1(sorted)            |
+------------+----------+----------+----------+
| L2(sorted)                                  |
+------------+----------+----------+----------+

说明:

1)Level(N)的大小 = BASE^N
2)Header里包含:当前SST的记录数,Bloom filter信息
3)L0是AOF,其它Level数据均有序
4)每次数据先AOF到L0的槽位,当L0满了就推到L1,L1满了就推到L2,依次下去(中间会有数据Merge过程)
5)所有Level均满了,就进行自我分裂,数据均分

写流程

                                +-------------------------------+  
------------------------------->|      write to TOWER(SST)      |
                                +-------------------------------+  
	                                         |
	                                         | if active is full(one TOWER becomes immutable) 
	                                         v
        	                   +-------------------------------+
                	           | create one thread to do merge |
                        	   +-------------------------------+
                                	         |
	                                         |
	                                         v
        	                   +-------------------------------+
                	           |      exit the merging thread  |
                        	   +-------------------------------+

读流程

        +----------------------------+
+----->|lookup from active TOWER(SST)|
^       +-----------+----------------+
|                   |
|            exists | not exists
|            +------+------------------------+
|            |                               |
|            v                               |
| +------------------------+                 |
| | read data from db file |                 |
| +------------------------+                 |
|                                            v
|<-------------------------------------------+------------------+
|                                                               |
|                                               +---------------v--------------------+
|                                               | get index file info from meta list |
|                                               +---------------+--------------------+
|                                                               |
|                                                               |
|                                               +---------------v--------------------+
|                                               | a)read data offset from index file |
|                                               +---------------+--------------------+
|                                                               |
|                                                               |
|                                               +---------------v--------------------+
|                                               | b)read data length from index file |
|                                               +---------------+--------------------+
|                                                               |
|                                                               |
|                                               +---------------v--------------------+
|                                               |     c)read data from db file       |
|                                               +---------------+--------------------+
|                                                               |
\---------------------------------------------------------------/

@2012.12

Clone this wiki locally