
YCSB on 9.1G Movie Data


Table of Contents

  • 1. Introduction
  • 2. Test Method
    • 2.1. Environment
  • 3. Writing Performance
  • 4. Compression Ratio
  • 5. Reading Performance
    • 5.1. Basic Settings
    • 5.2. Data Much Smaller than Memory (Memory 64GB)
    • 5.3. Data Smaller than Memory (Memory 8GB)
    • 5.4. Data Larger than Memory (Memory 4GB)
    • 5.5. Data Much Larger than Memory (Memory 2GB)

1. Introduction

TerarkSQL is a MySQL distribution that uses TerarkDB as its storage engine. We integrate TerarkDB into MySQL via MyRocks, Facebook's modification of MySQL that uses RocksDB as its storage engine.

2. Test Method

  • Test Tools
  • Test Dataset
    • Since YCSB's default datasets are generated from random strings and are far from real-world data, we modified YCSB to use the Amazon movie data (~8 million reviews) as the test dataset (see the sketch after this list).
  • Dataset Size
    • About 9.1GB
    • About 8 million records
    • About 1KB per record
  • Storage Engines
  • All read tests use both uniform and Zipfian distributions
  • We've calculated 95/99 percentile latencies for all read tests
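
As a concrete illustration of that dataset conversion, here is a minimal standalone sketch, assuming the SNAP export format of the Amazon movie reviews (each review is a block of "field: value" lines terminated by a blank line). The real change was made inside YCSB itself, which is written in Java; the file name movies.txt and the byte accounting below are illustrative only.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Split the Amazon movie review dump into one record per review:
// a review is a run of "field: value" lines ending at a blank line.
int main() {
    std::ifstream in("movies.txt");   // hypothetical path to the review dump
    std::string line, record;
    long long count = 0, bytes = 0;

    while (std::getline(in, line)) {
        if (line.empty()) {           // blank line ends the current review
            if (!record.empty()) {
                ++count;              // this block becomes one YCSB value
                bytes += record.size();
                record.clear();
            }
        } else {
            record += line;
            record += '\n';
        }
    }
    if (!record.empty()) { ++count; bytes += record.size(); }

    std::cout << count << " records, avg "
              << (count ? bytes / count : 0) << " bytes/record\n";
}
```

On the full dump this should report roughly 8 million records at about 1KB each, matching the dataset-size figures above.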

2.1. Environment

  • CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz x2 (16 cores in total with 32 threads)
  • Memory: DDR4 16GB @ 1866 MHz x4 (64GB total)
  • SSD: Intel 730 (89,000 IOPS)
  • OS: CentOS 7

3. Writing Performance

  • Writing speed: [figure: Insert OPS]

  • Writing 95/99 percentile latency: [figure: Insert Latency]

4. Compression Ratio

The original dataset is about 9.1GB: 8 million records at roughly 1KB each.

  • Disk usage: [figure: Storage Size]
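
As a point of reference, section 5.5 below notes that TerarkDB compresses this dataset to about 2.5GB, which works out to a compression ratio of roughly 9.1 / 2.5 ≈ 3.6x; the per-engine disk usage is shown in the figure.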

5. Reading Performance

5.1. Basic Settings

  • Memory limitation is achieved by cgroups (see the sketch after this list)

  • The client application in all tests runs on a separate machine on the same local network (connected via a gigabit switch)

  • All reading tests use 16 threads, one connection per thread.

  • InnoDB uses default settings

    • Since InnoDB reads data directly through the file API, cgroups can't limit its memory usage, so we limit it via kernel settings instead.
  • RocksDB has allow_mmap_reads enabled so that cgroups can limit its memory usage, and its block size is set to 16KB

  • TerarkDB uses default options (TerarkDB enables allow_mmap_reads by default)
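
For concreteness, here is a minimal sketch of this kind of memory cap, assuming the cgroup-v1 memory controller is mounted at /sys/fs/cgroup/memory; the group name dbtest, the 8GB value, and the PID are placeholders (in practice the same effect is usually achieved from the shell):

```cpp
#include <fstream>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Create a memory cgroup, set a hard limit, and enroll a process in it.
void LimitMemory(const std::string& group, long long bytes, pid_t pid) {
    const std::string root = "/sys/fs/cgroup/memory/" + group;
    mkdir(root.c_str(), 0755);                                // create the cgroup
    std::ofstream(root + "/memory.limit_in_bytes") << bytes;  // hard memory cap
    std::ofstream(root + "/tasks") << pid;                    // enroll the process
}

// e.g. LimitMemory("dbtest", 8LL << 30, mysqld_pid);  // the 8GB case in 5.3
```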

All read tests use both uniform and Zipfian distributions.

5.2. Data Much Smaller than Memory (Memory 64GB)

  • Physical memory is 64GB

  • RocksDB's block_cache_size was set to half of total memory (which is its default behavior)

  • Read QPS and 95/99 percentile read latency under the uniform distribution: [figures: Read QPS Unlimited, Read Latency Unlimited]

  • Memory usage: [figure: Memory Usage]
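
In the actual tests these settings were applied through MyRocks; as an illustration, the equivalent configuration in the RocksDB C++ API would look roughly like this (the mmap reads and 16KB block size from section 5.1, plus a 32GB block cache, i.e. half of the 64GB memory):

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// RocksDB options matching the test description: mmap reads (so cgroups can
// account the memory), 16KB blocks, and a 32GB LRU block cache.
rocksdb::Options MakeOptions() {
    rocksdb::Options opts;
    opts.allow_mmap_reads = true;

    rocksdb::BlockBasedTableOptions table;
    table.block_size = 16 * 1024;                            // 16KB blocks
    table.block_cache = rocksdb::NewLRUCache(32ULL << 30);   // 32GB cache
    opts.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table));
    return opts;
}
```

For the later scenarios only the cache size changes (2GB, 1GB, and 500MB respectively).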

5.3. Data Smaller than Memory (Memory 8GB)

  • cgroups limits total memory to 8GB
  • InnoDB's test is given 8.6GB, since the OS itself uses about 0.6GB
  • RocksDB's block_cache_size is set to 2GB
  • TerarkDB only needs 3.02GB, much less than 8GB, so it is unaffected
  • Read QPS and 95/99 percentile read latency under the uniform distribution: [figures: Read QPS 8G, Read Latency 8G]

5.4. Data Larger than Memory (Memory 4GB)

  • Memory was limited to 4GB
  • InnoDB is given 4.6GB, since the system costs about 0.6GB
  • RocksDB's block_cache_size was set to 1GB
  • TerarkDB only needs 3.02GB, which is less than 4GB, so it is unaffected
  • Read QPS and 95/99 percentile read latency under the uniform distribution: [figures: Read QPS 4G, Read Latency 4G]

5.5. Data Much Larger than Memory (Memory 2GB)

  • Memory was limited to 2GB
  • InnoDB uses about 2.6GB in total, including the system's ~0.6GB
  • RocksDB's block_cache_size is set to 500MB
  • In this scenario neither RocksDB nor InnoDB has enough memory; disk IO becomes the bottleneck and hurts performance significantly. Since TerarkDB compresses the data to about 2.5GB, the memory shortage affects it far less.
  • Read QPS and 95/99 percentile read latency under the uniform distribution: [figures: Read QPS 2G, Read Latency 2G]