
YCSB on 9.1G Movie Data


Table of Contents

  • 1. Introduction
  • 2. Test Method
    • 2.1. Environment
  • 3. Writing Performance
  • 4. Compression Ratio
  • 5. Reading Performance
    • 5.1. Basic Settings
    • 5.2. Data Much Smaller than Memory (Memory 64GB)
    • 5.3. Data Smaller than Memory (Memory 8GB)
    • 5.4. Data Larger than Memory (Memory 4GB)
    • 5.5. Data Much Larger than Memory (Memory 2GB)

1. Introduction

TerarkSQL is a MySQL distribution that uses TerarkDB as its storage engine. We integrate TerarkDB into MySQL via MyRocks, Facebook's modification of MySQL that uses RocksDB as its storage engine.

2. Test Method

  • Test Tools
  • Test Dataset
    • Since YCSB's default datasets are generated from random strings and are far from real-world data, we modified YCSB to use the Amazon movie data (~8 million reviews) as the test dataset (see the sketch after this list).
  • Dataset Size
    • About 9.1GB
    • About 8 million records
    • About 1KB per record
  • Storage Engines
  • All read tests use both uniform and Zipfian distributions
  • We've calculated 95/99 percentile latencies for all read tests
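
As a concrete illustration of that dataset conversion, here is a minimal standalone sketch, assuming the SNAP export format of the Amazon movie reviews (each review is a block of "field: value" lines terminated by a blank line). The real change was made inside YCSB itself, which is written in Java; the file name movies.txt and the byte accounting below are illustrative only.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Split the Amazon movie review dump into one record per review:
// a review is a run of "field: value" lines ending at a blank line.
int main() {
    std::ifstream in("movies.txt");   // hypothetical path to the review dump
    std::string line, record;
    long long count = 0, bytes = 0;

    while (std::getline(in, line)) {
        if (line.empty()) {           // blank line ends the current review
            if (!record.empty()) {
                ++count;              // this block becomes one YCSB value
                bytes += record.size();
                record.clear();
            }
        } else {
            record += line;
            record += '\n';
        }
    }
    if (!record.empty()) { ++count; bytes += record.size(); }

    std::cout << count << " records, avg "
              << (count ? bytes / count : 0) << " bytes/record\n";
}
```

On the full dump this should report roughly 8 million records at about 1KB each, matching the dataset-size figures above.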

2.1. Environment

  • CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz x2 (16 cores in total with 32 threads)
  • Memory: DDR4 16GB @ 1866 MHz x4 (64GB total)
  • SSD: Intel 730 (89,000 IOPS)
  • OS: CentOS 7

3. Writing Performance

  • Writing speed: [figure: Insert OPS]

  • Writing 95/99 percentile latency: [figure: Insert Latency]

4. Compression Ratio

The original dataset is about 9.1GB: 8 million records at roughly 1KB each.

  • Disk usage: [figure: Storage Size]
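
As a point of reference, section 5.5 below notes that TerarkDB compresses this dataset to about 2.5GB, which works out to a compression ratio of roughly 9.1 / 2.5 ≈ 3.6x; the per-engine disk usage is shown in the figure.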

5. Reading Performance

5.1. Basic Settings

  • Memory limitation is achieved by cgroups (see the sketch after this list)

  • The client application in all tests runs on a separate machine on the same local network (connected via a gigabit switch)

  • All reading tests use 16 threads, one connection per thread.

  • InnoDB uses default settings

    • Since InnoDB reads data directly through the file API, cgroups can't limit its memory usage, so we limit it via kernel settings instead.
  • RocksDB has allow_mmap_reads enabled so that cgroups can limit its memory usage, and its block size is set to 16KB

  • TerarkDB uses default options (TerarkDB enables allow_mmap_reads by default)
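
For concreteness, here is a minimal sketch of this kind of memory cap, assuming the cgroup-v1 memory controller is mounted at /sys/fs/cgroup/memory; the group name dbtest, the 8GB value, and the PID are placeholders (in practice the same effect is usually achieved from the shell):

```cpp
#include <fstream>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Create a memory cgroup, set a hard limit, and enroll a process in it.
void LimitMemory(const std::string& group, long long bytes, pid_t pid) {
    const std::string root = "/sys/fs/cgroup/memory/" + group;
    mkdir(root.c_str(), 0755);                                // create the cgroup
    std::ofstream(root + "/memory.limit_in_bytes") << bytes;  // hard memory cap
    std::ofstream(root + "/tasks") << pid;                    // enroll the process
}

// e.g. LimitMemory("dbtest", 8LL << 30, mysqld_pid);  // the 8GB case in 5.3
```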

All read tests use both uniform and Zipfian distributions.

5.2. Data Much Smaller than Memory (Memory 64GB)

  • Physical memory is 64GB

  • RocksDB's block_cache_size was set to half of total memory (which is its default behavior)

  • Read QPS and 95/99 percentile read latency under the uniform distribution: [figures: Read QPS Unlimited, Read Latency Unlimited]

  • Memory usage: [figure: Memory Usage]
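
In the actual tests these settings were applied through MyRocks; as an illustration, the equivalent configuration in the RocksDB C++ API would look roughly like this (the mmap reads and 16KB block size from section 5.1, plus a 32GB block cache, i.e. half of the 64GB memory):

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

// RocksDB options matching the test description: mmap reads (so cgroups can
// account the memory), 16KB blocks, and a 32GB LRU block cache.
rocksdb::Options MakeOptions() {
    rocksdb::Options opts;
    opts.allow_mmap_reads = true;

    rocksdb::BlockBasedTableOptions table;
    table.block_size = 16 * 1024;                            // 16KB blocks
    table.block_cache = rocksdb::NewLRUCache(32ULL << 30);   // 32GB cache
    opts.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table));
    return opts;
}
```

For the later scenarios only the cache size changes (2GB, 1GB, and 500MB respectively).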

5.3. Data Smaller than Memory (Memory 8GB)

  • cgroups limits total memory to 8GB
  • InnoDB's test is given 8.6GB, since the OS itself uses about 0.6GB
  • RocksDB's block_cache_size is set to 2GB
  • TerarkDB only needs 3.02GB, much less than 8GB, so it is unaffected
  • Read QPS and 95/99 percentile read latency under the uniform distribution: [figures: Read QPS 8G, Read Latency 8G]

5.4. Data Larger than Memory (Memory 4GB)

  • Memory was limited to 4GB
  • InnoDB is given 4.6GB, since the system costs about 0.6GB
  • RocksDB's block_cache_size was set to 1GB
  • TerarkDB only needs 3.02GB, which is less than 4GB, so it is unaffected
  • Read QPS and 95/99 percentile read latency under the uniform distribution: [figures: Read QPS 4G, Read Latency 4G]

5.5. Data Much Larger than Memory (Memory 2GB)

  • Memory was limited to 2GB
  • InnoDB uses about 2.6GB in total, including the system's ~0.6GB
  • RocksDB's block_cache_size is set to 500MB
  • In this scenario neither RocksDB nor InnoDB has enough memory; disk IO becomes the bottleneck and hurts performance significantly. Since TerarkDB compresses the data to about 2.5GB, the memory shortage affects it far less.
  • Read QPS and 95/99 percentile read latency under the uniform distribution: [figures: Read QPS 2G, Read Latency 2G]