LevelDB

Table of Contents

Overview

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

Durability

Sanjay commented on HN about the durability of LevelDB. And what exactly it means that writes may be lost (for asynchronous writes), namely that the database is not corrupted following a crash, but merely result in incomplete data files:

Merely an incomplete one. Leveldb never writes in place: it always appends to a log file, or merges existing files together to produce new ones. So an OS crash will cause a partially written log record (or a few partially written log records). Leveldb recovery code uses checksums to detect this and will skip the incomplete records.

LSM-Tree

An LSM-Tree is used to increase write-throughput. This makes LevelDB more suitable for random writes-heavy workloads. In contrast KyotoCabinet implements ordered keys using a B-Tree, which is more suitable for read-heavy workloads.

Features

  • Keys and values are arbitrary byte arrays.
  • Data is stored sorted by key.
  • Callers can provide a custom comparison function to override the sort order.
  • The basic operations are Put(key,value), Get(key), Delete(key).
  • Multiple changes can be made in one atomic batch.
  • Users can create a transient snapshot to get a consistent view of data.
  • Forward and backward iteration is supported over the data.
  • Data is automatically compressed using the Snappy compression library.
  • External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions.

Install on Ubuntu

  1. Install git and snappy (don’t neccessarily need snappy as LevelDB will work without it but you would need to recompile if you don’t install it before compiling)
    sudo apt-get install git-core libsnappy-dev
    
  2. Clone
    git clone  https://github.com/google/leveldb.git
    
  3. Compile
    cd leveldb
    make
    
  4. Install
    cd out-shared
    sudo cp --preserve=links libleveldb.* /usr/local/lib
    cd ../include
    sudo cp -R leveldb /usr/local/include/
    sudo ldconfig
    

Example

Write and Read back.

#include <cassert>
#include <leveldb/db.h>
#include <iostream>
#include <string>
using namespace std;

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(status.ok());
  string key = "foo";
  string value = "bar";
  cout << "write Key:" << key << " and value:" << value << endl;
  status = db->Put(leveldb::WriteOptions(), key, value);
  assert(status.ok());
  string value_back;
  status = db->Get(leveldb::ReadOptions(), key, &value_back);
  cout << "Read back:" << value_back << endl;
  assert(status.ok());
  assert(value == value_back);
  delete db;
  return 0;
}
g++ -o test test.cpp  -lleveldb

Deeper into LevelDB

Performance

Comparison to others

Sanjay Ghemawat talked on YCombinator/HackerNews about the difference between Leveldb and BitCask:

  1. LevelDB is a persistent ordered map; bitcask is a persistent hash table (no ordered iteration).
  2. Bitcask stores a fixed size record in memory for every key. So for databases with large number of keys, it may use too much memory for some applications. bitcask can guarantee at most one disk seek per lookup I think. leveldb may have to do a small handful of disk seeks.
  3. To clarify, leveldb stores data in a sequence of levels. Each level stores approximately ten times as much data as the level before it. A read needs one disk seek per level. So if 10% of the db fits in memory, leveldb will need to do one seek (for the last level since all of the earlier levels should end up cached in the OS buffer cache). If 1% fits in memory, leveldb will need two seeks.

Comparison to TokyoCabinet

LevelDB is optimized for writes, and TokyoCabinet for reads. Comments by Sanjay:

TokyoCabinet is something we seriously considered using instead of writing leveldb. TokyoCabinet has great performance usually. I haven’t done a careful head-to-head comparison, but it wouldn’t surprise me if it was somewhat faster than leveldb for many workloads. Plus TokyoCabinet is more mature, has matching server code etc. and may therefore be a better fit for many projects.

However because of a fundamental difference in data structures (TokyoCabinet uses btrees for ordered storage; leveldb uses log structured merge trees), random write performance (which is important for our needs) is significantly better in leveldb. This part we did measure. IIRC, we could fill TokyoCabinet with a million 100-byte writes in less than two seconds if writing sequentially, but the time ballooned to ~2000 seconds if we wrote randomly. The corresponding slowdown for leveldb is from ~1.5 seconds (sequential) to ~2.5 seconds (random).

More

  • SSDB A fast NoSQL database, an alternative to Redis, the Redis protocol wrapper of LevelDB.
  • Cache-oblivious B-trees are LSM-trees with faster searches.

cc



Author: Shi Shougang

Created: 2017-02-06 Mon 23:26

Emacs 24.3.1 (Org mode 8.2.10)

Validate