Author: | Shen, Zhaoyan |
Title: | Optimizing flash-based key-value database engine for big data and mobile applications |
Advisors: | Shao, Zili (COMP) |
Degree: | Ph.D. |
Year: | 2018 |
Subject: | Hong Kong Polytechnic University -- Dissertations; Computer storage devices |
Department: | Department of Computing |
Pages: | xvii, 129 pages : color illustrations |
Language: | English |
Abstract: | The key-value database engine, which offers high efficiency, scalability, and availability and typically works with a simple NoSQL schema, is becoming increasingly popular. It has been widely adopted as the caching system in today's low-latency Internet services, such as Memcached, Redis, McDipper, and Fatcache. However, these conventional key-value cache systems either rely heavily on expensive DRAM or use commercial solid-state drives (SSDs) inefficiently. In addition, although the key-value database engine has simple interfaces and has proven more cost-effective than traditional relational SQL databases in cloud environments, it has seldom been adopted by mobile applications, because most applications running on mobile devices depend on the SQL interface to access databases, which the key-value database engine does not provide. In this thesis, we address these issues from several aspects: the integration of the emerging open-channel SSD hardware, cross-layer hardware/software management, and the design of an SQLite-to-KV compiler for mobile applications.
First, we focus on optimizing key-value caching performance through a deep integration of flash hardware and key-value software management. To lower the total cost of ownership (TCO), the industry has recently been moving toward more cost-efficient flash-based solutions, such as Facebook's McDipper and Twitter's Fatcache. These cache systems typically take commercial SSDs and adopt a Memcached-like scheme to store and manage key-value cache data in flash. Such a practice, although simple, is inefficient because of the huge semantic gap between the key-value cache manager and the underlying flash devices. In this thesis, we advocate reconsidering the design of the cache system and directly opening device-level details of the underlying flash storage to key-value caching. We propose an enhanced flash-aware key-value cache manager that directly drives the flash management; it consists of a novel unified address mapping module, an integrated garbage collection policy, dynamic over-provisioning space management, and a customized wear-leveling policy. A thin intermediate library layer provides a slab-based abstraction of the low-level flash memory space and an API for operating flash devices directly and easily. A special flash memory SSD that exposes physical flash details is adopted to store key-value items. This co-design approach bridges the semantic gap and tightly connects the two layers, allowing us to leverage both the domain knowledge of key-value caches and the unique properties of the device, thereby maximizing the efficiency of key-value caching on flash while minimizing its weaknesses. We implemented a prototype, called DIDA-Cache, on the open-channel SSD platform. Our experiments on real hardware show that it increases throughput by 35.5%, reduces latency by 23.6%, and reduces unnecessary erase operations by 28%.
Second, we propose a new programming storage interface for SSDs that provides flexible support for key-value caching. SSDs are widely deployed in computer systems of numerous types and purposes, in two main usage modes. In the first mode, the SSD firmware hides the details of the hardware from the application and exports the standard, backward-compatible block I/O interface. This ease of use comes at the cost of low resource utilization, due to the semantic gap between application and hardware. In the second mode, the SSD directly exposes the low-level details of the hardware to developers, who leverage them for fine-grained, application-specific optimizations. The improved performance, however, significantly increases the complexity of the software and the cost of developing it. Thus, application developers must choose between easy development and optimal performance, with no real way to balance the two. To address this limitation, we propose Prism-SSD, a flexible storage interface for SSDs. Via a user-level library, Prism-SSD exports the SSD hardware at three levels of abstraction: as a raw flash medium with its low-level details, as a group of functions for managing flash capacity, and simply as a configurable block device. This multi-level abstraction allows developers to choose the degree of control over the flash hardware that best suits the semantics and performance objectives of their applications. To demonstrate the usability and performance of this new model and interface, we implemented a user-level library on the open-channel SSD platform to prototype Prism-SSD, built three versions of the key-value caching system, one with each of the library's three levels of abstraction, and compared their performance and development overhead.
Third, we study how to make mobile applications benefit from the efficient key-value database engine. SQLite has been deployed in millions of mobile devices, serving applications from the web to smartphones on various mobile operating systems. However, because its I/O interactions with the underlying file system (e.g., ext4) are uncoordinated, SQLite is inefficient and sustains a low number of transactions per second. In this thesis, we propose, for the first time, a new SQLite-like database engine, called SQLiteKV, which adopts an LSM-tree-based data structure but retains the SQLite operation interfaces. With its SQLite interface, SQLiteKV can be used by existing applications without any modification, while its LSM-tree-based data structure provides high performance. We separate SQLiteKV into a front end and a back end. In the front end, we develop a lightweight SQLite-to-KV compiler to resolve the semantic mismatch, so that SQL statements can be efficiently translated into key-value operations. We also design a novel coordination caching mechanism with memory fragmentation so that query results can be effectively cached inside SQLiteKV, alleviating the discrepancy in data management between front-end SQLite statements and back-end data organization. In the back end, we adopt an LSM-tree-based key-value database engine and propose a lightweight metadata management scheme to reduce the memory requirement. We implemented and deployed SQLiteKV on a Google Nexus 6P smartphone. Experiments with various workloads show that SQLiteKV outperforms SQLite by up to 6 times. |
Rights: | All rights reserved |
Access: | open access |
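The abstract above describes a front-end SQLite-to-KV compiler that translates SQL statements into key-value operations. As a rough illustration of that idea only, the sketch below maps a relational row onto a composite key and a serialized value; the names (kv_put, kv_get, the toy in-memory store) are hypothetical stand-ins and not the SQLiteKV implementation.

```c
/*
 * Illustrative only: a relational row stored under a composite key
 * "<table>:<primary key>", with the remaining columns serialized into
 * the value.  kv_put/kv_get and the fixed-size in-memory store are
 * hypothetical stand-ins, not the SQLiteKV API.
 */
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16
#define KEY_LEN     64
#define VAL_LEN     128

struct kv_entry { char key[KEY_LEN]; char val[VAL_LEN]; int used; };
static struct kv_entry store[MAX_ENTRIES];   /* toy key-value back end */

static int kv_put(const char *key, const char *val)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!store[i].used || strcmp(store[i].key, key) == 0) {
            snprintf(store[i].key, KEY_LEN, "%s", key);
            snprintf(store[i].val, VAL_LEN, "%s", val);
            store[i].used = 1;
            return 0;
        }
    }
    return -1;   /* store full */
}

static const char *kv_get(const char *key)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (store[i].used && strcmp(store[i].key, key) == 0)
            return store[i].val;
    return NULL;
}

/* "INSERT INTO contacts(id, name, phone) VALUES (7, 'Ada', '555-0100')"
 * becomes one put on the composite key "contacts:7". */
static int insert_contact(int id, const char *name, const char *phone)
{
    char key[KEY_LEN], val[VAL_LEN];
    snprintf(key, sizeof key, "contacts:%d", id);
    snprintf(val, sizeof val, "name=%s;phone=%s", name, phone);
    return kv_put(key, val);
}

/* "SELECT * FROM contacts WHERE id = 7" becomes one get. */
static const char *select_contact(int id)
{
    char key[KEY_LEN];
    snprintf(key, sizeof key, "contacts:%d", id);
    return kv_get(key);
}

int main(void)
{
    insert_contact(7, "Ada", "555-0100");
    const char *row = select_contact(7);
    printf("contacts:7 -> %s\n", row ? row : "(not found)");
    return 0;
}
```

In SQLiteKV the back end is an LSM-tree-based engine rather than the flat array used here; the sketch only shows how a row-oriented INSERT/SELECT can collapse into single put/get calls.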
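The abstract also mentions a unified address mapping module that lets the key-value cache manager drive flash management directly. The following sketch, under assumed structures and sizes, illustrates the general idea of mapping a key straight to a physical flash location (channel, block, page) instead of going through a separate logical-to-physical translation layer; it is not the DIDA-Cache code.

```c
/*
 * Illustrative only: a "unified" index that maps a key directly to a
 * physical flash location (channel, block, page), rather than keeping a
 * key-to-logical-address index in the cache plus a separate
 * logical-to-physical table in the SSD firmware.  All structures and
 * sizes are assumptions made for this sketch, not the DIDA-Cache design.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_CHANNELS  4
#define PAGES_PER_BLK 64
#define INDEX_SLOTS   256

struct flash_loc  { uint16_t channel, block, page; };
struct index_slot { char key[32]; struct flash_loc loc; int used; };

static struct index_slot index_table[INDEX_SLOTS];  /* key -> physical page  */
static struct flash_loc  write_head[NUM_CHANNELS];  /* next free page per channel */

/* FNV-1a hash, used both to pick a channel and an index slot. */
static uint32_t hash_key(const char *key)
{
    uint32_t h = 2166136261u;
    for (; *key; key++) { h ^= (uint8_t)*key; h *= 16777619u; }
    return h;
}

/* Log-structured put: append at the channel's write head and record the
 * physical location in the unified index (the actual flash write and the
 * value payload are elided; index collisions simply overwrite). */
static void cache_put(const char *key)
{
    uint32_t h  = hash_key(key);
    int      ch = (int)(h % NUM_CHANNELS);

    struct flash_loc loc = write_head[ch];
    loc.channel = (uint16_t)ch;

    /* advance the per-channel write head, opening a new block when full */
    if (++write_head[ch].page == PAGES_PER_BLK) {
        write_head[ch].page = 0;
        write_head[ch].block++;
    }

    struct index_slot *slot = &index_table[h % INDEX_SLOTS];
    snprintf(slot->key, sizeof slot->key, "%s", key);
    slot->loc  = loc;
    slot->used = 1;
}

static int cache_lookup(const char *key, struct flash_loc *out)
{
    struct index_slot *slot = &index_table[hash_key(key) % INDEX_SLOTS];
    if (slot->used && strcmp(slot->key, key) == 0) { *out = slot->loc; return 1; }
    return 0;
}

int main(void)
{
    cache_put("user:42");
    struct flash_loc loc;
    if (cache_lookup("user:42", &loc))
        printf("user:42 -> channel %u, block %u, page %u\n",
               (unsigned)loc.channel, (unsigned)loc.block, (unsigned)loc.page);
    return 0;
}
```

Garbage collection, over-provisioning space management, and wear leveling, which the thesis integrates into the same cache manager, are omitted from this toy example.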
Files in This Item:
File | Description | Size | Format
---|---|---|---
991022168754003411.pdf | For All Users | 911.44 kB | Adobe PDF
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/9712