Author: | Bai, Ran |
Title: | New query and analytics over large sequence data : a study on streaks and stream |
Advisors: | Lo, Eric (COMP); Yiu, Ken (COMP) |
Degree: | Ph.D. |
Year: | 2019 |
Subject: | Hong Kong Polytechnic University -- Dissertations; Electronic data processing; Big data |
Department: | Department of Computing |
Pages: | xv, 123 pages : illustrations |
Language: | English |
Abstract: | Sequence data consists of ordered items or elements. Query processing and analytics on large sequence data pose many research challenges. This thesis studies two important problems in this domain. The first part of this thesis studies a new problem of finding "historic moments" in sequence data. Specifically, we introduce a new concept called "historic moments", which is motivated by real applications such as computational journalism. We present algorithms to efficiently compute historic moments from sequence data. The algorithm is incremental and space-optimal: when new data arrive, it can efficiently refresh the results while keeping only minimal information. Case studies show that historic moments can significantly improve on the insights offered by prominent streaks alone. The second part of this thesis studies another new problem: answering range-count queries over data streams. In applications such as network monitoring, telecommunication analysis, and sensor measurement, massive amounts of data arrive as a high-rate stream, and real-time analytics over the stream data are required. Maintaining a succinct synopsis structure, called a sketch, over the data stream has been the dominant approach to supporting analysis in those applications. Recent applications, however, demand more sophisticated types of queries, and range-count queries are the focus of this thesis. Unfortunately, state-of-the-art sketches perform poorly on range-count queries because none of them was designed to support such queries from the outset. In this thesis, we aim to fill this gap and present LSH-Sketch, a sketch that supports range counting over rapid data streams. Because it supports range-count queries, LSH-Sketch naturally supports point-count queries as well. As its name suggests, LSH-Sketch is based on locality-sensitive hashing. Like the classic CM-Sketch, LSH-Sketch is a core sketch on top of which many sketch variants and applications can be built. Empirical results show that LSH-Sketch's insertion throughput is as good as that of CM-Sketch, and that it outperforms CM-Sketch in accuracy and query throughput under all query ranges. LSH-Sketch thus has the potential to replace CM-Sketch as the core sketch in multiple application domains. |
Rights: | All rights reserved |
Access: | open access |
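The abstract contrasts LSH-Sketch with the classic CM-Sketch on range-count queries. The following is a minimal Python sketch of a textbook Count-Min sketch, assuming non-negative integer keys and the standard hash family ((a*x + b) mod p) mod width; the class and method names (`CountMinSketch`, `point_count`, `range_count_naive`) are hypothetical. It illustrates one reason a point-count sketch handles ranges poorly when a range count is answered by summing point estimates: every estimate may overcount, so both the error and the query cost grow with the range length. It does not implement LSH-Sketch; the abstract states only that LSH-Sketch builds on locality-sensitive hashing, and its construction is not reproduced here.

```python
import random


class CountMinSketch:
    """Textbook Count-Min sketch over non-negative integer keys (illustrative only)."""

    # Mersenne prime used by the hash family h(x) = ((a*x + b) mod p) mod width.
    _PRIME = (1 << 61) - 1

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One (a, b) pair per row selects one hash function from the family.
        self._params = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(depth)
        ]
        self._table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        a, b = self._params[row]
        return ((a * key + b) % self._PRIME) % self.width

    def insert(self, key, count=1):
        # Every row adds the count to the single counter its hash selects.
        for row in range(self.depth):
            self._table[row][self._bucket(row, key)] += count

    def point_count(self, key):
        # Colliding keys only inflate counters, so with non-negative counts
        # the minimum over rows never underestimates the true frequency.
        return min(self._table[row][self._bucket(row, key)] for row in range(self.depth))

    def range_count_naive(self, lo, hi):
        # Naive range count: sum one point estimate per key in [lo, hi].
        # Each term may overcount, so the total error and the query cost
        # both grow linearly with the range length hi - lo + 1.
        return sum(self.point_count(k) for k in range(lo, hi + 1))


cms = CountMinSketch(width=64, depth=4, seed=42)
for key in [3, 3, 5, 7, 7, 7, 100]:
    cms.insert(key)
print(cms.point_count(7))           # never less than the true count, 3
print(cms.range_count_naive(0, 9))  # never less than the true count, 6
```

Shrinking `width` to 1 shows the accumulation in the extreme: every key collides, each point estimate becomes the total stream count, and a range covering 10 keys reports 10 times that total.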
Files in This Item:
File | Description | Size | Format
---|---|---|---
991022287147403411.pdf | For All Users | 2.11 MB | Adobe PDF
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/10187