Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.contributor.advisor | Lo, Eric (COMP) | -
dc.contributor.advisor | Yiu, Ken (COMP) | -
dc.creator | Bai, Ran | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/10187 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | -
dc.rights | All rights reserved | en_US
dc.title | New query and analytics over large sequence data : a study on streaks and stream | en_US
dcterms.abstract | Sequence data consists of ordered items or elements. Query processing and analytics on large sequence data pose many research challenges. This thesis studies two important problems in this domain. The first part of this thesis studies a new problem: finding "historic moments" in sequence data. Specifically, we introduce a new concept called "historic moments", motivated by real applications such as computational journalism. We present algorithms to efficiently compute historic moments from sequence data. The algorithm is incremental and space-optimal: when new data arrives, it efficiently refreshes the results while keeping only minimal information. Case studies show that historic moments can significantly improve on the insights offered by prominent streaks alone. The second part of this thesis studies another new problem: answering range-count queries over a data stream. In applications such as network monitoring, telecommunication analysis, and sensor measurement, massive amounts of data arrive as a high-rate stream, and real-time analytics over the stream data are required. Maintaining a succinct synopsis structure, called a sketch, over the data stream has been the dominant approach to supporting analysis in those applications. Recent applications, however, demand more sophisticated types of queries, and this thesis focuses on range-count queries. Unfortunately, state-of-the-art sketches perform poorly on range counting because none of them was designed to support range-count queries from the outset. This thesis aims to fill that gap and presents LSH-Sketch, a sketch that supports range counting over rapid data streams. Because it supports range-count queries, LSH-Sketch naturally supports point-count queries as well. As its name suggests, LSH-Sketch is based on locality-sensitive hashing. Like the classic CM-Sketch, LSH-Sketch is a core sketch on top of which many sketch variants and applications can be built. Empirical results show that LSH-Sketch's insertion throughput is as good as CM-Sketch's, and that it outperforms CM-Sketch in both accuracy and query throughput across all query ranges. LSH-Sketch therefore has the potential to replace CM-Sketch as the core sketch in multiple application domains. | en_US
dcterms.extent | xv, 123 pages : illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2019 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.LCSH | Electronic data processing | en_US
dcterms.LCSH | Big data | en_US
dcterms.accessRights | open access | en_US
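The abstract positions LSH-Sketch against the classic CM-Sketch (the Count-Min sketch). It also states that sketches not designed for range counting perform poorly on range-count queries. The Python sketch below shows a standard Count-Min sketch. It is not the thesis's LSH-Sketch, whose internals are not described in this record. The class and method names (CountMinSketch, update, point_count, naive_range_count) are hypothetical, as are the choice of integer keys and the row-salted blake2b hashing. The naive range count answers a range query by summing point estimates over every key in the range. It is one simple baseline, assumed here for illustration: its cost grows with the range length, and so does its one-sided overestimation error.

```python
import hashlib


class CountMinSketch:
    """Standard Count-Min sketch: `depth` rows of `width` counters.

    Hypothetical illustration only; this is NOT the thesis's LSH-Sketch.
    """

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, key: int) -> int:
        # Salting blake2b with the row index gives each row its own hash function
        # (an assumption for this sketch; any pairwise-independent family would do).
        digest = hashlib.blake2b(
            key.to_bytes(8, "little", signed=True),
            digest_size=8,
            salt=row.to_bytes(16, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key: int, count: int = 1) -> None:
        # An insertion touches exactly one counter per row, so it costs O(depth).
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def point_count(self, key: int) -> int:
        # With non-negative updates each row's counter is at least the true count,
        # so the minimum across rows is the tightest available estimate.
        return min(self.table[row][self._bucket(row, key)] for row in range(self.depth))

    def naive_range_count(self, lo: int, hi: int) -> int:
        # Baseline range count: sum point estimates for every key in [lo, hi].
        # Both the work and the accumulated overestimation grow with hi - lo.
        return sum(self.point_count(k) for k in range(lo, hi + 1))


if __name__ == "__main__":
    cms = CountMinSketch(width=64, depth=4)
    for value in [3, 3, 5, 7, 7, 7, 42]:
        cms.update(value)
    print(cms.point_count(7))            # never below the true count of 3
    print(cms.naive_range_count(3, 7))   # never below the true count of 6

    # Extreme case: with a single counter every key collides, so summing
    # point estimates over [1, 3] returns 9 for a true range count of 3.
    tiny = CountMinSketch(width=1, depth=1)
    for value in [1, 2, 3]:
        tiny.update(value)
    print(tiny.naive_range_count(1, 3))
```

The width=1 case makes the weakness easy to see. Every point estimate is already inflated to the stream total, and a range query adds that inflation once per key in the range. In an ordinary configuration the same effect is smaller but still grows with the range length. This is the gap the abstract says LSH-Sketch is designed to close.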

Files in This Item:
File | Description | Size | Format
991022287147403411.pdf | For All Users | 2.11 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/10187