Author: Wang, Yuqi
Title: Cross domain data analytics for urban computing
Advisors: Cao, Jiannong (COMP)
Degree: Ph.D.
Year: 2018
Subject: Hong Kong Polytechnic University -- Dissertations
Big data
Data mining
Department: Department of Computing
Pages: xviii, 119 pages : color illustrations
Language: English
Abstract: With the rapid development of information technologies, we are entering the era of big data. Large amounts of data in urban spaces are collected from various domains such as transportation, logistics, and Points of Interest (POI). The data reflect different aspects of cities in various ways, offering great opportunities for better understanding a city's operation and optimizing its infrastructure. Effective data analytics is the key to unlocking the power of this big data. Although previous work mostly focuses on data from a single domain, Cross Domain Data Analytics is attracting increasing attention and lies at the core of many urban problems and applications. Cross domain data analytics offers two opportunities beyond traditional single domain data analytics. First, it provides a more comprehensive picture of the studied problems based on information from different angles, which helps gain new insights by discovering correlations among cross-domain datasets. Second, it improves decision making by combining complementary data sources for joint analysis, especially in cases where data are insufficient in some domains. Meanwhile, urban computing aims to utilize urban big data, typically from different domains, to facilitate important urban operations such as traffic management and energy reduction. Urban computing therefore offers a natural application scenario for cross domain data analytics. Thus, this thesis focuses on Cross Domain Data Analytics for Urban Computing: it studies the problem of jointly analyzing data from different domains to generate hidden insights and enable intelligent decision-making, and proposes effective solutions to three important applications in urban computing for demonstration. First, we study the problem of traffic congestion, and show how to jointly utilize data from three domains, namely GPS trajectories, the road network, and POI data, to generate insights.
Previous work mainly focuses on the prediction of congestion and the analysis of traffic flows, while the congestion correlation between road segments has not yet been studied. In this work, we propose a three-phase framework to explore the congestion correlation between road segments using multiple real-world datasets. In the first phase, we extract congestion information on each road segment from the GPS trajectories of over 10,000 taxis, define congestion correlation, and propose a corresponding mining algorithm to find all existing correlations. In the second phase, we extract various features for each pair of road segments from road network and POI data. In the last phase, the results of the first two phases are fed into several classifiers to predict congestion correlation. We further analyze the important features and evaluate the trained classifiers through experiments. We find some important patterns that lead to high or low congestion correlation, which can facilitate building various transportation applications. In addition, we find that traffic congestion correlation has obvious directionality and transmissibility.
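The first phase above can be illustrated with a small sketch. The abstract does not state how congestion correlation is defined, so the definition used here is an assumption made only for illustration: the score of an ordered pair (a, b) is the fraction of time slots in which segment a is congested that are followed by congestion on segment b within a short window. The function name `congestion_correlation`, the `window` parameter, and the toy data are all hypothetical and are not taken from the thesis.

```python
from itertools import permutations

def congestion_correlation(events, window=1):
    """Score every ordered pair of road segments.

    events: dict mapping a segment id to the set of integer time slots
            in which that segment was congested.
    Returns {(a, b): score}, where score is the fraction of a's congested
    slots that are followed by congestion on b within `window` later slots.
    The score is directional: (a, b) and (b, a) are computed separately.
    This definition is an illustrative assumption, not the thesis's own.
    """
    scores = {}
    for a, b in permutations(events, 2):
        slots_a = events[a]
        if not slots_a:
            continue  # no congestion observed on a, so no score for (a, *)
        followed = sum(
            1
            for t in slots_a
            if any((t + d) in events[b] for d in range(1, window + 1))
        )
        scores[(a, b)] = followed / len(slots_a)
    return scores

# Toy data (made up): s2 is always congested one slot after s1.
events = {
    "s1": {1, 5, 9},
    "s2": {2, 6, 10},
    "s3": {20},
}
scores = congestion_correlation(events, window=1)
# In this toy data, ("s1", "s2") scores 1.0 while ("s2", "s1") scores 0.0.
```

Because the score is computed per ordered pair, the toy example also shows the kind of directionality the abstract mentions. In the framework described above, labels derived from such scores would be combined with per-pair road network and POI features and passed to classifiers.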
Second, we study the problem of order response time prediction to enable intelligent decision-making in logistics services by jointly considering order historical records and driver GPS trajectories, which come from two different domains. Accurate prediction of order response time would not only facilitate decision making on order dispatching, but also pave the way for applications such as supply-demand analysis and driver scheduling, leading to high system efficiency. In this work, we forecast order response time for the current day by fusing data from order history and historical driver locations. Specifically, we propose Coupled Sparse Matrix Factorization (CSMF) to deal with the heterogeneous fusion and data sparsity challenges raised by this problem. CSMF jointly learns from multiple heterogeneous sparse data sources through a proposed weight setting mechanism. Experiments on real-world datasets demonstrate the effectiveness of our approach compared to various baseline methods. The performance of several variants of the proposed method is also presented to show the effectiveness of each component. Third, we extend the previous method to incorporate more context information by proposing Coupled Weighted Tensor-matrix Factorization (CWTF) for accurate prediction of the order accepting probabilities of van drivers, which would facilitate efficient order dispatching and improve user experience. Handling the inherent heterogeneous data fusion, sparsity, and efficiency challenges simultaneously is difficult. In this work, we propose a three-stage framework built on CWTF for order accepting probability prediction in logistics services.
Specifically, orders are first grouped into clusters to enrich the sparse interactions between orders and drivers; then an accepting probability tensor with the three dimensions of driver, order cluster, and time is generated by a tensor-matrix factorization method that fuses order characteristics and driver behaviors in an efficient way; finally, given a new order, the accepting probability of each driver is efficiently predicted by retrieving it directly from the learned tensor. Experimental results on a large dataset from a well-known app-based logistics platform demonstrate the superiority of the proposed method over various baseline methods.
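The general idea of coupling weighted factorizations of sparse data can be sketched as follows. This is a generic coupled weighted matrix factorization trained by plain gradient descent, not the thesis's CSMF or CWTF: the abstract gives neither objective nor weight setting mechanism, and the sketch uses matrices only, with no tensor. It assumes two matrices `X` and `Y` that share their row entities and therefore share one latent factor `U`, with weight matrices `Wx` and `Wy` marking which entries are observed. All names, hyperparameters, and data are hypothetical.

```python
import numpy as np

def weighted_loss(X, W, A, B):
    """Half the weighted squared error between X and A @ B.T."""
    return 0.5 * np.sum(W * (A @ B.T - X) ** 2)

def coupled_weighted_mf(X, Wx, Y, Wy, rank=2, lr=0.01, reg=0.01,
                        steps=2000, seed=0):
    """Jointly fit X ~ U @ V.T and Y ~ U @ Z.T with a shared factor U.

    Wx and Wy hold per-entry weights (0 for unobserved entries), so
    missing values do not contribute to the loss or the gradients.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    p = Y.shape[1]
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    Z = 0.1 * rng.standard_normal((p, rank))
    for _ in range(steps):
        Ex = Wx * (U @ V.T - X)  # weighted residual on X
        Ey = Wy * (U @ Z.T - Y)  # weighted residual on Y
        grad_U = Ex @ V + Ey @ Z + reg * U  # U receives signal from both
        grad_V = Ex.T @ U + reg * V
        grad_Z = Ey.T @ U + reg * Z
        U -= lr * grad_U
        V -= lr * grad_V
        Z -= lr * grad_Z
    return U, V, Z

# Toy data (made up): two rank-2 matrices that share their row factor.
rng = np.random.default_rng(1)
U_true = rng.standard_normal((6, 2))
X = U_true @ rng.standard_normal((5, 2)).T
Y = U_true @ rng.standard_normal((4, 2)).T
Wx = (rng.random(X.shape) < 0.6).astype(float)  # roughly 60% of X observed
Wy = np.ones_like(Y)                            # Y fully observed

U0, V0, Z0 = coupled_weighted_mf(X, Wx, Y, Wy, steps=0)  # initial factors
U, V, Z = coupled_weighted_mf(X, Wx, Y, Wy)
loss_before = weighted_loss(X, Wx, U0, V0) + weighted_loss(Y, Wy, U0, Z0)
loss_after = weighted_loss(X, Wx, U, V) + weighted_loss(Y, Wy, U, Z)
X_hat = U @ V.T  # dense estimate, including entries masked out by Wx
```

Sharing `U` lets the fully observed matrix inform the rows of the sparse one, which is the intuition behind fusing heterogeneous sources when one of them is sparse. Reading predictions out of `X_hat` is a simple lookup, loosely analogous to retrieving a value from a learned tensor.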
Rights: All rights reserved
Access: open access

Files in This Item:
File: 991022165759503411.pdf
Description: For All Users
Size: 1.85 MB
Format: Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/9660