Author: Tang, Wai Man
Title: Prediction system in big data analytics
Advisors: Yiu, K. F. C. (AMA)
Degree: Ph.D.
Year: 2024
Subject: Big data; Data mining; Machine learning; Hong Kong Polytechnic University -- Dissertations
Department: Department of Applied Mathematics
Pages: xiii, 152 pages : color illustrations
Language: English
Abstract: Forecasting and causality are essential to decision making and resource management, as they relate outcomes to exogenous factors or events. In addition, investment return prediction is crucial for proper risk control and management. Nowadays, applications built on advanced technologies are part of daily life, and big data can be collected more easily and at lower cost. Knowledge can be extracted to indicate important changes in time series data, where the exogenous factors or events should fit the purpose, as they can be instantaneous or aggregated over a certain duration. Prediction and causality are key functions in data analysis, where models are used to extract useful features and predict data trends. Feature selection and extraction are crucial methodologies in data analysis, in which sequential data is transformed into suitable features for further analysis. Relevant factors or features, which embed the essential information needed to explain the dependent variable, should be selected. This is critical to ensure useful models and accurate results.
This thesis focuses on two key types of methods: conjoining spatio-temporal data for analysis by neural networks with deep learning, and novel factor subset selection in the time-frequency representation. Applications in various areas are studied. Chapter 2 investigates traffic speed data for multi-timestep forecasting. Congestion speed-cycle patterns of the target road segment are correlated with those of nearby road segments. An appropriate input subset can be selected for neural network training with deep learning when input data dimensions are minimal. Chapter 3 investigates the short-time Fourier transform (STFT), where consistent patterns are used to identify factor subsets. A multi-factor model with factors in different timeframes should be more useful and practical for forecasting future movements in a dynamic environment. Finally, Chapter 4 investigates wavelet transforms, where significant wavelet coefficients can be chosen as peaks by using the continuous wavelet transform (CWT). Causality can be established by multiple-factor models. Factor subsets are selected from factors with sample lags, which are represented by selecting appropriate wavelet coefficients in terms of both time and frequency.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 7575.pdf
Description: For All Users
Size: 2.31 MB
Format: Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13123