Author: | Tang, Wai Man |
Title: | Prediction system in big data analytics |
Advisors: | Yiu, K. F. C. (AMA) |
Degree: | Ph.D. |
Year: | 2024 |
Subject: | Big data; Data mining; Machine learning; Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Applied Mathematics |
Pages: | xiii, 152 pages : color illustrations |
Language: | English |
Abstract: | Forecasting and causality are essential to decision making and resource management by relating exogenous factors or events. In addition, investment return prediction is crucial for proper risk control and management. Nowadays, applications using advanced technologies are part of our daily life, and big data can be collected more easily and at lower cost. Knowledge can be extracted to indicate important changes in the time series of data, where exogenous factors or events should be fit for the purpose, as they can be instantaneous or aggregated over a certain duration. Prediction and causality are key functions in data analysis, where models can be used to extract useful features and predict data trends. Feature selection and extraction are crucial methodologies in data analysis, where sequential data is transformed into suitable features for further analysis. Relevant factors or features, which embed essential information to explain the dependent variable, should be selected. This is critical to ensure useful models and accurate results. In this thesis, our work focuses on two key types of methods: conjoining spatio-temporal data for analysis by neural networks with deep learning, and novel factor subset selection in time-frequency representation. Applications in various areas are studied. Chapter 2 investigates traffic speed data for multi-timestep forecasting. Congestion speed-cycle patterns of the target road segment are correlated with those of the nearby road segments. An appropriate input subset can be selected for neural network training with deep learning when input data dimensions are minimal. Chapter 3 investigates the short-time Fourier transform (STFT), where consistent patterns are used to identify factor subsets. A multi-factor model with factors in different timeframes should be more useful and practical for forecasting future movements in a dynamic environment.
Finally, Chapter 4 investigates wavelet transforms, where significant wavelet coefficients can be chosen as peaks by using the continuous wavelet transform (CWT). Causality can be established by multiple-factor models. Factor subsets are selected by factors with sample lags, which are represented by selecting appropriate wavelet coefficients in terms of both time and frequency. |
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13123