Author: Wu, Conglin
Title: Hydrological predictions using data-driven models coupled with data preprocessing techniques
Degree: Ph.D.
Year: 2010
Subject: Hong Kong Polytechnic University -- Dissertations
Hydrology -- Data processing
Hydrological forecasting
Department: Department of Civil and Structural Engineering
Pages: xi, 246 p. : ill. (some col.) ; 30 cm.
Language: English
Abstract: Data-driven models, particularly soft computing models, have become a suitable alternative to knowledge-driven models in many hydrological prediction scenarios, including rainfall, streamflow, and rainfall-runoff forecasting. The primary reason is that data-driven models rely solely on historical hydro-meteorological data and do not explicitly account for the underlying physical processes. However, data-driven models inevitably introduce uncertainty into forecasts. Sources include over-simplified assumptions, inappropriate training data, model inputs, and model configuration, and even the individual experience of modelers. This thesis attempts to improve the accuracy of hydrological forecasting in three respects: model inputs, model selection, and data-preprocessing techniques. Seven input selection techniques are first examined to select optimal model inputs in each prediction scenario: linear correlation analysis (LCA), false nearest neighbors, correlation integral, stepwise linear regression, average mutual information, partial mutual information, and an artificial neural network (ANN) based on a multi-objective genetic algorithm. Representative models, such as the K-nearest-neighbors (K-NN) model, a dynamic system based model (DSBM), ANN, modular ANN (MANN), and a hybrid artificial neural network-support vector regression (ANN-SVR) model, are then proposed for rainfall and streamflow forecasting. Four data-preprocessing methods, namely moving average (MA), principal component analysis (PCA), singular spectrum analysis (SSA), and wavelet analysis (WA), are further investigated by coupling them with the above-mentioned forecasting models.
K-NN, ANN, and MANN are used to predict monthly and daily rainfall series, with linear regression (LR) as the benchmark. The comparison of the seven input selection techniques indicates that LCA identifies reasonable model inputs.
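As one illustration of LCA-based input selection, the sketch below computes the sample autocorrelation of a series and keeps the lags whose coefficient exceeds the approximate 95% significance bound of plus or minus 1.96/sqrt(n). This threshold is a common LCA criterion assumed here for illustration; the exact criterion used in the thesis is not stated in this record. The function names `acf` and `select_lags_lca` are hypothetical.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation r_k of x for lags k = 1..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[:-k], d[k:]) / denom for k in range(1, max_lag + 1)])


def select_lags_lca(x, max_lag=12, z=1.96):
    """Keep lags whose |r_k| exceeds the approximate bound z / sqrt(n).

    Assumed criterion for illustration: z = 1.96 corresponds to roughly 95%.
    """
    r = acf(x, max_lag)
    bound = z / np.sqrt(len(x))
    return [k for k, rk in enumerate(r, start=1) if abs(rk) > bound]


# Usage on a synthetic AR(1) series (illustrative data, not from the thesis)
rng = np.random.default_rng(0)
series = np.zeros(240)
for t in range(1, 240):
    series[t] = 0.8 * series[t - 1] + rng.normal()
print(select_lags_lca(series, max_lag=12))
```

The selected lags would then define the lagged input vector of the forecasting model; for rainfall-runoff inputs, the same idea could be applied to the cross-correlation between rainfall and flow.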
In the normal mode (i.e., without data preprocessing), MANN performs best, although its advantage over ANN is not significant for monthly rainfall forecasting. Compared with the normal mode, SSA yields a considerable improvement in model performance, whereas MA and PCA have a negligible effect. When coupled with SSA, MANN's advantage over the other models is quite noticeable, particularly for daily rainfall forecasting.
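The coupling of a forecasting model with SSA can be sketched as follows, assuming SSA acts as a filter. The series is embedded in a trajectory (Hankel) matrix and decomposed by SVD. It is then reconstructed from its r leading components by diagonal averaging, and the filtered series is passed to a K-NN forecaster. The window length L, the number of retained components r, the number of lags, k, and all function names are hypothetical choices for illustration, not the settings used in the thesis. Filtering only the history available at the forecast origin is also an assumption of this sketch.

```python
import numpy as np


def ssa_filter(x, window, r):
    """Reconstruct x from its r leading SSA components (hypothetical helper).

    1. Embedding: build the L x K trajectory (Hankel) matrix, K = n - L + 1.
    2. SVD of the trajectory matrix.
    3. Grouping: keep the r largest singular triples.
    4. Diagonal averaging: map the rank-r matrix back to a series.
    """
    x = np.asarray(x, dtype=float)
    n, L = len(x), window
    K = n - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])  # column j = x[j:j+L]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]  # rank-r approximation
    rec, count = np.zeros(n), np.zeros(n)
    for i in range(L):  # entry (i, j) belongs to time index i + j
        rec[i:i + K] += Xr[i]
        count[i:i + K] += 1
    return rec / count


def knn_forecast(series, n_lags, k, horizon=1):
    """K-NN forecast of the value `horizon` steps after the end of `series`.

    Each past time t is described by the lag vector series[t-n_lags+1 .. t];
    the forecast is the mean target series[t+horizon] of the k lag vectors
    closest (Euclidean distance) to the most recent one.
    """
    s = np.asarray(series, dtype=float)
    ts = np.arange(n_lags - 1, len(s) - horizon)
    feats = np.array([s[t - n_lags + 1:t + 1] for t in ts])
    targets = s[ts + horizon]
    dist = np.linalg.norm(feats - s[-n_lags:], axis=1)
    return targets[np.argsort(dist)[:k]].mean()


def ssa_knn_forecast(history, window, r, n_lags, k, horizon=1):
    """Filter only the history available at the forecast origin, then apply K-NN."""
    return knn_forecast(ssa_filter(history, window, r), n_lags, k, horizon)
```

For example, `ssa_knn_forecast(history, window=40, r=2, n_lags=3, k=5)` gives a one-step-ahead forecast of the filtered series. In practice L, r, the lags, and k would be tuned on a validation period, and K-NN could be swapped for an ANN or MANN without changing the SSA step.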
ANN, MANN, ANN-SVR, and DSBM are employed to estimate monthly and daily streamflow series, with model inputs that depend only on previous flow observations. The best model inputs are again identified by LCA. In the normal mode, the global DSBM performs close to ANN. MANN and ANN-SVR perform comparably, so either can replace the other. Both noticeably improve the accuracy of flow predictions compared with ANN, particularly for non-smooth flow series. However, a prediction lag effect is observed in daily streamflow forecasting. In the data-preprocessing mode, both SSA and WA significantly improve model performance, but SSA is markedly superior to WA.
ANN, MANN, and LR are also used to perform daily rainfall-runoff (R-R) prediction, with model inputs consisting of previous rainfall and streamflow observations. The best model inputs are again obtained by LCA. Irrespective of mode, the advantage of MANN over ANN is not obvious. Compared with models that use only previous flow data as inputs, these R-R models make more accurate predictions. However, in the normal mode this improvement diminishes as the forecasting horizon increases. The situation is reversed in the SSA mode, where the advantage of the ANN R-R model becomes more significant as the prediction horizon increases.
The findings above concern point prediction. Building on the ANN-SSA R-R model, the point predictions are complemented with the uncertainty estimation based on local errors and clustering (UNEEC) method to obtain interval predictions of daily rainfall-runoff. The UNEEC method is then compared with the bootstrap method. Results indicate that UNEEC performs better in the low-flow range, whereas the bootstrap method is better suited to the high-flow range. One of the major contributions of this research is the exploration of a viable modeling technique that couples data-driven models with SSA.
The technique has been tested on rainfall, streamflow, and rainfall-runoff forecasting, and the predictions agree well with the observations.
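The interval-prediction step can be illustrated with a simplified, UNEEC-style sketch. The input space is clustered, and the empirical quantiles of the model errors (observed minus predicted) are computed in each cluster. A new prediction interval is formed by adding the quantiles of the nearest cluster to the point forecast. For brevity, this sketch makes two simplifications. It substitutes hard k-means for the fuzzy clustering used in UNEEC. It also omits the step in which a separate model is trained to predict the interval limits from the inputs. The function names, the number of clusters, and the 90% level are assumptions, not the thesis's settings.

```python
import numpy as np


def _nearest(X, centers):
    """Index of the closest center for each row of X (squared Euclidean distance)."""
    return np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)


def kmeans(X, k, iters=100, seed=0):
    """Plain hard k-means; stands in for the fuzzy clustering used by UNEEC."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _nearest(X, centers)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return centers


def fit_cluster_error_quantiles(X, errors, k=3, level=0.9):
    """Per-cluster empirical error quantiles (hypothetical helper).

    X: model inputs, shape (n, d); errors: observed - predicted, shape (n,).
    Returns cluster centers and the lower/upper error quantiles of each cluster.
    """
    X = np.asarray(X, dtype=float)
    errors = np.asarray(errors, dtype=float)
    centers = kmeans(X, k)
    labels = _nearest(X, centers)
    a = (1.0 - level) / 2.0
    lo, hi = np.empty(k), np.empty(k)
    for c in range(k):
        e = errors[labels == c]
        if e.size == 0:  # empty cluster: fall back to the global error distribution
            e = errors
        lo[c], hi[c] = np.quantile(e, a), np.quantile(e, 1.0 - a)
    return centers, lo, hi


def predict_interval(X_new, y_pred, centers, lo, hi):
    """Interval = point forecast + error quantiles of the nearest cluster."""
    labels = _nearest(np.asarray(X_new, dtype=float), centers)
    y_pred = np.asarray(y_pred, dtype=float)
    return y_pred + lo[labels], y_pred + hi[labels]
```

Here `y_pred` can be any point forecast. In the thesis the interval step builds on the ANN-SSA R-R model. Because the quantiles are estimated separately per cluster, intervals can be narrower where errors are small (e.g. at low flows) and wider where they are large.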
Rights: All rights reserved
Access: open access





Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/5912