Author: Yang, Zhongqing
Title: Linear discriminant analysis with high dimensional mixed variables
Advisors: Jiang, Binyan (AMA)
Zhao, Xingqiu (AMA)
Degree: Ph.D.
Year: 2022
Subject: Variables (Mathematics)
Dimensional analysis
Mathematical models
Hong Kong Polytechnic University -- Dissertations
Department: Department of Applied Mathematics
Pages: xviii, 78 pages : color illustrations
Language: English
Abstract: With the rapid development of modern measurement technologies, datasets containing both discrete and continuous variables are increasingly common in many areas, and the dimensions of both the discrete and the continuous variables are often very high. Discriminant analysis for mixed variables under the traditional fixed-dimension setting has been well studied. Despite recent progress in modelling high-dimensional continuous data, there is a scarcity of methods that can deal with a mixed set of variables. To fill this gap, this thesis develops a novel, simple yet useful classification rule for high-dimensional observations that addresses both the high dimensionality and the mixed structure of the variables simultaneously.
In Chapters 2-3 we introduce our framework, which builds on a location model in which the distributions of the continuous variables conditional on the categorical ones are assumed Gaussian. We use kernel smoothing to overcome the challenge of having to split the data into exponentially many cells, that is, combinations of the categorical variables, and we provide new perspectives on the bandwidth choice to ensure an analogue of Bochner's Lemma, which differs from the usual bias-variance tradeoff. We show that the two sets of parameters in our model can be estimated separately and provide a penalized likelihood method for their estimation.
In Chapter 4, we present theoretical results and develop efficient direct estimation schemes that yield consistent estimators of the discriminant components.
In Chapter 5, we conduct simulation studies to investigate the performance of the proposed semiparametric location model. Results on the estimation accuracy and the misclassification rates are established, and the competitive performance of the proposed classifier is illustrated by extensive simulation and real data studies.
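As a rough illustration of the kernel smoothing over categorical cells mentioned for Chapters 2-3, the sketch below estimates the mean of the continuous variables in one cell by borrowing information from nearby cells. The abstract does not specify the kernel, so this assumes binary categorical variables and an Aitchison-Aitken-type kernel based on Hamming distance; the function names and the smoothing parameter `lam` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def hamming_kernel_weights(U, u, lam):
    """Weights lam**(d - h) * (1 - lam)**h, where h is the Hamming
    distance between each row of U and the target cell u.
    ASSUMPTION: binary categorical variables, Aitchison-Aitken-type kernel."""
    U = np.asarray(U)
    u = np.asarray(u)
    d = U.shape[1]                   # number of categorical variables
    h = np.sum(U != u, axis=1)       # Hamming distance of each sample to cell u
    return lam ** (d - h) * (1.0 - lam) ** h

def smoothed_cell_mean(X, U, u, lam=0.9):
    """Kernel-weighted mean of the continuous rows X for cell u.
    lam = 1.0 uses only samples in cell u (undefined if the cell is empty);
    lam = 0.5 weights all samples equally, giving the overall mean."""
    w = hamming_kernel_weights(U, u, lam)
    return w @ np.asarray(X, dtype=float) / w.sum()

# Toy data: four samples, two binary variables, one continuous variable.
X = np.array([[1.0], [3.0], [5.0], [7.0]])
U = np.array([[0, 0], [0, 0], [0, 1], [1, 1]])

print(smoothed_cell_mean(X, U, [0, 0], lam=1.0))  # unsmoothed cell mean
print(smoothed_cell_mean(X, U, [1, 0], lam=0.9))  # empty cell, smoothed estimate
```

In this toy example the cell (1, 0) contains no samples, yet the smoothed estimate is still defined because neighbouring cells contribute with smaller weights. This is the sense in which smoothing avoids splitting the data into exponentially many separately estimated cells.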
Rights: All rights reserved
Access: open access

Files in This Item:
File: 6293.pdf
Description: For All Users
Size: 566.31 kB
Format: Adobe PDF
