Association tests with incomplete covariates and high-dimensional auxiliary variables
Wong, Kin Yau (AMA)
Missing observations (Statistics)
Hong Kong Polytechnic University -- Dissertations
Department of Applied Mathematics
ix, 139 pages : color illustrations
In many clinical and epidemiological studies, investigators are interested in testing for an association between an outcome variable and covariates of interest. Such analyses are often complicated by missing data. When variables of interest are missing for some subjects, it is desirable to use observed auxiliary variables, which are sometimes high-dimensional, to impute or predict the missing values and thereby improve statistical efficiency. Although many methods have been developed for prediction using high-dimensional variables, it is challenging to perform valid inference based on the predicted values. In this dissertation, we propose novel association testing methods involving missing data with the goal of detecting relevant predictors for outcomes of interest.
We first focus on parametric models and develop an association test for an outcome variable and a partially missing covariate, where the missing values can be predicted using a set of high-dimensional auxiliary variables. The proposed analysis consists of a model selection step and a testing step. Specifically, in the first step, we select a subset of auxiliary variables and fit a regression model of the covariate of interest against the selected features. In the second step, we perform the score test for the covariate in the outcome model under the full likelihood, which includes both the outcome model and the missing covariate model. We then extend the proposed method to a class of semiparametric transformation models for potentially right-censored survival outcomes. We propose a supremum test, where we consider multiple choices of transformation functions, perform an individual score test under each outcome model, and take the supremum of the individual test statistics as the proposed test statistic. We show that the proposed testing procedure improves the test performance when the outcome model is unknown.
The validity and advantages of the proposed methods are demonstrated both theoretically and numerically. We establish the asymptotic properties of the proposed test statistics under regularity conditions and show the validity of the tests under data-driven model selection procedures. We evaluate the proposed methods through extensive simulation studies and show their superior performance over some existing methods. Real data analyses are carried out on major cancer genomic studies.
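The two-step analysis described in the abstract (select auxiliary variables, then run a score test) can be illustrated with a minimal sketch. This is not the dissertation's procedure: it assumes a simple linear outcome model with Gaussian errors, swaps in marginal correlation screening for the unspecified model selection method, and uses simulated data. All function names (`solve`, `corr`, `select_and_fit`, `score_test`) are hypothetical. In this simplified setting, the score for the covariate effect under the null reduces to the covariance between the outcome and the covariate, with each missing covariate value replaced by its fitted mean.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def corr(u, v):
    """Sample Pearson correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def select_and_fit(x, z, k):
    """Step 1 (assumed screening rule): keep the k auxiliary columns most
    correlated with X among subjects with X observed, then fit OLS of X on them."""
    obs = [i for i, xi in enumerate(x) if xi is not None]
    xo = [x[i] for i in obs]
    p = len(z[0])
    ranked = sorted(range(p), key=lambda j: -abs(corr(xo, [z[i][j] for i in obs])))
    keep = sorted(ranked[:k])
    design = [[1.0] + [z[i][j] for j in keep] for i in obs]
    q = k + 1
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(q)] for a in range(q)]
    xty = [sum(r[a] * xv for r, xv in zip(design, xo)) for a in range(q)]
    return keep, solve(xtx, xty)


def score_test(y, x, z, k=2):
    """Step 2: score statistic for 'no covariate effect' in a linear outcome
    model, with missing X replaced by its fitted mean given the selected Z."""
    keep, coef = select_and_fit(x, z, k)
    x_fill = [
        xi if xi is not None
        else coef[0] + sum(c * z[i][j] for c, j in zip(coef[1:], keep))
        for i, xi in enumerate(x)
    ]
    n = len(y)
    y_bar, x_bar = sum(y) / n, sum(x_fill) / n
    sigma2 = sum((yi - y_bar) ** 2 for yi in y) / n  # null-model error variance
    u = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x_fill))
    v = sigma2 * sum((xi - x_bar) ** 2 for xi in x_fill)
    stat = u * u / v
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    return stat, p_value, keep


# Simulated example: X depends on the first two of 30 auxiliary variables,
# Y depends on X, and roughly 40% of X values are set to missing.
random.seed(0)
n, p = 200, 30
z = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
x_full = [zi[0] - zi[1] + random.gauss(0, 0.5) for zi in z]
y = [0.8 * xi + random.gauss(0, 1) for xi in x_full]
x = [xi if random.random() > 0.4 else None for xi in x_full]

stat, p_value, keep = score_test(y, x, z, k=2)
print("selected auxiliary columns:", keep)
print("score statistic:", stat, "p-value:", p_value)
```

Because the screening step here uses only the covariate and the auxiliary variables, never the outcome, the plug-in variance remains a reasonable null reference in this toy setting. The nonlinear and censored-outcome cases covered by the dissertation would need the full-likelihood treatment it describes.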
All rights reserved