Model for zero-inflated proportion data analysis

Zheng, Yangzi

Full metadata record

DC Field	Value	Language
dc.contributor	Department of Applied Mathematics	en_US
dc.contributor.advisor	Zhao, Xingqiu (AMA)	en_US
dc.contributor.advisor	Jiang, Binyan (AMA)	en_US
dc.creator	Zheng, Yangzi	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/13188	-
dc.language	English	en_US
dc.publisher	Hong Kong Polytechnic University	en_US
dc.rights	All rights reserved	en_US
dc.title	Model for zero-inflated proportion data analysis	en_US
dcterms.abstract	The examination and interpretation of datasets containing a substantial number of zeros have become increasingly relevant across various disciplines, including ecology and sociology. While zero-inflated count data have been studied extensively, models specifically designed for proportion data with a high frequency of zeros remain relatively limited. This thesis addresses this gap by focusing on zero-inflated proportion data and proposing a modeling approach that distinguishes between two types of zeros present in the data. The primary objective is to develop a regression model that captures and differentiates these two types of zeros. The first type, which corresponds to random absence, is modeled through binomial sampling: the observed proportion is zero by chance even though the underlying rate is positive. The second type, which arises from unsuitability, is handled with a general classification indicator that identifies observations whose proportion is zero because certain conditions or factors are unsuitable. To this end, we propose both parametric and semi-parametric models, which provide flexibility and robustness in capturing the characteristics of zero-inflated proportion data. These models aim to improve the understanding and analysis of datasets with a high occurrence of zeros, and the work contributes methodology tailored specifically to zero-inflated proportion data, addressing a significant gap in the existing literature.	en_US
dcterms.abstract	In the first part of our study, we investigate a semi-parametric model with two components: a regression component that uses weighted least squares to account for heterogeneity, and a classification component based on an optimal decision rule derived from the model. The parameters of this decision rule are estimated with the Nadaraya-Watson estimator, which supports accurate classification and contributes to the overall robustness of the model. Our results show that environmental features play a crucial role in explaining both types of zeros: those related to perfection and those resulting from random absence. With the proposed approach, researchers can gain deeper insight into the factors that give rise to each type of zero and thereby better understand the underlying processes. Furthermore, in both simulated and real-data analyses our model outperforms traditional methods such as the Tobit model and the zero-inflated beta regression model, substantially reducing prediction error, which makes it a valuable tool for estimation and prediction in a range of applications. These findings highlight the effectiveness and practicality of the semi-parametric model, enabling researchers to make better-informed decisions and to understand the factors that influence both types of zeros and the positive proportions.	en_US
dcterms.abstract	In the second part, our main objective is to provide a precise interpretation of the factors that influence the defective rate. In particular, we focus on the indicator component, which was left unspecified in the first part but is of greater interest here because it identifies the covariates that distinguish the zero part from the nonzero part. Under the original model assumptions, the indicator makes parameter inference difficult, because the resulting loss function is discontinuous in the indicator parameters. Inspired by the smoothed maximum score estimator, we propose a parametric model in which the indicator is replaced by a smooth kernel function. This substitution yields a continuously differentiable loss function, which greatly facilitates further analysis. As in the first part, we account for heterogeneity and use weighted least squares to estimate both sets of parameters. We then establish the consistency and asymptotic normality of both the regression and indicator estimators. These properties ensure the reliability and validity of our estimators in capturing the underlying relationships and in distinguishing the zero part from the nonzero part.	en_US
dcterms.extent	xviii, 78 pages : illustrations	en_US
dcterms.isPartOf	PolyU Electronic Theses	en_US
dcterms.issued	2024	en_US
dcterms.educationalLevel	Ph.D.	en_US
dcterms.educationalLevel	All Doctorate	en_US
dcterms.LCSH	Mathematics -- Data processing	en_US
dcterms.LCSH	Regression analysis	en_US
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	en_US
dcterms.accessRights	open access	en_US
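The two kinds of zeros described in the first abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the thesis's model: the suitability rule `x @ gamma > 0`, the logistic link for the binomial success probability, and the function name `simulate_zero_inflated_proportions` are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_zero_inflated_proportions(n_obs, beta, gamma, trials, rng):
    """Simulate proportions with two kinds of zeros (hypothetical data-generating process).

    Structural zero ("unsuitable"): x @ gamma <= 0, so the proportion is 0 regardless of sampling.
    Sampling zero ("random absence"): a suitable unit whose Binomial(trials, p) count happens to be 0.
    beta and gamma must have the same length (intercept plus covariates).
    """
    # Design matrix: intercept plus standard normal covariates
    x = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, len(beta) - 1))])
    suitable = x @ gamma > 0                       # classification indicator (assumed linear rule)
    p = 1.0 / (1.0 + np.exp(-(x @ beta)))          # assumed logistic success probability
    counts = rng.binomial(trials, p)               # binomial sampling
    y = np.where(suitable, counts / trials, 0.0)   # observed proportion
    structural_zero = ~suitable
    sampling_zero = suitable & (counts == 0)
    return x, y, structural_zero, sampling_zero

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y, sz, rz = simulate_zero_inflated_proportions(
        1000, beta=np.array([-1.0, 0.5]), gamma=np.array([0.2, 1.0]), trials=10, rng=rng)
    print("share of zeros:", np.mean(y == 0))
    print("structural:", sz.mean(), "sampling:", rz.mean())
```

In real data only `y` is observed; the two zero flags are available here only because the data are simulated, which is exactly why separating them is a modeling problem.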
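The two estimation ingredients named in the second abstract, weighted least squares and the Nadaraya-Watson estimator, can be sketched generically as follows. This is not the thesis's procedure: the Gaussian kernel, the scalar covariate, the 0.5 threshold (the plug-in Bayes rule under 0-1 loss, standing in for the thesis's own optimal decision rule, which is not specified here), and the binomial-type weights in the demo are all assumptions.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each x_eval, Gaussian kernel, scalar x.

    Very small bandwidths can make every kernel weight underflow to 0 (division by zero).
    """
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2)                      # kernel weights, one row per evaluation point
    return (k @ y_train) / k.sum(axis=1)           # locally weighted average

def weighted_least_squares(X, y, w):
    """Solve min_b sum_i w_i * (y_i - X_i @ b)^2 via the normal equations X'WX b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, size=300)
    y = np.where(x > 0, np.clip(0.3 + 0.2 * x + rng.normal(0, 0.05, 300), 0.01, 1.0), 0.0)
    # Classification part: estimate P(y > 0 | x), then threshold (assumed plug-in rule)
    p_nonzero = nadaraya_watson(x, (y > 0).astype(float), x, bandwidth=0.2)
    predicted_nonzero = p_nonzero > 0.5
    # Regression part on the positive proportions, with hypothetical binomial-type weights
    pos = y > 0
    X_pos = np.column_stack([np.ones(pos.sum()), x[pos]])
    w_pos = 1.0 / np.maximum(y[pos] * (1.0 - y[pos]), 1e-6)
    print(predicted_nonzero.mean(), weighted_least_squares(X_pos, y[pos], w_pos))
```

Weighting by an inverse variance is the usual way WLS absorbs heteroscedasticity; the variance form actually used in the thesis may differ.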
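The smoothing idea in the third abstract, borrowed from the smoothed maximum score estimator, replaces a hard indicator 1{x'γ > 0} with a smooth CDF K(x'γ/h) whose bandwidth h shrinks with the sample size. The sketch below assumes a logistic CDF for K and an illustrative mean model E[y | x] = 1{x'γ > 0} · expit(x'β); both choices, and the function names, are hypothetical and not taken from the thesis.

```python
import numpy as np

def smoothed_indicator(index, h):
    """Smooth surrogate for 1{index > 0}: logistic CDF evaluated at index / h.

    Uses 0.5 * (1 + tanh(t / 2)), which equals 1 / (1 + exp(-t)) but avoids overflow.
    """
    return 0.5 * (1.0 + np.tanh(index / (2.0 * h)))

def smoothed_wls_loss(beta, gamma, X, y, w, h):
    """Weighted squared-error loss with the hard indicator replaced by a smooth one.

    Illustrative (assumed) mean model: E[y | x] = 1{x @ gamma > 0} * expit(x @ beta).
    """
    mean_if_suitable = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))  # expit(x @ beta)
    fitted = smoothed_indicator(X @ gamma, h) * mean_if_suitable
    return float(np.sum(w * (y - fitted) ** 2))

if __name__ == "__main__":
    X = np.array([[1.0, -2.0], [1.0, -0.5], [1.0, 0.5], [1.0, 2.0]])
    beta, gamma = np.array([0.3, 0.8]), np.array([0.0, 1.0])
    # Noise-free responses generated with the hard (non-smoothed) indicator
    y = (X @ gamma > 0) * 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))
    w = np.ones(len(y))
    for h in (1.0, 0.1, 0.01):
        print(h, smoothed_wls_loss(beta, gamma, X, y, w, h))
```

In the demo the printed loss shrinks toward zero as h decreases, because the smoothed indicator approaches the hard one; for any h > 0 the loss is differentiable in γ, so gradient-based optimizers and Taylor-expansion arguments apply. In maximum score settings γ is identified only up to scale, so one coefficient is usually normalized (for example fixed to ±1); the normalization used in the thesis is not stated here.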

Files in This Item:

File	Description	Size	Format
7640.pdf	For All Users	980.85 kB	Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13188