Author: Liu, Hongchao
Title: A corpus based computational model of the lexical aspect and viewpoint aspect in Chinese
Advisors: Huang, Chu-ren (CBS)
Degree: Ph.D.
Year: 2018
Subject: Hong Kong Polytechnic University -- Dissertations
Chinese language -- Dialects -- Mandarin
Chinese language -- Grammar
Chinese language -- Verb
Department: Department of Chinese and Bilingual Studies
Pages: xii, 199 pages : illustrations
Language: English
Abstract: This thesis investigates lexical aspect and viewpoint aspect in Mandarin Chinese using statistical and computational methods. I first show the necessity of studying the situation types of verbs and of applying statistical methods to linguistic studies. Some previous aspectual studies of Mandarin Chinese deny that verbs can or should be classified into different situation types. However, these studies contradict themselves in at least three ways. The first is to assign the situation type of the whole structure to its constituent verb. The second is to refer to situation type at the lexical level by a different term, such as "aspectual parameter", which is in essence the same as situation type. The third is to explicitly deny that verbs can be classified into situation types while implicitly performing that classification at the lexical level. Other studies are on the right track in acknowledging that situation types should be differentiated on at least two levels, the lexical and the sentential. However, none of them has statistically validated the interaction between situation types and viewpoint aspects. Given these problems with lexical aspect and methodology, verb situation type and statistical validation are the focus of this thesis. Based on my own intuition and previous studies, I hypothesize that the aspectual markers ZHE, LE1, LE2, GUO, ZAI and ZHENGZAI can distinguish different situation types.
I also maintain that situation type at the lexical level attaches to the individual senses of a verb rather than to the verb per se, and that the situation type system is a prototype category. Under this hypothesis, the members of the category should cluster according to their family resemblance, which is represented by their ability to co-occur with different aspectual markers. Whether a verb or verb sense can co-occur with an aspectual marker is first judged by my own intuition and then cross-validated against other annotated resources. A co-occurrence matrix is constructed with the verb senses as rows and the aspectual markers as columns. Family resemblance is modeled as the distance between rows in the vector space that the matrix defines. Hierarchical clustering is then applied, automatically generating the situation type system from the distances between members. In this way, three situation types are established and assigned to all of the selected verb senses. Since this situation type system rests on human intuition, corpus-based validation is necessary. All the verb senses are manually linked to verbs in the Sinica Corpus, and a co-occurrence frequency matrix is constructed from the corpus data. Statistical methods such as multinomial logistic regression are used to validate the situation type system. The relationships between aspectual markers and the cognitive conceptual features of situation types, including [Telic], [Durative] and [Dynamic], are established in the same way. Finally, I construct a dataset of verb senses and their situation types and run evaluation tests on it. Using word embedding vectors and a support vector machine classifier, a best accuracy of 72.05% is achieved.
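The clustering step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the six rows are hypothetical placeholder senses with invented 0/1 values, Hamming distance stands in for the unspecified distance measure, and average linkage stands in for the unspecified linkage rule. Only the six marker names and the target of three groups come from the abstract.

```python
from itertools import combinations

# Column order of the binary co-occurrence matrix (marker names from the abstract).
MARKERS = ["ZHE", "LE1", "LE2", "GUO", "ZAI", "ZHENGZAI"]

# Hypothetical toy rows, NOT data from the thesis:
# 1 = the verb sense is judged able to co-occur with the marker, 0 = not.
COOCCURRENCE = {
    "sense_a": [1, 0, 1, 0, 0, 0],
    "sense_b": [1, 0, 1, 0, 0, 0],
    "sense_c": [1, 1, 1, 1, 1, 1],
    "sense_d": [1, 1, 1, 1, 1, 0],
    "sense_e": [0, 1, 1, 1, 0, 0],
    "sense_f": [0, 1, 1, 0, 0, 0],
}


def hamming(u, v):
    """Number of markers on which two senses disagree (assumed distance)."""
    return sum(a != b for a, b in zip(u, v))


def average_linkage(c1, c2, rows):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(hamming(rows[i], rows[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(rows, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[name] for name in rows]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], rows),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Three groups, mirroring the three situation types reported in the abstract.
groups = cluster(COOCCURRENCE, 3)
for members in groups:
    print(members)
```

On this toy matrix, senses with identical or near-identical marker profiles end up in the same group. A different distance measure or linkage rule could yield different groupings on real data.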
Rights: All rights reserved
Access: open access

Files in This Item:
File                      Description     Size      Format
991022164556503411.pdf    For All Users   2.03 MB   Adobe PDF

Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Please use this identifier to cite or link to this item: