Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.creator | Wang, Chi-shing | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/3131 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | -
dc.rights | All rights reserved | en_US
dc.title | A hybrid approach for Chinese coreference resolution | en_US
dcterms.abstract | Coreference resolution is the process of determining the entity that noun phrases refer to. A great deal of research has been done on this task in English, using approaches ranging from linguistics-based to machine learning-based ones. In English, state-of-the-art algorithms achieve a respectable performance of about 80%. In Chinese, however, where much less work has been done, performance is only 70%. In my thesis, I will address this performance gap and investigate automatic methods for Chinese coreference resolution that make efficient use of resources. I will propose a hybrid approach that can accurately and automatically identify and resolve coreference for noun phrases in unannotated text. Coreference resolution consists mainly of two tasks: detection and resolution. Detection aims to find all potentially coreferring noun phrases, using a linguistics-based approach built on a set of heuristic rules that combine information from part-of-speech tagging and full parsing. Resolution groups noun phrases that refer to the same entity, using a machine learning approach that combines modified k-means clustering with transformation-based learning. The main algorithm is deliberately chosen to make the most of the available resources; even the features are generated from free, easily obtainable Internet sources. Through careful selection of suitable features, I will demonstrate the trade-off between the efficiency of using fewer features and the performance gained from using more. I will report results on two Chinese data sets, TDT3 and ACE05, on which my approach achieves ACE-value coreference resolution scores of 52.5% and 56.6%, respectively. An oracle experiment using gold-standard noun phrases achieves considerably higher scores of 77.0% and 76.4%.
I will analyze the results and show that, for Chinese noun phrase coreference resolution to achieve results competitive with those for English, accurate word segmentation, noun phrase identification and feature identification are the components that currently most need attention. | en_US
dcterms.extent | 111 leaves ; 30 cm. | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2007 | en_US
dcterms.educationalLevel | All Master | en_US
dcterms.educationalLevel | M.Phil. | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations. | en_US
dcterms.LCSH | Natural language processing (Computer science) | en_US
dcterms.LCSH | Chinese language -- Data processing. | en_US
dcterms.accessRights | open access | en_US

Files in This Item:
File | Description | Size | Format
b21459381.pdf | For All Users | 9.36 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/3131