A hybrid approach for Chinese coreference resolution

Pao Yue-kong Library Electronic Theses Database

Author: Wang, Chi-shing
Title: A hybrid approach for Chinese coreference resolution
Degree: M.Phil.
Year: 2007
Subject: Hong Kong Polytechnic University -- Dissertations; Natural language processing (Computer science); Chinese language -- Data processing.
Department: Dept. of Computing
Pages: 111 leaves ; 30 cm.
Language: English
InnoPac Record: http://library.polyu.edu.hk/record=b2145938
URI: http://theses.lib.polyu.edu.hk/handle/200/3131
Abstract: Coreference resolution is the process of determining the entity that noun phrases refer to. A great deal of research has been done on this task in English, using approaches ranging from linguistics-based to machine learning-based. In English, state-of-the-art algorithms achieve a respectable performance of about 80%. In Chinese, however, where much less work has been done, the performance is only 70%. In my thesis, I will address this performance gap and investigate automatic methods for Chinese coreference resolution that make efficient use of resources. I will propose a hybrid approach to this task that can accurately and automatically identify and resolve coreference for noun phrases in unannotated text. Coreference resolution is mainly composed of two tasks: detection and resolution. Detection finds all possibly coreferring noun phrases using a linguistics-based approach: a set of heuristic rules that combine information from part-of-speech tagging and full parsing. Resolution groups noun phrases that refer to the same entity using a machine learning approach that mixes modified k-means clustering and transformation-based learning. The main algorithm is deliberately chosen to make the most of available resources; even the features are generated from Internet sources that are free and easily obtainable. With careful selection of suitable features, I will demonstrate in my thesis the trade-off between the efficiency of using fewer features and the performance gained from using more. I will show my results on two Chinese data sets, TDT3 and ACE05. The ACE-value coreference resolution results achieved through my approach are 52.5% and 56.6%, respectively. An oracle experiment using gold-standard noun phrases achieves even more impressive results of 77.0% and 76.4%.
I will analyze the results and show that, for Chinese noun phrase coreference resolution to achieve results competitive with those for English, accurate segmentation, noun phrase identification, and feature identification are currently the parts that most need attention.
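
To make the resolution step described in the abstract more concrete, the following is a minimal sketch of grouping noun phrase mentions into entities by clustering. It is not the thesis's method: the abstract does not specify the modified k-means algorithm, the transformation-based learning stage, or the feature set, so this sketch substitutes a simple threshold-based incremental clustering. The `Mention` fields, the `distance` function, and the `threshold` value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A candidate noun phrase. All fields are hypothetical illustration features."""
    text: str
    head: str         # head word of the noun phrase
    is_pronoun: bool
    sentence: int     # index of the sentence containing the mention


def distance(a: Mention, b: Mention) -> float:
    """Toy distance between two mentions (assumed, not from the thesis).

    0.0 means the mentions very likely corefer; 1.0 means they likely do not.
    """
    if a.head == b.head:
        return 0.0
    if a.is_pronoun != b.is_pronoun:
        # Exactly one mention is a pronoun: nearer sentences are more compatible.
        return 0.2 * abs(a.sentence - b.sentence) + 0.3
    return 1.0


def cluster_mentions(mentions, threshold=0.5):
    """Assign each mention to the closest existing cluster, or start a new one.

    A mention joins a cluster only if its distance to the nearest member
    is strictly below the threshold.
    """
    clusters = []
    for m in mentions:
        best, best_d = None, threshold
        for c in clusters:
            d = min(distance(m, other) for other in c)
            if d < best_d:
                best, best_d = c, d
        if best is None:
            clusters.append([m])
        else:
            best.append(m)
    return clusters


# Toy input: mentions as a detection step might produce them.
mentions = [
    Mention("香港理工大學", "大學", False, 0),
    Mention("該大學", "大學", False, 1),
    Mention("它", "它", True, 1),
    Mention("王教授", "教授", False, 3),
]

clusters = cluster_mentions(mentions)
for i, c in enumerate(clusters):
    print(i, [m.text for m in c])
```

On this toy input, the three mentions of the university end up in one cluster and the professor in another. A real system would replace the hand-written distance with one built from learned or carefully selected features, which is where the trade-off between fewer and more features noted in the abstract would arise.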

Files in this item

Files Size Format
b21459381.pdf 9.585Mb PDF

     
