Chinese collocation extraction and its application in natural language processing

Pao Yue-kong Library Electronic Theses Database


Author: Li, Wan-yin Claire
Title: Chinese collocation extraction and its application in natural language processing
Degree: Ph.D.
Year: 2007
Subject: Hong Kong Polytechnic University -- Dissertations.
Natural language processing (Computer science)
Chinese language -- Data processing.
Collocation (Linguistics)
Computational linguistics.
Department: Dept. of Computing
Pages: xiii, 172 p. : ill. ; 30 cm.
Language: English
InnoPac Record: http://library.polyu.edu.hk/record=b2145943
URI: http://theses.lib.polyu.edu.hk/handle/200/2807
Abstract: Traditional approaches to collocation extraction mainly use statistical models based on co-occurrence association measures, which leads to poor performance in terms of both recall and precision. This study explores methods that use collocation features in terms of statistical significance as well as syntactic and semantic information. The first part of this study investigates how to adapt a well-known statistics-based system, Xtract for English, for Chinese collocation extraction. In addition to parameter tuning for Chinese, an enhanced algorithm based on mutual information is developed to extract collocations with relatively low frequencies and so improve recall. The second part of this study investigates methods that take syntactic information into consideration to eliminate pseudo collocations and to identify low-frequency collocations that fit certain syntactic patterns. The syntactic information is based on part-of-speech tagging patterns obtained from a chunked Chinese corpus. However, the collocation extraction algorithm does not require the testing data to be chunked. The third part of this study investigates methods that take semantic information into consideration, using synonym information to further improve the recall of collocation extraction. The last part of this research explores how to make use of collocation information in word sense disambiguation (WSD). Results show that collocation information can improve WSD performance by 3% to 10%, depending on the data set.

Files in this item

Files Size Format
b21459435.pdf 24.76Mb PDF

     
