Author: Li, Wan-yin Claire
Title: Chinese collocation extraction and its application in natural language processing
Degree: Ph.D.
Year: 2007
Subject: Hong Kong Polytechnic University -- Dissertations.
Natural language processing (Computer science)
Chinese language -- Data processing.
Collocation (Linguistics)
Computational linguistics.
Department: Department of Computing
Pages: xiii, 172 p. : ill. ; 30 cm.
Language: English
Abstract: The tranditional approaches in collocation extraction mainly use statictical models based on co-occurrence association measures, which lead to poor performance both in terms of recall and precision. Collocation extraction in this study explore methods to use collocations features in terms of statistical significance as well as syntactic and semantic information. The first part of this study investigates how to adapt a well known statistical-based system, Xtract for English, for Chinese collocation extraction. In addition to parameter tuning for Chinese, an enhanced algorithm basd on mutual information is developed to extract collocations with relatively low frequencies to improve recall performance. The second part of this study investigates methods to take into consideration of syntactic information to eliminate pseudo collocations and identify low frequency collocations which suit certain syntactic patterns. The syntactic information is based on Part-of-Speech tagging patterns which are obtained from a chunked Chinese corpus. However, the collocation extraction algorithm does not require the testing data to be chunked. The third part of this study investigates methods to take into consideration of semantic information to further improve recall of collocation extraction by using synonym information. The last part of this research explores how to make use of collocation information in word sense disambiguation (WSD). Results show that collocation information can improve the performance of WSD ranging from 3% to 10% using different data sets.
Rights: All rights reserved
Access: open access

Files in This Item:
File Description SizeFormat 
b21459435.pdfFor All Users24.19 MBAdobe PDFView/Open


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.

Show full item record

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/2807