Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.creator | Li, Wan-yin Claire | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/2807 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | -
dc.rights | All rights reserved | en_US
dc.title | Chinese collocation extraction and its application in natural language processing | en_US
dcterms.abstract | Traditional approaches to collocation extraction mainly use statistical models based on co-occurrence association measures, which leads to poor performance in terms of both recall and precision. This study explores methods that use collocation features in terms of statistical significance as well as syntactic and semantic information. The first part of this study investigates how to adapt a well-known statistics-based system, Xtract for English, to Chinese collocation extraction. In addition to parameter tuning for Chinese, an enhanced algorithm based on mutual information is developed to extract collocations with relatively low frequencies and so improve recall. The second part investigates methods that take syntactic information into consideration to eliminate pseudo collocations and to identify low-frequency collocations that fit certain syntactic patterns. The syntactic information is based on part-of-speech tagging patterns obtained from a chunked Chinese corpus. However, the collocation extraction algorithm does not require the testing data to be chunked. The third part investigates methods that take semantic information into consideration, using synonym information to further improve the recall of collocation extraction. The last part explores how to make use of collocation information in word sense disambiguation (WSD). Results show that collocation information improves WSD performance by 3% to 10% across different data sets. | en_US
dcterms.extent | xiii, 172 p. : ill. ; 30 cm. | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2007 | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations. | en_US
dcterms.LCSH | Natural language processing (Computer science) | en_US
dcterms.LCSH | Chinese language -- Data processing. | en_US
dcterms.LCSH | Collocation (Linguistics) | en_US
dcterms.LCSH | Computational linguistics. | en_US
dcterms.accessRights | open access | en_US

Files in This Item:
File | Description | Size | Format
b21459435.pdf | For All Users | 24.19 MB | Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/2807