Chinese text analyser

Pao Yue-kong Library Electronic Theses Database

Author: Mak, Sai-wai David
Title: Chinese text analyser
Degree: M.Sc.
Year: 2001
Subject: Hong Kong Polytechnic University -- Dissertations
Chinese language -- Data processing
Chinese language -- Discourse analysis
Text processing (Computer science)
Department: Multi-disciplinary Studies
Dept. of Computing
Pages: 96 leaves : ill. ; 30 cm
Language: English
Abstract: As more and more content becomes available on the Internet and in other media, there will be strong demand for smart tools that search for accurate, comprehensive and relevant information. A search tool that simulates a human perspective will therefore be essential for finding information quickly and accurately. Currently, most search tools are based on exact "text matching", which does not guarantee the accuracy or relevance of the retrieved information. Accuracy can be improved by searching the content through a hierarchy that reflects the relations between words and clauses, and then sorting the documents so that those with the highest counts of matching words and clauses come first. When multiple documents are searched against the hierarchy, a list of contents with similar meaning or relevance is retrieved. Some researchers have built similar tools, but these are designed specifically for the structure of Western languages. As more and more Chinese (traditional or simplified) content becomes available on the Internet and in other media, a tool that helps Chinese users search for and select relevant information would be valuable. Because the lexical structures of Chinese and English differ, this dissertation investigates a generic method of text analysis that does not rely heavily on complicated linguistic rules. The method is aimed particularly at Chinese content and is expected to run on general-purpose operating platforms. The proposed methodology identifies and removes useless words or phrases in a passage and then extracts the important content. Statistical rules are applied to calculate and summarise the relationships between the important words and phrases. The result is a primitive semantic network that can be used for further processing of other Chinese documents.
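The abstract describes the pipeline only in outline: remove useless words, keep the important phrases, relate them statistically, build a primitive semantic network, and rank other documents against it. The following is a minimal Python sketch of that idea, not the thesis's implementation. It assumes a small hypothetical stop list (`STOP_WORDS`), clause boundaries at punctuation, "recurs at least `min_freq` times" as the importance rule, and raw clause co-occurrence counts as the relationship statistic; the function names `extract_phrases`, `build_network` and `relevance` are illustrative only.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop list of "useless" function words (illustrative only);
# entries include both traditional and simplified forms.
STOP_WORDS = ["我們", "我们", "這個", "这个", "的", "了", "是", "在", "和",
              "與", "与", "及", "也", "都", "就", "而"]

# Full-width and ASCII punctuation treated as clause boundaries.
CLAUSE_BREAK = re.compile(r"[，。！？；：、,.!?;:\n]+")

# Longest stop words first, so "我們" is removed as one unit.
STOP_SPLIT = re.compile(
    "|".join(map(re.escape, sorted(STOP_WORDS, key=len, reverse=True))))


def extract_phrases(clause, min_len=2):
    """Remove stop words; the remaining runs of characters are candidate phrases."""
    pieces = (p.strip() for p in STOP_SPLIT.split(clause))
    return [p for p in pieces if len(p) >= min_len]


def build_network(text, min_freq=2):
    """Return (important phrase -> count, (phrase, phrase) -> co-occurrence count)."""
    clause_phrases = [extract_phrases(c) for c in CLAUSE_BREAK.split(text)]
    freq = Counter(p for phrases in clause_phrases for p in phrases)
    # Assumed statistical rule: a phrase is "important" if it recurs.
    important = {p: n for p, n in freq.items() if n >= min_freq}
    edges = Counter()
    for phrases in clause_phrases:
        # Link every pair of distinct important phrases sharing a clause.
        present = sorted(set(phrases) & set(important))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return important, edges


def relevance(document, important, edges):
    """Hypothetical score: phrase hits plus weights of linked pairs seen per clause."""
    score = 0
    for clause in CLAUSE_BREAK.split(document):
        hits = {p for p in important if p in clause}
        score += sum(clause.count(p) for p in hits)
        score += sum(w for (a, b), w in edges.items() if a in hits and b in hits)
    return score


if __name__ == "__main__":
    passage = "中文信息的處理，中文信息的檢索。檢索和處理都是中文信息的工作。"
    important, edges = build_network(passage)
    print(important)
    print(dict(edges))
    docs = ["天氣很好。", "中文信息檢索的方法。"]
    for doc in sorted(docs, key=lambda d: relevance(d, important, edges), reverse=True):
        print(relevance(doc, important, edges), doc)
```

Splitting at stop words and punctuation avoids a full word segmenter, in keeping with the stated aim of not relying on complicated linguistic rules, but it may leave multi-word runs unsplit when no stop word separates them. A real system would also need a much larger stop list and a better-founded association measure than raw counts.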

Files in this item

File Size Format
b16681496.pdf 2.873 MB PDF