Author: Mak, Sai-wai David
Title: Chinese text analyser
Degree: M.Sc.
Year: 2001
Subject: Hong Kong Polytechnic University -- Dissertations
Chinese language -- Data processing
Chinese language -- Discourse analysis
Text processing (Computer science)
Department: Multi-disciplinary Studies
Dept. of Computing
Pages: 96 leaves : ill. ; 30 cm
Language: English
Abstract: As more and more content becomes available on the Internet and in other media, there will be strong demand for smart tools that search for accurate, comprehensive and relevant information. A search tool that simulates the human perspective will therefore be essential for finding information quickly and accurately. Currently, most search tools are based on exact "text-matching", which does not guarantee the accuracy or relevance of the retrieved information. Accuracy can be improved by searching the content through a hierarchy that reflects the relations between words and clauses. The documents with the highest counts of matching words and clauses are then ranked first. If multiple documents are searched against the hierarchy, a list of contents with similar meaning or relevance is retrieved. Some researchers have built similar tools, but these are designed specifically for the structure of Western languages. As more and more Chinese content (traditional or simplified) becomes available on the Internet and in other media, it would be convenient to have a tool that assists Chinese users in searching for and selecting relevant information. Because the lexical structures of Chinese and English differ, this dissertation investigates a generic method of analysis that does not employ many complicated linguistic rules. It serves Chinese content in particular and is expected to run on general operating platforms. The proposed methodology identifies and removes the useless words or phrases in a passage so that the important content can be extracted. Statistical rules are then employed to calculate and summarise the relationships between the important words and phrases. The result is a primitive semantic network, which can be used for further processing of other Chinese documents.
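
Illustration: the abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea it describes (remove unimportant words, keep the remaining fragments as important terms, count how often terms appear together, and rank other documents against the result). The stop list, the clause delimiters, the use of clause-level co-occurrence as the "statistical rule", and all function names (`extract_terms`, `build_network`, `score`) are assumptions made for illustration, not the dissertation's actual method.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop list; the dissertation's real list is not given in the abstract.
# Single characters like these can also occur inside real words, so this is crude.
STOP_WORDS = ["的", "了", "是", "在", "和", "也", "很"]

# Assumed clause delimiters: common Chinese punctuation and whitespace.
CLAUSE_BREAK = re.compile(r"[，。！？；、\s]+")
STOP_BREAK = re.compile("|".join(re.escape(w) for w in STOP_WORDS))


def extract_terms(text):
    """Return, for each clause, the fragments left after removing stop words."""
    clauses = []
    for clause in CLAUSE_BREAK.split(text):
        # Keep fragments of two or more characters as candidate important terms.
        terms = [t for t in STOP_BREAK.split(clause) if len(t) >= 2]
        if terms:
            clauses.append(terms)
    return clauses


def build_network(text):
    """Count term frequencies and clause-level co-occurrences (the network edges)."""
    term_freq = Counter()
    edges = Counter()
    for terms in extract_terms(text):
        term_freq.update(terms)
        # Each unordered pair of distinct terms in a clause gets one edge count.
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return term_freq, edges


def score(document, term_freq):
    """Rank signal: total occurrences of the network's terms in another document."""
    return sum(document.count(term) for term in term_freq)


if __name__ == "__main__":
    sample = "香港的天氣很潮濕，香港的交通很方便。"
    freq, edges = build_network(sample)
    print(freq.most_common(3))
    print(score("香港交通", freq))
```

Under these assumptions, documents with higher `score` values would be listed first, which mirrors the ranking by word and clause counts mentioned in the abstract. A real system would need a proper Chinese word segmenter and a much larger stop list.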

