Author: | Sit, Wai-man Raymond |
Title: | Automatic recognition of hand-printed Chinese address |
Degree: | M.Sc. |
Year: | 1998 |
Subject: | Optical character recognition devices; Chinese character sets (Data processing); Chinese characters -- Data processing; Pattern recognition systems; Optical data processing; Hong Kong Polytechnic University -- Dissertations |
Department: | Multi-disciplinary Studies; Department of Computing |
Pages: | viii, 52, [12] leaves : ill. ; 30 cm |
Language: | English |
Abstract: | Off-line recognition of handwritten Chinese characters has long been considered a very difficult problem. However, the rapid development of data communications and the popularity of Chinese Office Automation (OA) applications have greatly increased the need for it, and much research has been carried out to solve this problem. In this report, a statistical Chinese character recognition method is proposed to recognize the characters commonly used in Hong Kong addresses, and various factors affecting the recognition rate are reported. To improve the recognition rate, both global and local features are included in the feature vector. The number of consecutive black pixels is counted as the feature value in four directions: horizontal, vertical, and the two diagonals. Depending on where it is extracted, a feature is either global or local: a value taken from the whole image is global, while one taken from a small portion of the image is local. Furthermore, because local features consume much memory, multiple resolutions of the input image are exploited, and local feature values are drawn only from the lower-resolution images. This captures both global and local features while consuming less memory, and so improves the overall matching result. Before features are extracted, the input Chinese character image is pre-processed: it is binarized and normalized to a pre-defined size. Views at different resolutions are then taken from this image and thinned to unit width to obtain their skeletons. In addition, the k-nearest neighbour rule is adopted as the classifier to determine which class (character) the input image belongs to. This approach was selected because it is simple and requires no knowledge of the underlying probability density functions.
It is well suited to Chinese character recognition, where the feature distributions are very difficult to predict. However, the approach has a drawback: it must store all samples and compare each of them with the input sample. To avoid this, the input feature vector is first compared with a representative of each character in the character database, namely the mean of that character's samples. For the ten nearest representatives, the input is then compared with the individual samples of those characters in the database. To test the system and build up the character database, 20 people were invited to write each of 100 Chinese characters commonly used in Hong Kong addresses within a square writing area. Over 80% of the input Chinese characters were matched correctly within the top three candidates. Various experiments were also carried out to study factors that may affect the recognition rate, including the selection of global and local features, the choice of k in the k-nearest neighbour rule, and the pre-classification threshold. |
Rights: | All rights reserved |
Access: | restricted access |
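The directional feature extraction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the feature value for each direction is taken here to be the mean length of the runs of consecutive black pixels, lower-resolution views are produced by OR-pooling blocks of pixels, and local features come from a grid of cells on one lower-resolution view. The names `run_lengths`, `directional_features`, `downsample`, `crop` and `feature_vector`, and the `factor` and `grid` parameters, are hypothetical. Binarization, size normalization and thinning are assumed to have been done already.

```python
from typing import List

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel

# Direction vectors (dy, dx): horizontal, vertical, and the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def run_lengths(img: Image, dy: int, dx: int) -> List[int]:
    """Lengths of all maximal runs of consecutive black pixels along (dy, dx)."""
    if not img or not img[0]:
        return []
    h, w = len(img), len(img[0])
    runs = []
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            # Only start counting at the first pixel of a run.
            py, px = y - dy, x - dx
            if 0 <= py < h and 0 <= px < w and img[py][px]:
                continue
            length, cy, cx = 0, y, x
            while 0 <= cy < h and 0 <= cx < w and img[cy][cx]:
                length += 1
                cy += dy
                cx += dx
            runs.append(length)
    return runs


def directional_features(img: Image) -> List[float]:
    """Mean black-run length in each of the four directions (0.0 if no runs).
    Using the mean is an assumption; the abstract only says runs are counted."""
    feats = []
    for dy, dx in DIRECTIONS:
        runs = run_lengths(img, dy, dx)
        feats.append(sum(runs) / len(runs) if runs else 0.0)
    return feats


def downsample(img: Image, factor: int) -> Image:
    """Lower-resolution view: a cell is black if any pixel in its block is black."""
    h, w = len(img), len(img[0])
    return [
        [int(any(img[y][x]
                 for y in range(by, min(by + factor, h))
                 for x in range(bx, min(bx + factor, w))))
         for bx in range(0, w, factor)]
        for by in range(0, h, factor)
    ]


def crop(img: Image, y0: int, y1: int, x0: int, x1: int) -> Image:
    """Rectangular sub-image [y0, y1) x [x0, x1)."""
    return [row[x0:x1] for row in img[y0:y1]]


def feature_vector(img: Image, factor: int = 2, grid: int = 2) -> List[float]:
    """Global features from the full image, then local features from each
    cell of a grid x grid partition of one lower-resolution view."""
    vec = directional_features(img)  # global part
    low = downsample(img, factor)
    h, w = len(low), len(low[0])
    for gy in range(grid):
        for gx in range(grid):
            cell = crop(low, gy * h // grid, (gy + 1) * h // grid,
                        gx * w // grid, (gx + 1) * w // grid)
            vec.extend(directional_features(cell))  # local part
    return vec
```

For an 8 x 8 image containing a single horizontal stroke, the global part of the vector is `[8.0, 1.0, 1.0, 1.0]`: one long horizontal run and only one-pixel runs in the other three directions. Taking local features from the half-resolution view keeps the vector at 20 values instead of the larger count a full-resolution grid would need.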
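The two-stage classification in the abstract (pre-classification against each character's mean feature vector, then k-nearest neighbour matching against the stored samples of the ten nearest characters) might look like the sketch below. Euclidean distance, ranking candidates by votes and then by nearest-sample distance, and the names `distance`, `class_means` and `classify` are assumptions made for illustration; every character is assumed to have at least one stored sample.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumption; the abstract does not name the metric)."""
    return math.dist(a, b)


def class_means(db: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Representative of each character: the mean of its sample feature vectors."""
    return {
        label: [sum(col) / len(samples) for col in zip(*samples)]
        for label, samples in db.items()
    }


def classify(x: Vector, db: Dict[str, List[Vector]], k: int = 3,
             shortlist: int = 10, top: int = 3) -> List[str]:
    """Two-stage k-NN sketch: pre-classify against class means, then run k-NN
    over the samples of the `shortlist` nearest classes only."""
    means = class_means(db)
    # Stage 1: keep the `shortlist` classes whose mean is closest to x.
    near = sorted(means, key=lambda c: distance(x, means[c]))[:shortlist]
    # Stage 2: compare x with every stored sample of the shortlisted classes.
    scored = sorted((distance(x, s), c) for c in near for s in db[c])
    votes = Counter(c for _, c in scored[:k])
    nearest: Dict[str, float] = {}
    for d, c in scored:
        nearest.setdefault(c, d)  # scored is sorted, so the first hit is the minimum
    # Hypothetical ranking rule: votes among the k nearest samples, ties broken
    # by the distance to each class's nearest sample.
    ranked = sorted(near, key=lambda c: (-votes[c], nearest[c]))
    return ranked[:top]
```

Returning the top three labels mirrors the abstract's top-3 evaluation. Only the shortlisted characters' samples are compared in stage 2, which is what saves work over plain k-NN; in a real system the class means would be computed once when the database is built rather than on every call.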
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
b14369333.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 2.3 MB | Adobe PDF |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/1757