|Author:||Lan, Kwok-cheung Cyrus|
|Title:||A probabilistic approach to natural language disambiguation : semantic role labeling and dialogue act recognition|
|Subject:||Hong Kong Polytechnic University -- Dissertations.|
Natural language processing (Computer science)
Ambiguity -- Data processing.
|Department:||Department of Computing|
|Pages:||xii, 117 p. : ill. ; 30 cm.|
|Abstract:||Resolving ambiguities has been a central problem in natural language processing. Most disambiguation tasks to date have focused on relatively low-level processing such as morphological, lexical, and syntactic analysis. Their considerable success has stimulated research in higher-level, but harder, disambiguation tasks. This thesis addresses two disambiguation tasks, one at the semantic level and the other at the pragmatic level: semantic role labeling and dialogue act recognition, respectively. We address both tasks using a probabilistic framework in the form of the conditional distribution $p(\text{ambiguity} \mid \text{expression}, \text{context})$. We estimate the distribution by conditional Maximum Entropy, which allows heterogeneous sources of information to be integrated in a unified model for disambiguation. Under the principle of Maximum Entropy, the selected distribution is the one of highest entropy among those consistent with the training data, so no unjustified assumption is made, and feature modeling remains easy. Maximum Entropy has proved empirically useful in various applications, with moderately efficient training time. In the semantic role labeling task, we propose a three-phase labeling approach to the problem. The approach combines advantages of previously proposed methods while addressing their weaknesses. It decomposes the problem of recognizing a complex structure into several local decisions, each recognizing a single piece of the structure. The decisions are made by supervised learning, with models trained on data for prediction. Evaluations on public benchmarks show that our recognition performance is competitive with the current best individual system. In the dialogue act recognition task, we target non-task-oriented recognition. We study various types of features, including lexical, syntactic, and discourse features, and evaluate their effect on recognition performance.
A feature selection method is used to optimize the feature set systematically. Experimental results show that our system outperforms all the other approaches that use the same public data set. Despite the high micro-average performance achieved in both tasks, the macro-average performance is unsatisfactory. This is due to the class-imbalance problem in the data sets, where the distribution of examples among the classes is highly skewed. We employ two methods to address this problem in each task: over-sampling and error-based learning. Experimental results show that both methods are effective in improving the macro-average performance in most cases.|
|Rights:||All rights reserved|
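The abstract's contrast between micro-average and macro-average performance under class imbalance can be illustrated with a small sketch. Everything below is an assumption made for illustration: the labels `statement` and `question`, the toy predictions, and the use of plain random duplication as the over-sampling step are hypothetical and are not taken from the thesis, which does not specify its over-sampling variant or its data here.

```python
import random
from collections import Counter


def per_class_f1(gold, pred, label):
    """F1 score for one class, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_f1(gold, pred):
    """With exactly one label per example, micro-averaged F1 equals accuracy."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    labels = sorted(set(gold))
    return sum(per_class_f1(gold, pred, lab) for lab in labels) / len(labels)


def oversample(pairs, seed=0):
    """Randomly duplicate minority-class examples until all classes are equally frequent."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in pairs:
        by_label.setdefault(y, []).append((x, y))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced


# Hypothetical skewed data: nine majority examples, one minority example.
gold = ["statement"] * 9 + ["question"]
# A classifier that always predicts the majority class.
pred = ["statement"] * 10

print("micro-F1:", micro_f1(gold, pred))
print("macro-F1:", macro_f1(gold, pred))

balanced = oversample(list(zip(range(len(gold)), gold)))
print("class counts after over-sampling:", Counter(y for _, y in balanced))
```

In this toy case the always-majority classifier scores well on the micro-average but poorly on the macro-average, because the minority class contributes an F1 of zero to an unweighted mean. Over-sampling changes only the training distribution; whether it improves macro-average performance depends on the learner and the data.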