Author: Gao, Dehong
Title: Cross-lingual sentiment lexicon learning
Degree: Ph.D.
Year: 2014
Subject: Computational linguistics; Semantics; Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xx, 157 pages : illustrations ; 30 cm
Language: English
Abstract: A sentiment lexicon contains a set of words with known sentiment (e.g., "good", "nice" and "bad"). It is widely recognized that sentiment lexicons play a fundamental role in sentiment analysis. Compared with the existing sentiment lexicons in English, the sentiment lexicons available in other languages, such as Chinese, are far from sufficient. This dissertation focuses on Cross-lingual Sentiment Lexicon Learning (CSLL), whose goal is to make full use of the existing sentiment resources in one or more languages to automatically learn sentiment lexicons for other languages. The dissertation presents a systematic study of CSLL. In bilingual graph based sentiment lexicon learning, a bilingual graph is built from the words in English and in the target language for which we want to generate the sentiment lexicon. A label propagation based approach is proposed to transfer sentiment information from English to the target language. To the best of our knowledge, this is the first time that word alignment information derived from a parallel corpus has been leveraged to build the inter-language relations in CSLL, and it is shown to significantly increase the coverage of the learned sentiment lexicon. In this work, the sentiment polarity of a word is determined by the sentiment information of the words connected to it in the bilingual graph. In co-training based bilingual sentiment lexicon learning, we consider not only the sentiment information of the connected words but also information about the words themselves (e.g., word definitions). From these two types of information, novel and effective features are explored to infer the sentiment polarity of a word. With these features, CSLL is treated as word-level sentiment classification, and two classifiers are developed within the co-training framework to predict the sentiment polarities of the words in the two languages respectively.
In particular, the learning processes of the two classifiers are connected by the word associations derived from bilingual resources (e.g., bilingual dictionaries). In these two pieces of work, words with similar semantics are assumed to have similar sentiments. The proposed approaches can thus connect or associate semantically similar words in the learning processes. However, words that are similar in semantics do not always have similar sentiments, especially when the words have multiple senses. In multilingual sentiment lexicon learning, we aim to automatically refine the semantic-oriented connections into sentiment-oriented connections. Incorporating multilingual (sentiment) resources, a novel label propagation based approach is developed to propagate sentiment information between multiple languages and to automatically update the weights of the connections. The main contribution of this work is that the proposed approach not only performs well in multilingual sentiment lexicon learning but also provides a new strategy for graph update. Extensive experiments have been conducted in each piece of work, and the experimental results demonstrate the effectiveness of the proposed approaches. To summarize, as one of the few large-scale studies on CSLL, this dissertation provides complete learning techniques and a deep analysis of the key factors in cross-lingual sentiment lexicon learning.
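Note: the bilingual-graph idea described in the abstract can be illustrated with a minimal sketch of generic label propagation. This is not the dissertation's algorithm; the toy words, edge weights, update rule, and the function names `build_graph` and `propagate` are all assumptions made for illustration. Seed words with known polarity are held fixed, and every other word repeatedly takes the weighted average of its neighbours' scores.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (word_a, word_b, weight) triples."""
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
        graph[v].append((u, w))
    return graph


def propagate(graph, seeds, iterations=20):
    """Generic label propagation (illustrative, not the thesis method).

    Seed words keep their polarity (+1 or -1); every other word is set to
    the weighted average of its neighbours' scores from the previous round.
    """
    scores = {word: float(seeds.get(word, 0.0)) for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = float(seeds[word])  # clamp the seed label
                continue
            total = sum(w for _, w in neighbours)
            weighted = sum(w * scores[n] for n, w in neighbours)
            updated[word] = weighted / total if total else 0.0
        scores = updated
    return scores


# Hypothetical toy graph: inter-language edges stand in for word alignments,
# intra-language edges stand in for synonym relations. Weights are made up.
edges = [
    ("en:good", "zh:好", 1.0),
    ("en:bad", "zh:坏", 1.0),
    ("en:nice", "zh:不错", 1.0),
    ("en:good", "en:nice", 1.0),
    ("zh:好", "zh:不错", 0.5),
]
seeds = {"en:good": 1, "en:bad": -1}  # English seed lexicon (assumed)

scores = propagate(build_graph(edges), seeds)
lexicon = {w: ("positive" if s > 0 else "negative")
           for w, s in scores.items() if w.startswith("zh:") and s != 0}
```

In this toy setting the Chinese words inherit polarity only through their connections to the English seeds, which mirrors the abstract's statement that a word's polarity is determined by the connected words in the bilingual graph.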
Rights: All rights reserved
Access: open access

Files in This Item:
File | Description | Size | Format
b27629600.pdf | For All Users | 2.83 MB | Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/7759