Author: Santus, Enrico
Title: Making sense : from word distribution to meaning
Advisors: Huang, Chu-ren (CBS)
Degree: Ph.D.
Year: 2016
Subject: Computational linguistics.
Semantics -- Data processing.
Hong Kong Polytechnic University -- Dissertations
Department: Department of Chinese and Bilingual Studies
Pages: 169 pages : illustrations
Language: English
Abstract: In order to perform complex tasks, Natural Language Processing (NLP) applications need to rely on knowledge resources, whose main building blocks have been identified as entities and relations (Herger, 2014). Given their affinity to semantic memory in human beings, these resources have often been referred to as models of semantic memory (Jones, Willits, & Dennis, 2015). In the last fifty years, a number of these models have been proposed in the cognitive, linguistic and computational literature (Jones, Willits, & Dennis, 2015). While the first-generation models (i.e. classic models) were mostly theoretical and were not designed to be computationally implemented, starting from the 1980s a second generation (i.e. learning models) tried to address the learnability issue by adopting representations of meaning that could be learnt automatically by observing word co-occurrence in natural text. Among the second-generation models, starting from the 1990s, Distributional Semantic Models (DSMs) gained a lot of attention in the cognitive, linguistic and computational communities because they allow the efficient treatment of word meaning and word similarity (Harris, 1954) and behave consistently with psycholinguistic findings (Landauer & Dumais, 1997; Lenci, 2008). Even though these models are strong in identifying similarity (and therefore relatedness), they were found to suffer from a major limitation: they do not offer any principled way to discriminate the semantic relations that hold between words. In fact, since they define word similarity in distributional terms (i.e. the Distributional Hypothesis; Harris, 1954), they put together, under the umbrella of similar words, terms that are related by very different semantic relations, such as synonymy, antonymy, hypernymy and co-hyponymy (Santus, Lenci, Lu, & Huang, 2015a).
In this thesis we address this limitation by proposing several unsupervised methods for the discrimination of semantic relations in DSMs. These methods (i.e. APSyn, APAnt and SLQS) are linguistically and cognitively motivated (Murphy G. L., 2002; Cruse, 1986) and aim at identifying distributional properties that characterize the studied semantic relations (respectively, similarity, opposition and hypernymy), so that the DSMs are provided with useful discriminative information.
In particular, our measures analyze the properties of the most salient contexts of the target words, under the assumption that these contexts are more informative than the full distribution, which is instead assumed to include noise (Santus, Lenci, Lu, & Huang, 2015a). In order to identify the most salient contexts, for every target we sort its contexts by either the Positive Pointwise Mutual Information (PPMI; Church & Hanks, 1989) or the Positive Local Mutual Information (PLMI; Evert, 2005), and we select the top N, which are then used for the extraction of a given distributional property (e.g. intersection, informativeness). In all our methods, N is a hyperparameter that can be tuned in a range between 50 and 1000. Our measures are carefully described and evaluated, and they are shown to be competitive with the state of the art, sometimes even outperforming the best models in particular settings (including the recently introduced predictive models, generally referred to as word embeddings; see Mikolov, Yih, & Geoffrey (2013)). Their scores, moreover, have been used as features for ROOT9 (Santus, Lenci, Chiu, Lu, & Huang, 2016e), a supervised system that exploits a Random Forest algorithm to classify taxonomical relations (i.e. hypernymy and co-hyponymy versus unrelated words), achieving state-of-the-art performance (Weeds, Clarke, Reffin, Weir, & Keller, 2014).
The thesis is organized as follows. The Introduction describes the problem and the reasons behind the adoption of the distributional framework. The first two chapters describe the main models of semantic memory and discuss how computers can learn and manipulate meaning, starting from word distribution in language corpora. Three chapters are then dedicated to the main semantic relations we have dealt with (i.e. similarity, opposition and hypernymy) and the corresponding unsupervised measures for their discrimination (i.e. APSyn, APAnt and SLQS). The final chapter describes the supervised method ROOT9 for the identification of taxonomical relations. In the Conclusions, we summarize our contribution and we suggest that future work should target i) the systematic study of the hyperparameters (e.g. the impact of N); ii) the merging of the methods for developing a multi-class classification algorithm; and iii) the adaptation of the methods (and/or their principles) to reduced matrices (see Turney & Pantel (2010)) and word embeddings (see Mikolov, Yih, & Geoffrey (2013)).
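Illustration: The context-selection step described in the abstract (rank each target's contexts by PPMI, keep the top N, then extract a property such as the intersection) can be sketched roughly as follows. This is a minimal sketch on toy data, not the thesis's implementation of APSyn, APAnt or SLQS: the function names (`ppmi_table`, `top_contexts`, `shared_top_contexts`), the toy co-occurrence pairs, and the use of a plain set intersection are all assumptions made for illustration. PPMI is taken in its usual form, max(0, log2(P(w,c) / (P(w) P(c)))).

```python
import math
from collections import Counter, defaultdict


def ppmi_table(pairs):
    """Map each target to {context: PPMI}, keeping only positive PMI values.

    `pairs` is a list of observed (target, context) co-occurrences.
    """
    pair_counts = Counter(pairs)
    target_counts = Counter(t for t, _ in pairs)
    context_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    table = defaultdict(dict)
    for (t, c), n in pair_counts.items():
        # PMI = log2( P(t, c) / (P(t) * P(c)) ), estimated from raw counts.
        pmi = math.log2(n * total / (target_counts[t] * context_counts[c]))
        if pmi > 0:
            table[t][c] = pmi
    return table


def top_contexts(table, target, n):
    """Return the n contexts with the highest PPMI (ties broken alphabetically)."""
    ranked = sorted(table.get(target, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [context for context, _ in ranked[:n]]


def shared_top_contexts(table, w1, w2, n):
    """Intersection of the two targets' top-n contexts (one possible property)."""
    return set(top_contexts(table, w1, n)) & set(top_contexts(table, w2, n))


# Hypothetical toy co-occurrences; a real DSM would be built from a corpus.
pairs = [
    ("dog", "bark"), ("dog", "bark"), ("dog", "pet"),
    ("cat", "meow"), ("cat", "meow"), ("cat", "pet"),
    ("car", "drive"), ("car", "drive"), ("car", "road"),
]
table = ppmi_table(pairs)
print(top_contexts(table, "dog", 2))                # ['bark', 'pet']
print(shared_top_contexts(table, "dog", "cat", 2))  # {'pet'}
```

On this toy data, shrinking N from 2 to 1 empties the intersection for "dog" and "cat", which hints at why the abstract treats N as a hyperparameter to be tuned.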
Rights: All rights reserved
Access: open access

Files in This Item:
File: b29311792.pdf
Description: For All Users
Size: 1.66 MB
Format: Adobe PDF

Please use this identifier to cite or link to this item: