Author: | Chen, Jing |
Title: | Tracing lexical semantic change with distributional semantics : detection, evaluation, and interpretation |
Advisors: | Huang, Chu-ren (CBS) |
Degree: | Ph.D. |
Year: | 2024 |
Subject: | Linguistic change; Historical linguistics; Computational linguistics; Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Chinese and Bilingual Studies |
Pages: | xxii, 158 pages : color illustrations |
Language: | English |
Abstract: | The language we live by is in constant change, and the evolving process is manifested in the ways we use it differently over time. With the growing availability of digitized historical textual data and increasingly powerful language models, recent studies have demonstrated the potential to leverage these models to (semi-)automatically identify words that have undergone semantic shifts, particularly in Indo-European languages. This dissertation aims to explore the efficacy of embedding-based methods in interpreting semantic change within Chinese data. I embark on this journey by validating current computational approaches with Chinese data on a popular lexical semantic change detection task, namely Graded Change Detection, in experiments constrained to the periods before and after the Reform and Opening Up, a sociopolitical backdrop for change in modern Chinese. A significant contribution of this work is the creation of the first shared benchmark for Chinese semantic change, Chi-WUG, which includes over 61,000 human judgments on 1,600 sentence pairs targeting 40 different words. A systematic evaluation of various models (count-based, static, and contextualized) highlights the performance of the contextualized ones, especially the XL-LEXEME model, which correlates significantly with human judgments (best scores exceeding 0.800). Notably, SGNS-based models demonstrate strong robustness, maintaining consistent performance under varied training conditions. Beyond the scope of the initial experiments, I glimpse the interpretative power of embedding-based methods for uncovering broader linguistic trends. Building on the findings from the preferred models, the study expands the static two-period comparison to a dynamic longitudinal analysis, allowing for consistent examination over time, exemplified by a semantically shifted word.
Moreover, by expanding the analysis from a select group of predefined words to the entire lexicon, it becomes possible to observe more subtle and less-discussed semantic shifts. The affirmative validation of embedding-based methods has greatly piqued my interest in exploring more complex cases of semantic change that occurred during the period, particularly those interacting with other linguistic processes. This analysis specifically examines semantic shifts that have occurred in word-formation patterns through the two synonymous constructions ‘X-zu’ and ‘X-tuan’, both denoting a group of people, to analyze how their constructional meanings have evolved alongside their increasing morphological productivity. By obtaining temporal representations for each attestation, this chapter examines the semantic distributions defined by attested types in semantic space across different periods. It reveals how ‘X-zu’ has broadened its semantic scope to encompass not only ethnic groups but also individuals with shared interests. In contrast, ‘X-tuan’ displays a less expansive semantic shift. This difference is further illuminated by statistically examining the number and density of clusters over time, coupled with an analysis of non-linear development trends. This dissertation not only addresses key questions regarding the role of word embeddings in interpreting semantic change within the Chinese language but also makes substantial contributions to the field. It introduces a high-quality benchmark that facilitates experimentation with Chinese data and conducts the first comprehensive and systematic evaluation of various models. These contributions establish a foundational baseline for future research and highlight the potential of computational approaches to address traditional topics in Chinese linguistics. Moreover, this project enriches theoretical discussions on the interplay between morphological productivity and semantic change.
The findings from this research underscore the need for more sophisticated statistical models and the integration of social dimensions in future explorations. |
Rights: | All rights reserved |
Access: | open access |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/13131