Author: Weng, Yi
Title: Audiovisual speech perception in tonal language speakers : evidence from the McGurk paradigm
Advisors: Peng, Gang (CBS)
Degree: Ph.D.
Year: 2025
Subject: Speech perception
Visual perception
Face perception
Cantonese dialects -- Tone
Hong Kong Polytechnic University -- Dissertations
Department: Department of Chinese and Bilingual Studies
Pages: xvi, 171 pages : color illustrations
Language: English
Abstract: Daily face-to-face communication, in its most natural form, draws on information from multiple channels, especially audition and vision. Although audition is the primary modality for speech perception, the compensatory role of vision among typical perceivers has become increasingly evident. However, studies of Indo-European language speakers have demonstrated that the relative weighting of auditory and visual information in speech perception is influenced by a variety of factors, leading to considerable diversity in audiovisual processing strategies across populations. Among these factors, language background, developmental stage, and neurotypicality have been highlighted as key moderators of this diversity in previous research. Furthermore, auditory noise and talking face processing patterns have been proposed to exert broader influences on audiovisual speech processing. Nonetheless, these factors have rarely been examined among speakers of tonal languages, even though tonal languages constitute the majority of the world's languages. To address this gap, the present study employs the classic McGurk paradigm (in which, for example, an auditory /ba/ dubbed onto a visual /ga/ is often perceived as /da/) to investigate audiovisual speech perception in tonal language speakers, drawing on both behavioural and eye-tracking data and focusing in particular on several groups of native Cantonese speakers.
To clarify the role of tonal language background, the performance of Cantonese- and Mandarin-speaking young adults in the classic McGurk paradigm was compared under both quiet and noisy conditions. Behaviourally, the two groups identified congruent stimuli with comparable accuracy and showed a high degree of audiovisual integration when perceiving incongruent stimuli in the quiet condition. Under both noisy conditions, however, native speakers of Cantonese, a language characterized by greater complexity in both segmental and suprasegmental aspects, made significantly more audiovisual-integrated responses and fewer visual-dominant ones than their Mandarin-speaking counterparts. Eye-tracking data further suggest that greater phonological complexity may lead Cantonese speakers to rely more on the fine-grained visual linguistic cues offered by the speaker's mouth area. Taken together, the impact of language background was evident in Cantonese-speaking participants' preference for an audiovisual-integrated strategy in noisy conditions, potentially attributable to the inherent complexity of Cantonese phonology and to specific patterns of visual attention allocation.
Building on this clarification of the role of language background, we further charted the developmental trajectory of audiovisual speech perception in a Cantonese-speaking context, using the McGurk paradigm with various levels of auditory noise in a cross-sectional study of Cantonese-speaking children aged 4 to 11 years. Behaviourally, children aged 4 to 9 years did not identify congruent stimuli as accurately as adults. Meanwhile, consistent with previous findings, the current study showed that a tonal language background does not eliminate the developmental shift in sensory dominance in audiovisual speech perception: in the quiet condition, children aged 4 to 9 years made significantly more audio-dominant and fewer audiovisual-integrated responses to incongruent stimuli than adults, whereas no significant differences were detected between the 10–11-year-old group and adults. Eye-tracking data revealed a synchronized developmental course in visual attention to talking faces: children aged 4 to 9 years fixated less on the speaker's mouth area during the first half of the stimulus window than adults did, whereas the 10–11-year-old group again showed no significant differences from adults. These findings indicate that the tonal nature of a language does not exempt young speakers from the developmental shift in audiovisual speech perception, although the timing of the shift may be modulated by language background. Moreover, the findings support a link between speech perception and talking face processing, highlighting the synchronous developmental courses shared by these two interacting processes.
With this more comprehensive understanding of the developmental course in typically developing (TD) children in hand, the thesis also profiled audiovisual speech perception in Cantonese-speaking children with autism spectrum disorder (ASD) relative to their TD counterparts. Autistic individuals aged 8 to 11 years showed atypical patterns in processing audiovisual speech stimuli. When identifying congruent stimuli, they were less accurate than TD peers matched on either chronological age (CA-matched TD) or language ability (LA-matched TD). When perceiving incongruent stimuli, the autistic group made significantly more audio-dominant responses than their CA-matched TD counterparts, and differences from the LA-matched TD group still emerged in within-group comparisons. The preference for an audio-dominant strategy in autistic individuals may result from their avoidance of social stimuli, as suggested by the eye-tracking results, which could lead them to reduce visual intake at the source in face-to-face contexts. However, differences in mouth-looking time between the autistic group and their TD counterparts were limited. This finding suggests that the core barrier for autistic children in audiovisual speech perception lies in a lack of general social interest rather than an inability to utilize informative visual linguistic cues from mouth movements.
Findings from the current thesis enhance our understanding of the mechanisms underlying audiovisual speech perception among tonal language speakers, with a particular emphasis on the understudied Cantonese-speaking population. Furthermore, the integration of behavioural and eye-tracking data provides new insights into the interconnected roles of speech decoding and talking face processing.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 8190.pdf (For All Users), 6.41 MB, Adobe PDF



Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13754