Full metadata record
dc.contributor: Department of Language Science and Technology (en_US)
dc.contributor.advisor: Chen, Si (LST) (en_US)
dc.contributor.advisor: Yao, Yao (LST) (en_US)
dc.creator: Chen, Xi
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/14168
dc.language: English (en_US)
dc.publisher: Hong Kong Polytechnic University (en_US)
dc.rights: All rights reserved (en_US)
dc.title: Co-production of speech and facial gestures : a study of focus sentences in Mandarin under natural and manipulated conditions (en_US)
dcterms.abstract: Speech prosody and facial gestures are tightly integrated in human communication, yet their coordination under altered feedback remains poorly understood. This dissertation investigates the spatial and temporal coordination between vocal prosody and facial gestures in Mandarin Chinese, a tonal language where pitch is lexically significant. Four controlled experiments were conducted, culminating in a comprehensive analysis in Chapter 6 that examines all six feedback perturbation conditions (NA, NV, GA, DA, GV, DV). These conditions systematically manipulated auditory and visual feedback — including normal versus perturbed auditory feedback and normal versus perturbed visual feedback — to challenge the multimodal speech production system. (en_US)
dcterms.abstract: Methodologically, acoustic measures of prosody (fundamental frequency, intensity, and duration) were recorded alongside high-resolution facial gesture data (head movements, eyebrow raises, and jaw displacements) from native Mandarin speakers. The analysis integrates acoustic and facial prosody using linear mixed-effects modeling and is framed by dynamic systems theory (DST) to assess how the two modalities function as a coupled system. I tested competing hypotheses about multimodal coordination: a Trade-off hypothesis, which predicts that if one modality's feedback is degraded speakers will compensate by enhancing the other modality, versus a Hand-in-hand (synergy) hypothesis, which posits that vocal and facial prosody are augmented in tandem to convey emphasis. (en_US)
dcterms.abstract: Key findings reveal clear patterns of modality-specific compensation and flexible integration. Perturbations in auditory feedback elicited significant adjustments in both voice and facial gesture: for example, when auditory feedback was delayed or masked, speakers produced more pronounced facial expressions (larger head and eyebrow movements) and often increased syllable duration, pitch, and intensity, consistent with cross-modal compensation. Likewise, under visual feedback perturbations (e.g., obscured or delayed visual cues), speakers enhanced acoustic prosodic features such as F0 range and intensity to ensure critical tonal and emphatic information was conveyed. In some conditions, feedback alterations led to prosodic enhancement (exaggerated pitch and loudness or emphatic facial gestures), while other conditions caused prosodic suppression (reduced variability in F0 or gesture magnitude), indicating that feedback can modulate how energetically prosody is expressed. Importantly, the timing alignment between facial gesture apexes and the corresponding F0 peaks was affected by feedback changes: under normal conditions these events were tightly synchronized, whereas certain perturbations introduced measurable asynchrony, reflecting a reorganization of coordination timing in the multimodal system. (en_US)
dcterms.abstract: Overall, the results support both the Trade-off and Hand-in-hand hypotheses in complementary ways. Even when one channel's feedback was disrupted, speakers maintained communicative efficacy by boosting signals in the other channel (supporting a two-way compensatory Trade-off). At the same time, vocal and facial modalities generally worked Hand-in-hand, rising and falling together to mark prosodic focus when conditions allowed, underscoring their synergy in prosodic communication. Viewed through a DST framework, these findings suggest that speech prosody and facial gesture form an integrated dynamical system that can re-coordinate itself compensatorily under perturbation. This research advances theoretical understanding of multimodal speech production, demonstrating how Mandarin speakers dynamically balance and synchronize auditory and visual prosodic features. The dissertation's insights shed light on the resilience and flexibility of prosodic coordination in a tonal language, highlighting the compensatory coupling of voice and facial gestures in conveying meaning under both normal and altered feedback conditions. (en_US)
dcterms.extent: 351 pages : color illustrations (en_US)
dcterms.isPartOf: PolyU Electronic Theses (en_US)
dcterms.issued: 2025 (en_US)
dcterms.educationalLevel: Ph.D. (en_US)
dcterms.educationalLevel: All Doctorate (en_US)
dcterms.LCSH: Mandarin dialects -- Phonology (en_US)
dcterms.LCSH: Prosodic analysis (Linguistics) (en_US)
dcterms.LCSH: Speech (en_US)
dcterms.LCSH: Facial expression (en_US)
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations (en_US)
dcterms.accessRights: open access (en_US)

Files in This Item:
File: 8623.pdf | Description: For All Users | Size: 41.5 MB | Format: Adobe PDF


Copyright Undertaking

As a bona fide Library user, I declare that:

  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.

By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/14168