Full metadata record
DC Field | Value | Language
dc.contributor | Department of Chinese and Bilingual Studies | en_US
dc.contributor.advisor | Yao, Yao (CBS) | en_US
dc.contributor.advisor | Lee, Yat Mei Sophia (CBS) | en_US
dc.creator | Chen, Xinyi | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/13054 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | An investigation of human listeners’ processing of disfluent robot speech | en_US
dcterms.abstract | One major area of interest in speech communication research is how people adjust their speaking and listening patterns based on their conversation partner. This adaptation manifests in changes in speech production, such as increased loudness and clarity towards non-native speakers, and more animated, affectionate expressions towards infants and pets. Moreover, listener perception adjusts based on presuppositions about the speaker’s demographics, affecting phoneme discrimination. This thesis investigates human-robot interaction, specifically how disfluencies produced by robots are perceived and processed by humans, in contrast to disfluencies in human-human interactions. | en_US
dcterms.abstract | In an era where interactions with machine-generated speech are becoming commonplace, understanding human perceptions of such speech versus natural speech is crucial, yet underexplored. As speech technology advances, machine speech increasingly mimics the naturalness of human conversation, raising questions about our perceptual distinctions between the two. This dissertation probes whether the inclusion of disfluencies (like “um” and “uh”), common in human speech but rare in machine speech, influences our perception of machine speech’s naturalness. Utilizing the Furhat talking robot system to simulate machine speech, the study specifically examines our reactions to these speech patterns, questioning whether they make machine speech seem more human-like or whether we still perceive a clear divide between machine and natural speech. | en_US
dcterms.abstract | Specifically, the focus is on the perception of machine speech containing disfluencies (filled pauses such as “um” and “uh”), which are prevalent in natural speech but not yet common in machine speech. A talking robot system, Furhat, is used to generate or embody machine speech. | en_US
dcterms.abstract | This dissertation reports on two studies. Study 1 explored whether filled pauses in machine or natural speech could improve listener information retention. It employed a memory test where participants listened to short stories and were later assessed on their recall of plot details. Participants were divided into two groups: a “baseline” group that received auditory stimuli via computer in a self-paced setting, and a “robot-interaction” group that interacted with Furhat, a robot that provided instructions and narrated stories. The key focus was on the effect of disfluencies (filled pauses) in the stories. The study incorporated two memory assessment methods (multiple-choice questions and story retelling), two types of voices (pre-recorded human and text-to-speech synthesized), and, in Experiments 1c and 1d, an additional type of disfluency (silent pauses). Overall, Study 1 found no significant impact of disfluency presence on memory retention in either the baseline or the robot-interaction group. | en_US
dcterms.abstract | Study 2 investigated the pragmatic interpretations of filled pauses in conversational contexts, whether in machine or natural speech. The methodology involved presenting participants with dialogues where the final statement might suggest the speaker’s attempt to avoid conveying an unwelcome fact or opinion, with this statement potentially preceded by a filled pause. Participants, after listening to or watching these dialogues, selected statements aligning with their interpretations. The hypothesis posited that filled pauses before the final turn increase the likelihood of perceiving it as an attempt to dodge unwelcome messages. This was examined through two experimental setups: Experiment 2a, an audio-only condition serving as the baseline, and Experiment 2b, an audiovisual condition featuring Furhat the robot delivering the final conversational turn. Results across both conditions confirmed the hypothesis, showing a consistent interpretation of disfluency as indicative of avoidance, regardless of the medium. | en_US
dcterms.abstract | The findings from the two studies presented in this dissertation demonstrate that disfluencies in machine speech impact perception similarly to disfluencies in natural speech. These results have significant implications for human-robot interaction models, such as the CASA (Computers as Social Actors) paradigm, highlighting how elements like humanlikeness and voice naturalness influence interaction patterns. | en_US
dcterms.extent | 244 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2024 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.LCSH | Speech processing systems | en_US
dcterms.LCSH | Human-robot interaction | en_US
dcterms.LCSH | Human-computer interaction | en_US
dcterms.LCSH | Oral communication | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.accessRights | open access | en_US

Files in This Item:
File | Description | Size | Format
7506.pdf | For All Users | 7.6 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13054