Full metadata record
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.contributor.advisor | Zheng, Yuanqing (COMP) | en_US
dc.contributor.advisor | Xiao, Bin (COMP) | en_US
dc.creator | Yang, Qiang | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/12290 | -
dc.language | English | en_US
dc.publisher | Hong Kong Polytechnic University | en_US
dc.rights | All rights reserved | en_US
dc.title | Towards context-aware voice interaction via acoustic sensing | en_US
dcterms.abstract | Voice interaction has become a fundamental way of connecting humans and smart devices. Such an interface enables users to easily complete daily tasks with voice commands, which not only carry the user's explicit semantic meaning but also imply physical context such as the user's location and speaking direction. Although current speech recognition technology allows devices to accurately understand voice content and take smart actions, these contextual cues can further help smart devices respond more intelligently. For example, knowing that a user is in the kitchen helps narrow down the likely set of voice commands and enables customized services. | en_US
dcterms.abstract | Acoustic sensing has been studied for a long time. However, unlike active sensing, which transmits handcrafted sensing signals, we can only obtain the voice on the receiver side, making it challenging to sense voice contexts. In this thesis, we use voice signals as a sensing modality and propose new passive acoustic sensing techniques to extract the physical context of the voice and the user: location, speaking direction, and liveness. Specifically, (1) inspired by the human auditory system, we investigate the effect of the human ear on binaural sound localization and design a bionic machine-hearing framework that locates multiple sound sources with binaural microphones. (2) We exploit the energy and frequency radiation patterns of voice to estimate the user's head orientation. By modeling the anisotropy of voice propagation, we can measure the user's speaking direction, a valuable context for smart voice assistants. (3) Attackers may use a loudspeaker to replay pre-recorded voice commands and deceive voice assistants. We examine how sound generation differs between humans and loudspeakers and find that the rapid movement of the human mouth produces a more dynamic sound field. We can therefore detect voice liveness and defend against such replay attacks by examining sound-field dynamics. | en_US
dcterms.abstract | To achieve such context-aware voice interaction, we investigate the physical properties of voice, work across hardware and software, and introduce new algorithms that draw on principles from acoustic sensing, signal processing, and machine learning. We implement these systems and evaluate them in extensive experiments, demonstrating that they enable new real-world applications, including multiple-source sound localization, speaking-direction estimation, and replay-attack defense. | en_US
dcterms.extent | xv, 139 pages : color illustrations | en_US
dcterms.isPartOf | PolyU Electronic Theses | en_US
dcterms.issued | 2023 | en_US
dcterms.educationalLevel | Ph.D. | en_US
dcterms.educationalLevel | All Doctorate | en_US
dcterms.LCSH | Speech processing systems | en_US
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US
dcterms.accessRights | open access | en_US

Files in This Item:
File | Description | Size | Format
6711.pdf | For All Users | 3.59 MB | Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12290