The Effect of Human Prosody on Comprehension of TTS Robot Speech
  • Last updated on 27th Sep 2023

A conference paper published at the International Symposium on Robot and Human Interactive Communication (RO-MAN) 2023.

DOI: 10.1109/RO-MAN57019.2023.10309415 | Full text

Abstract

The ability to interact verbally with humans is a key requirement of many social robots. It is common, however, for robot speech to lack contextual human-like prosody, making it intelligible but seemingly inexpressive and cold. We investigated the effect that applying human-like prosody to synthetic speech had on aural comprehension during human-robot interaction. A text-to-speech system was used to generate synthetic sentences in two conditions: "default" and "human" (informed by a voice actor). A speech-in-noise experiment was then performed that required participants to transcribe perceived sentences spoken by a robot in both test conditions. Overall, we found no significant difference in comprehension between sentences spoken using the synthetic voice with prosody and the unaltered synthetic voice; however, significant differences in comprehension were detected for shorter sentences (n=52), and among participants who learned English in a different country to the native dialect of the voice actor (n=26). In both of these cases, participants found the voice with human-like prosody harder to comprehend. These findings suggest that introducing human-like prosody to synthetic speech in human-robot interaction may, under certain circumstances, lead to the voice becoming less intelligible. This motivates further research and adds to the growing body of literature on the multifaceted role that voice plays in human-robot interaction.

Conference presentation