TTS

Audio samples for "Emotion-Coherent Speech Data Augmentation and Self-Supervised Contrastive Style Training for Enhancing Kids's Story Speech Synthesis"

Audio samples

Author: Raymond Chung

Kid-storybook Page-level Storytelling utterances

Sample and transcript	Ground truth utterances	M1: 1-sentence utterances	M2: 1-sentence utterances + 2-sentence utterances	M4: 1-sentence utterances + 2-sentence utterances of augmented data + contrastive loss
Sample 1 "Build strong, safe houses!" So they packed their bags and waved goodbye. "Watch out for the Big Bad Wolf," called Mother Pig. "We will!"
Sample 2 ...they met a man selling straw. "Can I buy some straw?" asked Pinky Pig. "I'm going to build a house."
Sample 3 The next day, the Big Bad Wolf went to the straw house. "Little pig, little pig, let me come in," he called. "No!" cried Pinky Pig. "Not by the hair on my chinny-chin-chin."

Longer example

This one-minute kids' storytelling speech was generated by the proposed TTS model (M4) in a single step, with each sentence conditioned on its predicted speaking style.

The story is from this page.