Author: Raymond Chung
| Ground truth utterances | M1: 1-sentence utterances | M2: 1-sentence utterances + 2-sentence utterances | M4: 1-sentence utterances + 2-sentence utterances of augmented data + contrastive loss |
|---|---|---|---|
| Sample 1: "Build strong, safe houses!" So they packed their bags and waved goodbye. "Watch out for the Big Bad Wolf," called Mother Pig. "We will!" | | | |
| Sample 2: ...they met a man selling straw. "Can I buy some straw?" asked Pinky Pig. "I'm going to build a house." | | | |
| Sample 3: The next day, the Big Bad Wolf went to the straw house. "Little pig, little pig, let me come in," he called. "No!" cried Pinky Pig. "Not by the hair on my chinny-chin-chin." | | | |
This one-minute children's storytelling speech was generated by the proposed TTS model (M4) in a single step, with each sentence conditioned on its predicted speaking style.
The story is from this page.