FDT3317 Speech Synthesis from Beginning to End-to-end
KTH Royal Institute of Technology
Prerequisite: admission to a doctoral education programme.
“Machines that speak” is an age-old topic that has experienced a recent surge in research interest. Speaking devices are now in everyone's pockets, and the speech-synthesis field has become a challenging proving ground for new methods in machine learning.
This course is an introduction to text-to-speech (TTS) synthesis with elements of acoustic phonetics and signal processing. The course introduces a universal TTS engineering pipeline step by step: text processing, prediction engine, and waveform generation. The pipeline components are then explored within each contemporary speech-synthesis paradigm, from unit selection via statistical-parametric and hybrid synthesisers to end-to-end systems.
After completing the course, students should be able to:
1. Demonstrate a solid knowledge base for independent research and development in state-of-the-art text-to-speech synthesis.
2. Define and motivate basic concepts in TTS-relevant acoustic phonetics and signal processing, and describe all parts of the text-to-speech pipeline.
3. Building on this understanding, acquire and demonstrate skills in system implementation, as practised and evaluated during exercise sessions.
4. Demonstrate good familiarity with the seminal advances in speech synthesis over the years (both at KTH and at large), as well as with the most recent achievements such as neural-network-based end-to-end systems.