CS553 / CS653 - Speech Synthesis
2006 Winter

Instructors

Esther Klabbers
klabbers@cslu.ogi.edu
http://www.bme.ogi.edu/~klabbers/

Alexander Kain
kain@cslu.ogi.edu
http://cslu.bme.ogi.edu/~kain/

Student Responsibilities

Assignments

There will be several code-writing assignments. Please comment your code and, where possible, include transcripts of example runs and figures of results. Submit your code and documentation as a single archive file (.tar, .tgz, .bz2, or .zip) by email.

Participation

Participate in discussions and ask questions.

Paper Review and Presentation

To develop your presentation skills, you will review and present a relevant paper from the field. The presentation should last about 10 minutes, and you should be prepared for up to 5 minutes of Q&A after your talk.

Exams

There will be no midterm or final exams.

Syllabus

Session 1 (01/10, Esther)
Topic: Introduction / Class setup / History of TTS / Students' experience and expectations

Session 2 (01/12, Esther)
Topic: Utterance structure / Review of spectrograms, waveforms, and WaveSurfer
Papers:
Taylor & Black

Session 3 (01/17, Esther)
Topic: Tokenization (addresses, abbreviations, disambiguation)
Assignment #1: tokenization
Papers:
R. Sproat, A. Black, S. Chen, S. Kumar, M. Ostendorf, and C. Richards, "Normalization of non-standard words," Computer Speech and Language, 15(3), 287-333, 2001.

Session 4 (01/19, Esther)
Topic: Word Pronunciation (dictionary; letter-to-sound methods: WFST, rules, HMM)
Assignment #2: letter-to-sound rules
Papers:
A. Black, K. Lenzo, and V. Pagel, "Issues in building general letter to sound rules," Proc. SSW3, Jenolan Caves, Australia.

Session 5 (01/24, Esther)
Topic: Word Syllabification
Papers:
G. Kiraz and B. Moebius, "Multilingual syllabification using weighted finite-state transducers."
T. Borowski, "Structure preservation and the syllable coda in English."
A. van den Bosch and W. Daelemans, "Data oriented methods for grapheme-to-phoneme conversion."
Presenter: Ken Anderson

Session 6 (01/26, Esther)
Topic: Phrase
Papers:
C. Oliveira, L. Moutinho, and A. Teixeira, "On European Portuguese automatic syllabification."
G. Kiraz and B. Moebius, "Multilingual syllabification using weighted finite-state transducers."
Presenter: Nathan Bodenstab

Session 7 (01/30, Esther)
Topic: Word Emphasis
Papers:
J. Hirschberg and P. Prieto, "Training intonational phrasing rules automatically for English and Spanish text-to-speech," Proc. 2nd ESCA/IEEE Workshop on Speech Synthesis, New Paltz, NY, 1994 (hardcopy).
A. Black and P. Taylor, "Assigning phrase breaks from part-of-speech sequences," Proc. EUROSPEECH'97, Rhodes, Greece, 1997.
Presenter: Tanarat Dityam

Session 8 (02/02, Esther)
Topic: Duration 1

Session 9 (02/07, Esther)
Topic: Duration 2
Assignment #3: duration

Session 10 (02/09, Esther)
Topic: Intonation 1
Papers:
R. Baker, R. Clark, and M. White, "Synthesising contextually appropriate intonation in limited domains," 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, 2004.
Presenter: Adam Murakami

Session 11 (02/14, Esther)
Topic: Intonation 2
Assignment #4: intonation
Papers:
A. Raux and A. Black, "A unit selection approach to F0 modeling and its application to emphasis," ASRU, St. Thomas, US Virgin Islands, 2003.
Presenter: Chinten Shah

Session 12 (02/16, Alex)
Topic: Text Selection and Recording
Assignment #5: text selection (due 02/28)

Session 13 (02/21, Alex)
Topic: Unit Search
Papers:
Hunt96
B. Möbius, "Rare events and closed domains: Two delicate concepts in speech synthesis," International Journal of Speech Technology, 6(1), 57-71, 2003.
Presenter: Tomek Szegalowski

Session 14 (02/23, Alex)
Topic: Joining Units and Pitch-Synchronous Overlap-Add

Session 15 (02/28, Alex)
Topic: Review of Discrete-Time Signal Processing
Assignment #6: PSOLA implementation (due 03/07)

Session 16 (03/02, Alex)
Topic: Linear Shift-Invariant Filters
Papers:
P. Taylor and A. W. Black, "Speech synthesis by phonological structure matching," Proc. Eurospeech '99, 1999.
Presenter: Emily Tucker

Session 17 (03/07, Alex)
Topic: Formant Synthesis

Session 18 (03/09, Alex)
Topic: Linear Prediction of Speech

Session 19 (03/14, Alex)
Topic: Evaluation

Session 20 (03/16, Alex)
Topic: Research Directions
Papers:
van Santen, "Synthesis of prosody using multi-level unit sequences."
Presenter: Bruce White