Nordic Graduate School of Language Technology

Speech Technology Level 1 (2007)

Additional papers

A selection of papers and other publications will be used as additional reading material for each subtopic.

Most papers can be found on the web. Some papers will be distributed.

Acoustic Phonetics

Liljencrants, J.: "Speech signal processing," in W Hardcastle & J Laver (editors) The Handbook of Phonetic Sciences, Blackwell Publishers Ltd, Oxford 1997, 697-720 (Will be distributed)

Yates, G.: "The ear as an acoustical transducer", Acoustics Australia, Vol. 21, 1993, pp. 77-81 (Will be distributed)

Lieberman, P., Blumstein, S. (1988): parts of chapter 7 of Speech Physiology, Speech Perception, and Acoustic Phonetics, Cambridge University Press, pp. 148-161 (Will be distributed)

Bruce, G., B. Granström, K. Gustafson, M. Horne, D. House, and P. Touati. 1997. ‘On the analysis of prosody in interaction.’ In Y. Sagisaka, N. Campbell and N. Higuchi (eds.) Computing Prosody: Computational Models for Processing Spontaneous Speech, 43-59, Springer-Verlag, New York. (Will be distributed)

Hirschberg, J., Communication and prosody: Functional aspects of prosody, Speech Communication, Volume 36, Issues 1-2, January 2002, Pages 31-43 (Will be distributed)

Speech Synthesis

Carlson R., Granström B.: "Speech Synthesis", Hardcastle & Laver (editors) The Handbook of Phonetic Sciences, Blackwell Publishers Ltd, Oxford 1997, 768-788 (Will be distributed)

Granström, B. "Multi-modal speech synthesis with applications", in G. Chollet, M. G. Di Benedetto, A. Esposito, M. Marinaro (Eds.) Speech Processing, Recognition and Artificial Neural Networks, Proceedings of the 3rd International School on Neural Nets "Eduardo R. Caianiello", Springer, London 1999, pp. 327-346 (Will be distributed)

Klatt D.: "Review of text-to-speech conversion for English", Journal of the Acoustical Society of America, Vol. 82, pp. 737-793, 1987 http://www.mindspring.com/~ssshp/ssshp_cd/dk_737a.htm

van Santen, J., When will synthetic speech sound human: Role of rules and data, in Proc. of ICSLP 2000, Beijing. (Will be distributed)

A. W. Black, P. Taylor, and R. Caley. The Festival Speech Synthesis System, 1998. http://www.cstr.ed.ac.uk/projects/festival/

Corpus-Based Techniques in the AT&T NextGen Synthesis System, ICSLP 2000, Beijing, China, October 2000
http://www.research.att.com/projects/tts/pubs.html

Synthesis examples:
http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html
http://www.naturalvoices.att.com/
http://www.acapela-group.com/demos/demos.asp

Speech Recognition

Lawrence R. Rabiner (1989) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, vol 77, no. 2, pp. 257-286. http://www.caip.rutgers.edu/%7Elrr/Reprints/tutorial on hmm and applications.pdf

S. Young (1996). "Large Vocabulary Continuous Speech Recognition." IEEE Signal Processing Magazine 13(5): 45-57. http://mi.eng.cam.ac.uk/~sjy/papers/youn96.ps.gz

Ronald Rosenfeld (2000) Two decades of Statistical Language Modeling: Where Do We Go From Here? Proceedings of the IEEE, 88(8), (pdf)

Ingunn Amdal, Eric Fosler-Lussier (2003) "Pronunciation variation modeling in automatic speech recognition", Telektronikk, vol. 99, no. 2 http://www.telenor.com/telektronikk/volumes/pdf/2.2003/Side_70-82.pdf

R.P. Lippmann (1997) Speech recognition by machines and humans, Speech Communication vol 22 no 1, pp 1-15 (pdf)

M Mohri, F Pereira, M Riley (2000) Weighted finite state transducers in speech recognition, ISCA ITRW ASR2000, Paris http://www.cs.nyu.edu/~mohri/postscript/asr2000.ps

Speaker Verification

Gish, H. and Schmidt, M. (1994): "Text-independent speaker identification", IEEE Signal Processing Magazine Oct. 94, pp. 18-32 (pdf)

S. Furui (1997): "Recent Advances in Speaker Recognition", Pattern Recognition Letters, vol 18, pp 859-872. (pdf)

Douglas A. Reynolds, Thomas F. Quatieri, Robert B. Dunn (2000): "Speaker verification using adapted Gaussian mixture models", Digital Signal Processing, vol. 10, no. 1-3, Jan-July 2000 (pdf)

Bimbot, F., Bonastre, J.-F., Fredouille, C., Gravier, G., Magrin-Chagnolleau, I., Meignier, S., Merlin, T., Ortega-García, J., Petrovska-Delacrétaz, D., and Reynolds, D. (2004): "A Tutorial on Text-Independent Speaker Verification", EURASIP Journal on Applied Signal Processing, Hindawi Publishing Corporation Vol. 2004, no 4, pp 432-451 (pdf)

Dialog Systems

James Allen, Donna Byron, Myroslava Dzikovska, George Ferguson, Lucian Galescu, and Amanda Stent, "Towards conversational human-computer interaction," AI Magazine, 22(4), Winter 2001, pp. 27-37. http://www.cs.rochester.edu/research/trips/

Joakim Gustafson (2002). Developing multimodal spoken dialogue systems. Empirical studies of spoken human-computer interaction. Doctoral Thesis. Department of Speech, Music and Hearing, KTH, Stockholm. Chapters 2 and 3. http://www.speech.kth.se/ctt/publications/

Harald Aust, Martin Oerder, Frank Seide, Volker Steinbiss: The Philips automatic train timetable information system, Speech Communication 17 (1995) (Will be distributed)

Chu-Carroll, J., "MIMIC: An adaptive mixed initiative spoken dialogue system for information queries," in Proceedings of the 6th ACL Conference on Applied Language Processing, (Seattle, WA, USA), May 2000.
http://acl.ldc.upenn.edu/A/A00/A00-1014.pdf

Jim Glass, "Challenges for Spoken Dialogue Systems," Proc. 1999 IEEE ASRU Workshop, Keystone, CO, December 1999. http://www.sls.csail.mit.edu/sls/publications/

Marilyn A. Walker, Candace A. Kamm, and Diane J. Litman. Towards Developing General Models of Usability with PARADISE. In Natural Language Engineering, http://www.dcs.shef.ac.uk/~walker/paradise.html

Levin, Pieraccini, Eckert (2000) A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies, IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, January 2000, p. 11 http://www-cs.ccny.cuny.edu/~esther/papers/StochasticModelOfHumanmMachineInteractions.pdf

Some web pages on spoken dialogue systems

http://www.cs.cmu.edu/~dbohus/SDS/index.html

http://wwwhome.cs.utwente.nl/~schooten/vidiam/dialoguesystems/

http://www.cs.cmu.edu/~dgroup/

http://www.cs.cmu.edu/~dod/roundtable/

Applications of Speech Technology

Pieraccini and Huerta (2005) Where do we go from here? Research and Commercial spoken dialogue systems http://www.sigdial.org/workshops/workshop6/proceedings/pdf/65-SigDial2005_8.pdf

Gilbert, Wilpon, Stern and Di Fabbrizio, Intelligent Virtual Agents for Contact Center Automation http://www.difabbrizio.com/papers/Intelligent-virtual-agents-for-contact-center-automation-01511822.pdf

Cole et al (2003) Perceptive Animated Interfaces: First Steps Toward a New Paradigm for Human–Computer Interaction http://cslr.colorado.edu/beginweb/publications/journal/pellom-ieee-hci-2003.pdf

Oberteuffer, John A. (2005) Speech Technologies Make Video Games Complete (search for "Oberteuffer" on http://www.speechtechmag.com)

Reference Literature (Not part of the course)

Acoustic phonetics, Kenneth N. Stevens. ISBN 0-262-19404-X

Allmän och svensk fonetik. Norstedts. Elert, Claes-Christian. 1995.

Fundamentals of Speech Recognition, Lawrence Rabiner & Biing-Hwang Juang, 1993, PTR Prentice Hall, ISBN 0130151572

Handbook of Phonetic Sciences (Ed. WJ Hardcastle and J Laver), Blackwell, Oxford, ISBN 0-631-18848-7

Speech and Language Processing, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition, Jurafsky and Martin. 2007 http://www.cs.colorado.edu/~martin/slp2.html

Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Xuedong Huang, Alex Acero & Hsiao-Wuen Hon, ISBN: 0-13-022616-5

Survey of the State of the Art in Human Language Technology http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html

Text-to-Speech Synthesis, Taylor P. 2007 http://mi.eng.cam.ac.uk/~pat40/book.html

Some more links

Speech Technology Magazine's NewsBlast http://www.speechtechmag.com/eletter/archives/

CTT - Selection of conferences/workshops http://www.speech.kth.se/conferences/