Nordic Graduate School of Language Technology
A selection of papers and other publications will be used as additional reading material for each subtopic.
Most papers can be found on the web. Some papers will be distributed.
Acoustic Phonetics
Liljencrants, J.: "Speech signal processing", in W. Hardcastle & J. Laver (editors) The Handbook of Phonetic Sciences, Blackwell Publishers Ltd, Oxford 1997, pp. 697-720 (Will be distributed)
Yates, G.: "The ear as an acoustical transducer", Acoustics Australia, Vol. 21, 1993, pp. 77-81 (Will be distributed)
Lieberman, P., Blumstein, S. (1988): Speech physiology, speech perception, and acoustic phonetics, Cambridge University Press, parts of chapter 7, pp. 148-161 (Will be distributed)
Bruce, G., B. Granström, K. Gustafson, M. Horne, D. House, and P. Touati. 1997. "On the analysis of prosody in interaction." In Y. Sagisaka, N. Campbell and N. Higuchi (eds.) Computing Prosody: Computational Models for Processing Spontaneous Speech, pp. 43-59
Hirschberg, J.: "Communication and prosody: Functional aspects of prosody", Speech Communication, Volume 36, Issues 1-2, January 2002, pp. 31-43 (Will be distributed)
Speech Synthesis
Carlson, R., Granström, B.: "Speech Synthesis", in Hardcastle & Laver (editors) The Handbook of Phonetic Sciences, Blackwell Publishers Ltd
Granström, B.: "Multi-modal speech synthesis with applications", in G. Chollet, M. G. Di Benedetto, A. Esposito, M. Marinaro (Eds) Speech Processing, Recognition and Artificial Neural Networks, Proceedings of the 3rd International School on Neural Nets "Eduardo R. Caianiello", Springer, London 1999, pp. 327-346 (Will be distributed)
Klatt, D.: "Review of text-to-speech conversion for English", Journal of the Acoustical Society of America, Vol. 82, pp. 737-793, 1987 http://www.mindspring.com/~ssshp/ssshp_cd/dk_737a.htm
van Santen, J.: "When will synthetic speech sound human: Role of rules and data", in Proc. of ICSLP 2000
A. W. Black, P. Taylor, and R. Caley: The Festival Speech Synthesis System, 1998. http://www.cstr.ed.ac.uk/projects/festival/
Corpus-Based Techniques in the AT&T NextGen Synthesis System, ICSLP 2000, http://www.research.att.com/projects/tts/pubs.html
Synthesis examples:
http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html
http://www.naturalvoices.att.com/
http://www.acapela-group.com/demos/demos.asp
Speech Recognition
Lawrence R. Rabiner (1989) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286. http://www.caip.rutgers.edu/%7Elrr/Reprints/tutorial on hmm and applications.pdf
S. Young (1996). "Large Vocabulary Continuous Speech Recognition." IEEE Signal Processing Magazine 13(5): 45-57. http://mi.eng.cam.ac.uk/~sjy/papers/youn96.ps.gz
Ronald Rosenfeld (2000) Two Decades of Statistical Language Modeling: Where Do We Go From Here? Proceedings of the IEEE, 88(8) (pdf)
Ingunn Amdal, Eric Fosler-Lussier (2003) "Pronunciation variation modeling in automatic speech recognition", Telektronikk, vol. 99, no. 2 http://www.telenor.com/telektronikk/volumes/pdf/2.2003/Side_70-82.pdf
R. P. Lippmann (1997) Speech recognition by machines and humans, Speech Communication, vol. 22, no. 1, pp. 1-15 (pdf)
M. Mohri, F. Pereira, M. Riley (2000) Weighted finite state transducers in speech recognition, ISCA ITRW ASR2000
Speaker Verification
Gish, H. and Schmidt, M. (1994): "Text-independent speaker identification", IEEE Signal Processing Magazine, Oct. 1994, pp. 18-32 (pdf)
S. Furui (1997): "Recent Advances in Speaker Recognition", Pattern Recognition Letters, vol. 18, pp. 859-872 (pdf)
Douglas A. Reynolds, Thomas F. Quatieri, Robert B. Dunn (2000): "Speaker verification using adapted Gaussian mixture models", Digital Signal Processing, vol. 10, no. 1-3, Jan-July 2000 (pdf)
Bimbot, F., Bonastre, J.-F., Fredouille, C., Gravier, G., Magrin-Chagnolleau, I., Meignier, S., Merlin, T., Ortega-García, J., Petrovska-Delacrétaz, D., and Reynolds, D. (2004): "A Tutorial on Text-Independent Speaker Verification", EURASIP Journal on Applied Signal Processing, Hindawi Publishing Corporation, Vol. 2004, no. 4, pp. 432-451 (pdf)
Dialog Systems
James Allen, Donna Byron, Myroslava Dzikovska, George Ferguson, Lucian Galescu, and Amanda Stent, "Towards conversational human-computer interaction," AI Magazine, 22(4), Winter 2001, pp. 27-37. http://www.cs.rochester.edu/research/trips/
Joakim Gustafson (2002). Developing multimodal spoken dialogue systems. Empirical studies of spoken human-computer interaction. Doctoral Thesis. Department of Speech, Music and Hearing, KTH
Harald Aust, Martin Oerder, Frank Seide, Volker Steinbiss: The Philips automatic train timetable information system, Speech Communication 17 (1995) (Will be distributed)
Chu-Carroll, J., "MIMIC: An adaptive mixed initiative spoken dialogue system for information queries," in Proceedings of the 6th ACL Conference on Applied Language Processing, http://acl.ldc.upenn.edu/A/A00/A00-1014.pdf
Jim Glass, "Challenges for Spoken Dialogue Systems," Proc. 1999 IEEE ASRU Workshop, Keystone, CO, December 1999. http://www.sls.csail.mit.edu/sls/publications/
Marilyn A. Walker, Candace A. Kamm, and Diane J. Litman. Towards Developing General Models of Usability with
Levin, Pieraccini, Eckert (2000) A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies, IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, January 2000, p. 11 http://www-cs.ccny.cuny.edu/~esther/papers/StochasticModelOfHumanmMachineInteractions.pdf
Some web pages on spoken dialogue systems
http://www.cs.cmu.edu/~dbohus/SDS/index.html
http://wwwhome.cs.utwente.nl/~schooten/vidiam/dialoguesystems/
http://www.cs.cmu.edu/~dgroup/
http://www.cs.cmu.edu/~dod/roundtable/
Applications of Speech Technology
Pieraccini and Huerta (2005) Where do we go from here? Research and Commercial spoken dialogue systems http://www.sigdial.org/workshops/workshop6/proceedings/pdf/65-SigDial2005_8.pdf
Gilbert, Wilpon, Stern and Di Fabbrizio: Intelligent Virtual Agents for Contact Center Automation http://www.difabbrizio.com/papers/Intelligent-virtual-agents-for-contact-center-automation-01511822.pdf
Cole et al. (2003) Perceptive Animated Interfaces: First Steps Toward a New Paradigm for Human-Computer Interaction http://cslr.colorado.edu/beginweb/publications/journal/pellom-ieee-hci-2003.pdf
Oberteuffer, John A. (2005) Speech Technologies Make Video Games Complete (search for "Oberteuffer" on http://www.speechtechmag.com)
Reference Literature (Not part of the course)
Acoustic Phonetics, Kenneth N. Stevens. ISBN 0-262-19404-X
Allmän och svensk fonetik. Norstedts. Elert, Claes-Christian. 1995.
Handbook of Phonetic Sciences (Eds. W. J. Hardcastle and J. Laver), Blackwell
Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon. ISBN 0-13-022616-5
Survey of the State of the Art in Human Language Technology http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
Speech Technology Magazine's NewsBlast http://www.speechtechmag.com/eletter/archives/
CTT - Selection of conferences/workshops http://www.speech.kth.se/conferences/