KTH, Stockholm, Sweden, 6-8 June 2012
Keynote speakers
Lewis Johnson
Chief scientist at Alelo Inc
Using Speech Technology to Model Spoken Communication Skills
Abstract:
Using spoken language technology as a foundation, it is now possible to assess a full range of spoken language skills, from pronunciation skill to conversational skills and sociocultural pragmatics. This talk will describe a speech-based approach to assessing a spectrum of language skills, implemented in Alelo's suite of learning products. The resulting language learner models are being used to track progress to skill mastery, optimize sequences of language learning activities, and predict and prevent language skill decay.
Lewis Johnson is a co-founder of Alelo Inc, a world leader in the use of virtual-world simulations of real-life social interactions to help people communicate more effectively across languages and cultures.
Dr. Johnson was the principal investigator of the Tactical Language project at the Center for Advanced Research in Technology for Education (CARTE) at the Information Sciences Institute of the University of Southern California.
This work won DARPA's Significant Technical Achievement Award in 2005.
Since 2007 he has worked full time at Alelo, but he remains active in research on the successful adoption of interactive learning environments.
His article on the Politeness Effect in pedagogical agents, co-authored with Richard Mayer and Ning Wang, won the Most Cited Prize in 2011 from the International Journal of Human-Computer Studies.
Dr. Johnson has served on the governing boards and organizing committees of multiple international societies and conferences, is past president of the International Artificial Intelligence in Education Society, and past chair of the ACM Special Interest Group for Artificial Intelligence. He holds a B.A. in linguistics from Princeton University and a Ph.D. in computer science from Yale University.
Bryan Pellom
Vice President, Speech Development at Rosetta Stone
Rosetta Stone ReFLEX: Toward Improving English Conversational Fluency in Asia
Abstract:
Despite a considerable investment of time and money, Korean and Japanese language learners struggle to communicate effectively in English. Several factors may explain this lack of success. They range from the overemphasis of existing curricula on non-speaking skills to cultural factors that hinder the learning process. This paper describes Rosetta Stone ReFLEX, a novel online solution designed specifically to address the shortcomings of more traditional learning methods and to improve conversational fluency. Unlike traditional methods, Rosetta Stone ReFLEX engages learners in an adaptive, 30-minute daily program. The program combines games and other activities for practicing sound skills, simulated conversational narratives that rely on speech recognition, and one-on-one live human interaction. The paper presents the typical pronunciation error patterns of Korean learners of English, summarizes the Rosetta Stone ReFLEX solution, and describes several of the speech technologies that underlie its online delivery. Finally, it discusses existing challenges and practical areas for future research.
Bryan Pellom has been an active researcher in the speech technology field for more than 15 years. He received his Ph.D. in Electrical Engineering from Duke University and was previously a research faculty member at the University of Colorado. Bryan's main areas of specialty include multilingual speech recognition, spoken dialog systems, and interactive speech technologies for children. In 2002 he was the technical chair of the International Conference on Spoken Language Processing. Currently he is Vice President of Speech Development at Rosetta Stone.
Rosetta Stone provides interactive solutions for language learning available in more than 30 languages. Rosetta Stone language-learning solutions are used by schools, organizations and millions of individuals in over 150 countries throughout the world. The company was founded in 1992 on the core beliefs that learning a language should be natural and instinctive and that interactive technology can replicate and activate the immersion method powerfully for learners of any age.
Horacio Franco
Program Director, Speech Technology & Research Laboratory (STAR), SRI International
Adaptive and Discriminative Modeling for Improved Mispronunciation Detection
Horacio Franco, Luciana Ferrer, and Harry Bratt, Speech Technology and Research Laboratory, SRI International.
Abstract:
In this talk we present an overview of SRI's research on automatic pronunciation assessment, with emphasis on phone-level measures. In computer-aided language learning, automatic detection of specific phone mispronunciations by nonnative speakers can give students detailed feedback. Students can then work on the specific problems they have when producing the new sounds of a foreign language. Our initial approach was based on a measure of match to native models. We found that significant improvements could be achieved by explicitly modeling both the mispronunciations and the correct pronunciations of nonnative speakers. This approach has recently been extended with adaptation and discriminative modeling, yielding significant improvements over our previous best system. The proposed approaches were evaluated on a phonetically transcribed database of 130,000 phones uttered in continuous-speech sentences by 206 nonnative speakers.
Horacio Franco is program director of the STAR laboratory at SRI International, which is an independent, nonprofit research institute conducting client-sponsored research and development for government agencies, commercial businesses, foundations, and other organizations.
The STAR Laboratory is recognized as a world-leading speech technology organization, working with technology creation and transfer in areas such as signal processing, phonetics/phonology, mathematical modeling, and software engineering.
In the 1990s, SRI spun off market leader Nuance Communications to exploit technology developed in the STAR Laboratory.
Of particular relevance to this symposium, SRI STAR has developed the EduSpeak speech recognition system for computer-based learning and training applications. These include foreign language education, English as a second language (ESL), reading development, interactive tutoring, and corporate training and simulation.
Horacio Franco holds a doctoral degree in engineering from the University of Buenos Aires, Argentina.
His research interests include speech recognition, speech processing, speech technology for language learning, connectionist models for speech recognition, and education.
Gary Pelton
Vice President, Product Development, Carnegie Speech
Mining Pronunciation Data for Consonant Cluster Problems
Abstract:
This paper describes how data collected from users of NativeAccent™ were used to decide whether these users had pronunciation problems within consonant clusters that warranted special exercises and lessons. Consonant clusters are a known issue for English learners, partly because many other languages either lack consonant clusters or have different sets of clusters. In addition, English as a Second Language (ESL) teachers notice patterns of consonant-cluster problems in their students that are related to the students' native language. We examine data from 3,000 NativeAccent users whose backgrounds include eight different native languages. We look for evidence that consonant cluster problems are being detected, and for patterns that depend on the user's native language. We find evidence that users do have cluster problems. Some of these problems are shared across all eight languages, while others are more prominent in a few of them. We also discuss the implications of these data for the intelligent tutoring system within NativeAccent.
Gary Pelton is VP at the Carnegie Speech Company, a leading developer of software for assessing and teaching spoken language skills. Using state-of-the-art speech recognition and artificial intelligence technologies licensed from Carnegie Mellon University, Carnegie Speech enables cost-effective, scalable and personalized spoken language instruction that maximizes training effectiveness and minimizes training time. With years of linguistic research, world-class technology and language tutoring expertise, Carnegie Speech provides language training products to a diverse and global clientele.
Gary is responsible for managing and directing the company's team of software development professionals to ensure reliability, functionality and scalability of Carnegie Speech's language learning products. In addition to Carnegie Speech, Gary has led product development teams for a variety of organizations, from Bell Labs to entrepreneurial start-ups. He received his MS in Computer Science and a BS in Physics from Virginia Polytechnic Institute and State University.
Helmer Strik
Centre for Language and Speech Technology (CLST), Radboud University Nijmegen
Automatic pronunciation error detection for a-typical speech
Abstract:
Automatic pronunciation error detection (PED) is useful for pronunciation assessment, diagnosis, training and therapy. It can be applied in developing automatic feedback systems for language learners and for people with communicative disabilities. Both second language speech and pathological speech deviate in various ways from 'standard' native speech, which makes automatic PED of these kinds of a-typical speech a challenging task. Although improved algorithms for PED have been developed, they still have a number of limitations. The challenge is then how to successfully employ these PED algorithms for the purpose of speech assessment, diagnosis, training and therapy. An overview of PED research at our lab and related work will be provided.
Helmer Strik has carried out research and published more than 150 papers on various speech-related topics. These include pronunciation (modelling), automatic speech recognition (ASR), and ASR for computer-assisted language learning (CALL). He is a member of the International Speech Communication Association Special Interest Group (ISCA SIG) on Speech and Language Technology in Education (SLaTE), and a guest editor of a special issue of the Speech Communication journal on SLaTE.
Together with Catia Cucchiarini he organized the SLaTE 2011 workshop (Speech and Language Technology in Education), a satellite event of Interspeech 2011 in Florence. He holds a PhD from the Department of Language and Speech of the University of Nijmegen.
Jack Mostow
Research Professor, Carnegie Mellon University
Why and How Our Automated Reading Tutor Listens
Abstract:
Project LISTEN's Reading Tutor listens to children read aloud, and helps them learn to read. This paper outlines how it gives feedback, how it uses ASR, and how we measure its accuracy. It describes how we model various aspects of oral reading, some ideas we tried, and lessons we have learned about acoustic models, lexical models, confidence scores, language models, alignment methods, and prosodic models.
Jack Mostow's research interests in artificial intelligence have included speech, machine learning, and design.
He holds an A.B. cum laude in Applied Mathematics from Harvard University and a Ph.D. in Computer Science from Carnegie Mellon University.
After research and faculty positions at Stanford, the Information Sciences Institute, and Rutgers, he joined the Carnegie Mellon faculty in 1992 to launch Project LISTEN. The project develops computers that listen to children read aloud and help them learn to read.
Dr. Mostow has chaired or co-chaired several conferences on artificial intelligence and intelligent tutoring systems. He has also served as an editor of Machine Learning Journal and IEEE Transactions on Software Engineering.
In 2003, Dr. Mostow was awarded the Allen Newell Medal for Research Excellence, and in 2010 he was elected President of the International Artificial Intelligence in Education Society.
Silke Witt-Ehsani
Vice President, Speech Solutions, Fluential Inc
Automatic Pronunciation Error Detection: A Review of the Current State of the Art
Abstract:
This presentation reviews the large body of research on automatic pronunciation error detection conducted over the past 10-15 years. The goal is to link the various research approaches and work streams in order to aid the development of the next generation of algorithms. A vision of an ideal pronunciation error detection system is presented. It serves as a reference for identifying current challenges and possible next steps for research. Lastly, an extensive list of references in the field is provided.
Silke Witt-Ehsani is building the next generation of multimodal dialog systems at Fluential Inc.
She has over 13 years of research and industrial experience in the creation and application of speech and natural language technology in call center environments, ranging from speech recognition algorithm development to the design of numerous leading-edge speech dialog systems.
She previously headed the TuVox Design Center and its multidisciplinary approach to designing speech applications. She also worked at SRI Consulting on its very first SLM deployments.
Silke holds a Ph.D. from Cambridge University, with research focused on the use of speech recognition in computer-assisted language learning.
Her late-1990s articles with Steve Young on the use of ASR in computer-assisted language learning remain among the most cited on the topic.
Florian Hoenig
Researcher in the Speech Processing and Understanding group at the Pattern Recognition Lab of the Friedrich-Alexander University Erlangen-Nuremberg
The Automatic Assessment of Prosody - Methods for Evaluation and Exemplars for Local and Global Models
Abstract:
The first part of the presentation deals with general considerations for evaluating both the human raters and the automatic systems employed for pronunciation assessment. How can we come closer to an unbiased, realistic estimate of their reliability? Human annotators are fallible, and machine-learning algorithms (and researchers) adapt to, and inevitably overfit, a given training set. In the second part, we present concrete models for assessment. The detection of (erroneous) word accent placement serves as an example of an individual, local error. The assessment of the overall rhythmic quality of the learner's speech serves as an example of a more global phenomenon. The methods are evaluated in detail on English and German data from the German research project C-AuDiT.
Florian Hoenig received his diploma in Computer Science from Friedrich-Alexander University Erlangen, Germany, in 2005, with a thesis on the acoustic front end of ASR done at ITC-irst (now Fondazione Bruno Kessler) in Trento, Italy. Since then, he has been a member of the research staff of the Pattern Recognition Lab at FAU. Within the European Network of Excellence HUMAINE, he worked on data collection and annotation, feature extraction, and automatic classification for the real-time detection of the affective user state from physiological signals. In the German research projects C-AuDiT (Computer-aided Pronunciation and Dialogue Training) and AUWL (Automatic Web-based Learner-Feedback-System), he employs suprasegmental information for the automatic assessment of non-native pronunciation.