2011
(2011). Detecting confusable phoneme pairs for Swedish language learners depending on their first language. TMH-QPSR, 51(1), 89-92. [pdf]
(2011). Using an Ensemble of Classifiers for Mispronunciation Feedback. In Strik, H., Delmonte, R., & Russel, M. (Eds.), Proceedings of SLaTE. Venice, Italy.
(2011). Dealing with L1 background and L2 dialects in Norwegian CAPT. In Proceedings of SLaTE. [pdf]
(2011). L1-L2map: a tool for multi-lingual contrastive analysis. In Proceedings of ICPhS. [pdf]
(2011). The Virtual Language Teacher: Models and applications for language learning using embodied conversational agents. Doctoral dissertation, KTH School of Computer Science and Communication. [abstract] [pdf]
Abstract: This thesis presents a framework for computer assisted language learning using a virtual language teacher. It is an attempt to create not only a new type of language learning software, but also a server-based application that collects large amounts of speech material for future research purposes.
The motivation for the framework is to create a research platform for computer assisted language learning and computer assisted pronunciation training.
Within the thesis, different feedback strategies and pronunciation error detectors are explored.
This is a broad, interdisciplinary approach, combining research from a number of scientific disciplines, such as speech technology, game studies, cognitive science, phonetics, phonology, and second-language acquisition and teaching methodologies.
The thesis discusses the paradigm both from a top-down point of view, where a number of functionally separate but interacting units are presented as part of a proposed architecture, and bottom-up by demonstrating and testing an implementation of the framework.
(2011). Contrastive analysis through L1-L2map. TMH-QPSR, 51(1), 49-52. [pdf]
2010
(2010). Detection of Specific Mispronunciations using Audiovisual Features. In International Conference on Auditory-Visual Speech Processing. Kanagawa, Japan. [abstract] [pdf]
Abstract: This paper introduces a general approach for binary classification of audiovisual data. The intended application is mispronunciation detection for specific phonemic errors, using very sparse training data. The system uses a Support Vector Machine (SVM) classifier with features obtained from a Time Varying Discrete Cosine Transform (TV-DCT) on the audio log-spectrum as well as on the image sequences. The concatenated feature vectors from both the modalities were reduced to a very small subset using a combination of feature selection methods. We achieved 95-100% correct classification for each pair-wise classifier on a database of Swedish vowels with an average of 58 instances per vowel for training. The performance was largely unaffected when tested on data from a speaker who was not included in the training.
(2010). Simicry - A mimicry-feedback loop for second language learning. In Proceedings of Second Language Studies: Acquisition, Learning, Education and Technology. Waseda University, Tokyo, Japan. [abstract] [pdf]
Abstract: This paper introduces the concept of Simicry, defined as similarity of mimicry, for the purpose of second language acquisition. We apply this method using a computer assisted language learning system called Ville on foreign students learning Swedish. The system deploys acoustic similarity measures between native and non-native pronunciation, derived from duration, syllabicity and pitch. The system uses these measures to give pronunciation feedback in a mimicry-feedback loop exercise which has two variants: a 'say after me' mimicry exercise, and a 'shadow with me' exercise. The answers to questionnaires filled out by students after several training sessions spread over a month show that the learning and practicing procedure has promising potential, being very useful and fun.
2009
(2009). Are real tongue movements easier to speech read than synthesized? In Proceedings of Interspeech. [pdf]
(2009). Can you tell if tongue movements are real or synthetic? In Proceedings of AVSP. [pdf]
(2009). Real vs. rule-generated tongue movements as an audio-visual speech perception support. In Proceedings of Fonetik 2009.
(2009). Say ‘Aaaaa’ Interactive Vowel Practice for Second Language Learning. In Proc. of SLaTE Workshop on Speech and Language Technology in Education. Wroxall, England. [pdf]
(2009). Responses to Ville: A virtual language teacher for Swedish. In Proc. of SLaTE Workshop on Speech and Language Technology in Education. Wroxall, England. [abstract] [pdf]
Abstract: A series of novel capabilities have been designed to extend the repertoire of Ville, a virtual language teacher for Swedish, created at the Centre for Speech Technology at KTH. These capabilities were tested by twenty-seven language students at KTH. This paper reports on qualitative surveys and quantitative performance from these sessions which suggest some general lessons for automated language training.
(2009). Embodied conversational agents in computer assisted language learning. Speech Communication, 51(10), 1024-1037. [abstract] [pdf]
Abstract: This paper describes two systems using embodied conversational agents (ECAs) for language learning. The first system, called Ville, is a virtual language teacher for vocabulary and pronunciation training. The second system, a dialogue system called DEAL, is a role-playing game for practicing conversational skills. Whereas DEAL acts as a conversational partner with the objective of creating and keeping an interesting dialogue, Ville takes the role of a teacher who guides, encourages and gives feedback to the students.
2008
(2008). Visualization of speech and audio for hearing-impaired persons. Technology and Disability, 20(2), 97-107. [pdf]
(2008). Can visualization of internal articulators support speech perception? In Proceedings of Interspeech 2008 (pp. 2627-2630). Brisbane, Australia. [pdf]
(2008). Looking at tongues – can it help in speech perception? In Proceedings of Fonetik 2008. [pdf]
2007
(2007). DEAL – Dialogue Management in SCXML for Believable Game Characters. In Proceedings of ACM Future Play 2007 (pp. 137-144). [abstract] [pdf]
Abstract: In order for game characters to be believable, they must appear to possess qualities such as emotions, the ability to learn and adapt as well as being able to communicate in natural language. With this paper we aim to contribute to the development of believable non-player characters (NPCs) in games, by presenting a method for managing NPC dialogues. We have selected the trade scenario as an example setting since it offers a well-known and limited domain common in games that support ownership, such as role-playing games. We have developed a dialogue manager in State Chart XML, a newly introduced W3C standard, as part of DEAL, a research platform for exploring the challenges and potential benefits of combining elements from computer games, dialogue systems and language learning.
(2007). Dealing with DEAL: a dialogue system for conversation training. In Proceedings of SIGdial (pp. 132-135). Antwerp, Belgium. [abstract] [pdf]
Abstract: We present DEAL, a spoken dialogue system for conversation training under development at KTH. DEAL is a game with a spoken language interface designed for second language learners. The system is intended as a multidisciplinary research platform where challenges and potential benefits of combining elements from computer games, dialogue systems and language learning can be explored.
(2007). Att lära sig språk med en virtuell lärare. In Från Vision till praktik, språkutbildning och informationsteknik (pp. 51-70). Nätuniversitetet. [pdf]
(2007). Computer Assisted Conversation Training for Second Language Learners. Proceedings of Fonetik, TMH-QPSR, 50(1), 57-60. [pdf]
(2007). DEAL A Serious Game For CALL Practicing Conversational Skills In The Trade Domain. In Proceedings of SLaTE 2007. [abstract] [pdf]
Abstract: This paper describes work in progress on DEAL, a spoken dialogue system under development at KTH. It is intended as a platform for exploring the challenges and potential benefits of combining elements from computer games, dialogue systems and language learning.
2005
(2005). Artificial Gaze - Perception experiment of eye gaze in synthetic faces. In Proceedings from the Second Nordic Conference on Multimodal Communication. [pdf]
2004
(2004). Design strategies for a virtual language tutor. In Kim, S. H., & Young, D. H. (Eds.), Proc ICSLP 2004 (pp. 1693-1696). Jeju Island, Korea. (21 citations) [pdf]
(2004). Designing a virtual language tutor. In Proc of The XVIIth Swedish Phonetics Conference, Fonetik 2004 (pp. 136-139). Stockholm University. [pdf]
(2004). Managing complex and multilingual lexical data with a simple editor. In Proceedings of the Eleventh EURALEX International Congress. Lorient, France. [pdf]
2003
(2003). A Cognitive Science Approach To A Human-Dolphin Dialog Protocol. In 17th International Conference of the European Cetacean Society. Las Palmas, Canary Islands.
(2003). Building Common Ground: Communication Across Species Barriers. In First International Conference on Acoustic Communication by Animals. Maryland, USA.
2002
(2002). Building Bridges: A Cognitive Science Approach To A Human-Dolphin Dialog Protocol. Master's thesis, University of Oslo. [pdf]