Publications of Mats Blomberg
Other Publications
2011
Blomberg, M. (2011). Model space size scaling for speaker adaptation. TMH-QPSR, 51(1), 77-80. [pdf]

2010
Elenius, D., & Blomberg, M. (2010). Dynamic vocal tract length normalization in speech recognition. In Working Papers 54, Proceedings from Fonetik 2010 (pp. 29-34). Centre for Languages and Literature, Lund University, Sweden. [pdf]

2009
Blomberg, M., & Elenius, D. (2009). Estimating speaker characteristics for speech recognition. In Proceedings of Fonetik 2009. Dept. of Linguistics, Stockholm University. [pdf]
Blomberg, M., & Elenius, D. (2009). Tree-based Estimation of Speaker Characteristics for Speech Recognition. In Proceedings of Interspeech 2009.
Abstract: Speaker adaptation by means of adjustment of speaker
characteristic properties, such as vocal tract length, has the
important advantage compared to conventional adaptation
techniques that the adapted models are guaranteed to be
realistic if the description of the properties are. One problem
with this approach is that the search procedure to estimate
them is computationally heavy. We address the problem by
using a multi-dimensional, hierarchical tree of acoustic model
sets. The leaf sets are created by transforming a
conventionally trained model set using leaf-specific speaker
profile vectors. The model sets of non-leaf nodes are formed
by merging the models of their child nodes, using a
computationally efficient algorithm. During recognition, a
maximum likelihood criterion is followed to traverse the tree.
Studies of one- (VTLN) and four-dimensional speaker profile
vectors (VTLN, two spectral slope parameters and model
variance scaling) exhibit a reduction of the computational load
to a fraction compared to that of an exhaustive search. In
recognition experiments on children’s connected digits using
adult and male models, the one-dimensional tree search
performed as well as the exhaustive search. Further reduction
was achieved with four dimensions. The best recognition
results are 0.93% and 10.2% WER in TIDIGITS and PF-Star-Sw,
respectively, using adult models.

Blomberg, M., Elenius, K., House, D., & Karlsson, I. (2009). Research Challenges in Speech Technology: A Special Issue in Honour of Rolf Carlson and Björn Granström. Speech Communication, 51(7), 563.
Elenius, D., & Blomberg, M. (2009). On Extending VTLN to Phoneme-specific Warping in Automatic Speech Recognition. In Proceedings of Fonetik 2009. Dept. of Linguistics, Stockholm University. [pdf]
Neiberg, D., Ananthakrishnan, G., & Blomberg, M. (2009). On Acquiring Speech Production Knowledge from Articulatory Measurements for Phoneme Recognition. In INTERSPEECH 2009 - 10th Annual Conference of the International Speech Communication Association (pp. 1387-1390). Brighton, UK. [pdf]
Abstract: The paper proposes a general version of a coupled Hidden
Markov/Bayesian Network model for performing phoneme
recognition on acoustic-articulatory data. The model uses
knowledge learned from the articulatory measurements,
available for training, for phoneme recognition on the
acoustic input. After training on the articulatory data, the
model is able to predict 71.5% of the articulatory state
sequences using the acoustic input. Using optimized
parameters, the proposed method shows a slight improvement
for two speakers over the baseline phoneme recognition system
which does not use articulatory knowledge. However, the
improvement is only statistically significant for one of the
speakers. While there is an improvement in recognition
accuracy for the vowels, diphthongs and to some extent the
semi-vowels, there is a decrease in accuracy for the
remaining phonemes.

2008
Blomberg, M., & Elenius, D. (2008). Investigating Explicit Model Transformations for Speaker Normalization. In Proceedings of ISCA ITRW Speech Analysis and Processing for Knowledge Discovery. Aalborg, Denmark. [pdf]
Abstract: In this work we extend the test utterance adaptation technique
used in vocal tract length normalization to a larger number of
speaker characteristic features. We perform partially joint
estimation of four features: the VTLN warping factor, the
corner position of the piece-wise linear warping function,
spectral tilt in voiced segments, and model variance scaling.
In experiments on the Swedish PF-Star children database,
joint estimation of warping factor and variance scaling
lowered the recognition error rate compared to warping factor
alone.

Blomberg, M., & Elenius, D. (2008). Knowledge-Rich Model Transformations for Speaker Normalization in Speech Recognition. In Proceedings, FONETIK 2008, Department of Linguistics, University of Gothenburg. [pdf]
Abstract: In this work we extend the test utterance adaptation
technique used in vocal tract length normalization
to a larger number of speaker characteristic
features. We perform partially joint
estimation of four features: the VTLN warping
factor, the corner position of the piece-wise linear
warping function, spectral tilt in voiced
segments, and model variance scaling. In experiments
on the Swedish PF-Star children database,
joint estimation of warping factor and
variance scaling lowers the recognition error
rate compared to warping factor alone.

Blomberg, M., & Elenius, D. (2008). Knowledge-Rich Model Transformations for Speaker Normalization in Speech Recognition. In Eriksson, A., & Lindh, J. (Eds.), Fonetik 2008. Box 200, SE 405 30 Gothenburg. [pdf]

2007
Blomberg, M., & Elenius, D. (2007). Vocal tract length compensation in the signal and model domains in child speech recognition. Proceedings of Fonetik, TMH-QPSR, 50(1), 41-44. [pdf]

2005
Batliner, A., Blomberg, M., D’Arcy, S., Elenius, D., Giuliani, D., Gerosa, M., Hacker, C., Russell, M., Steidl, S., & Wong, M. (2005). The PF STAR Children’s Speech Corpus. In Proc Interspeech 2005. [pdf]
Elenius, D., & Blomberg, M. (2005). Adaptation and Normalization Experiments in Speech Recognition for 4 to 8 Year old Children. In Proc Interspeech 2005. [pdf]
Oppelstrup, L., Blomberg, M., & Elenius, D. (2005). Scoring Children's Foreign Language Pronunciation. In Proc FONETIK 2005. Department of Linguistics, Göteborg University. [pdf]

2004
Blomberg, M., Elenius, D., & Zetterholm, E. (2004). Speaker verification scores and acoustic analysis of a professional impersonator. In Proc of The XVIIth Swedish Phonetics Conference, Fonetik 2004 (pp. 84-87). Stockholm University. [pdf]
Elenius, D., & Blomberg, M. (2004). Comparing speech recognition for adults and children. In Proc of The XVIIth Swedish Phonetics Conference, Fonetik 2004 (pp. 156-159). Stockholm University. [pdf]
Zetterholm, E., Blomberg, M., & Elenius, D. (2004). A comparison between human perception and a speaker verification system score of a voice imitation. In Proc of Tenth Australian International Conference on Speech Science & Technology (pp. 393-397). Macquarie Univ, Sydney, Australia. [pdf]

2003
Blomberg, M., & Elenius, D. (2003). Collection and recognition of children's speech in the PF-Star project. In Proc of Fonetik 2003, Umeå University, Dept of Philosophy and Linguistics PHONUM 9 (pp. 81-84). [pdf]

2002
Blomberg, M. (2002). Phoneme recognition for the hearing impaired. Proceedings of Fonetik, TMH-QPSR, 44(1), 109-112. [pdf]
Elenius, D., & Blomberg, M. (2002). Characteristics of a low reject mode speaker verification system. In Proc of ICSLP 2002 (pp. 1385-1388). Denver, Colorado, USA. [pdf]
Johansson, M., Blomberg, M., Elenius, K., Hoffsten, L-E., & Torberger, A. (2002). A phoneme recognizer for the hearing impaired. In Proc. of ICSLP'2002 (pp. 433-436). Denver, Colorado, USA. [pdf]
Johansson, M., Blomberg, M., Elenius, K., Hoffsten, L-E., & Torberger, A. (2002). Phoneme recognition for the hearing impaired. In Proc. of Fonetik 2002 (pp. 109-112). Stockholm.

2000
Bimbot, F., Blomberg, M., Boves, L., Genoud, D., Hutter, H-P., Jaboulet, C., Koolwaaij, J., Lindberg, J., & Pierrot, J-B. (2000). An overview of the CAVE project research activities in speaker verification. Speech Comm, 31, 155-180.
Lindberg, J., & Blomberg, M. (2000). On the potential threat of using large speech corpora for impostor selection in speaker verification. In Yuan, B., Huang, T., & Tang, X. (Eds.), Proc. of ICSLP 2000, 6th Intl Conf on Spoken Language Processing (pp. 258-261). Beijing.
Magnuson, T., & Blomberg, M. (2000). Acoustic analysis of dysarthric speech. In Botinis, A., & Torstensson, N. (Eds.), The Swedish Phonetics Conf (pp. 105-108). Skövde.
Magnuson, T., & Blomberg, M. (2000). Acoustic analysis of dysarthric speech and some implications for automatic speech recognition. TMH-QPSR, 41(1), 019-030. [pdf]
Rosengren, E., Magnuson, T., Hunnicutt, S., & Blomberg, M. (2000). Analysis of dysarthric speech for use with speech recognition. In Proc of ISAAC'00, 9th Biennial Conf of the Intl Society for Augmentative and Alternative Communication (pp. 64-66). Washington, DC, USA.

1999
Bimbot, F., Blomberg, M., Boves, L., Chollet, G., Jaboulet, C., Jacob, B., Kharroubi, J., Koolwaaij, J., Lindberg, J., Mariethoz, J., Mokbel, C., & Mokbel, H. (1999). An overview of the PICASSO project research activities in speaker verification for telephone applications. In Proc of Eurospeech 99 (pp. 1963-1967). [pdf]
Blomberg, M. (1999). Within-utterance correlation for speech recognition. In Proc of Eurospeech 99 (pp. 2479-2482).
Blomberg, M. (1999). Within-utterance correlation in automatic speech recognition. In Proc of Fonetik 99 (pp. 23-26).
Lindberg, J., & Blomberg, M. (1999). Vulnerability in speaker verification. A study of technical impostor techniques. In Proc of Eurospeech 99 (pp. 1211-1214). [pdf]

1998
Blomberg, M. (1998). Speech recognition using long-distance relations in an utterance. In Branderud, P., & Traunmüller, H. (Eds.), Proc of Fonetik -98, The Swedish Phonetics Conference (pp. 166).
Lindberg, J., Koolwaaij, J., Hutter, H-P., Genoud, D., Pierrot, J-B., Blomberg, M., & Bimbot, F. (1998). Techniques for a priori decision threshold estimation in speaker verification. In Proc of RLA2C, La Reconnaissance du Locuteur et ses Applications Commerciales et Criminalistiques (Speaker Recognition and its Commercial and Forensic Applications) (pp. 89-92). Avignon, France. [pdf]
Pierrot, J-B., Lindberg, J., Koolwaaij, J., Hutter, H-P., Genoud, D., Blomberg, M., & Bimbot, F. (1998). A comparison of a priori threshold setting procedures for speaker verification in the CAVE project. In Proc of ICASSP98, Intl Conference on Acoustics, Speech and Signal Processing (pp. 125-128). Seattle, Wash.

1997
Blomberg, M. (1997). Creating unseen triphones by phone concatenation in the spectral, cepstral and formant domains. In Proc of Fonetik -97, Dept of Phonetics, Umeå Univ., Phonum 4 (pp. 41-44).
Blomberg, M. (1997). Creating unseen triphones by phone concatenation of diphones and monophones in the spectral, cepstral and formant domains. In Kokkinakis, G., Fakotakis, N., & Dermatas, E. (Eds.), Proc of Eurospeech '97, 5th European Conference on Speech Communication and Technology (pp. 1187-1190). Rhodes, Greece. [ps]
Lindberg, J., Blomberg, M., & Melin, H. (1997). CAVE - Speaker verification in bank and telecom services. In Bannert, R., Heldner, M., Sullivan, K., & Wretling, P. (Eds.), Proc of Fonetik -97, Dept of Phonetics, Umeå Univ., Phonum 4 (pp. 65-68). [ps]

1996
Blomberg, M. (1996). Creation of unseen triphones from seen triphones, diphones and phones. TMH-QPSR, 37(2), 113-116. [pdf]
Blomberg, M., & Elenius, K. (1996). Creation of unseen triphones from diphones and monophones using a speech production approach. In Proc of ICSLP-96, 4th Intl Conference on Spoken Language Processing (pp. 2316-2319). Philadelphia, USA. [pdf]
Blomberg, M., & Elenius, K. O. E. (1996). Creation of unseen triphones from diphones and monophones using a speech production approach. TMH-QPSR, 37(3), 023-028. [pdf]

1995
Bertenstam, J., Beskow, J., Blomberg, M., Carlson, R., Elenius, K., Granström, B., Gustafson, J., Hunnicutt, S., Högberg, J., Lindell, R., Neovius, L., Nord, L., de Serpa-Leitao, A., & Ström, N. (1995). The Waxholm system - a progress report. In Dalsgaard, P. (Ed.), Proc of ESCA Workshop on Spoken Dialogue Systems (pp. 281-284). Vigsø, Denmark. [pdf]
Bertenstam, J., Blomberg, M., Carlson, R., Elenius, K., Granström, B., Gustafson, J., Hunnicutt, S., Högberg, J., Lindell, R., Neovius, L., de Serpa-Leitao, A., Nord, L., & Ström, N. (1995). The Waxholm application data-base. In Pardo, J. (Ed.), Proceedings Eurospeech 1995 (pp. 833-836). Madrid. [pdf]
Bertenstam, J., Blomberg, M., Carlson, R., Elenius, K. O. E., Granström, B., Gustafson, J., Hunnicutt, S., Högberg, J., Lindell, R., Neovius, L., Nord, L., de Serpa-Leitao, A., & Ström, N. (1995). Spoken dialogue data collected in the Waxholm project. STL-QPSR, 36(1), 049-074. [pdf]

1994
Blomberg, M. (1994). A common phone model representation for speech recognition and synthesis. In Proc. ICSLP '94 (pp. 1875-1878). Yokohama, Japan.
Blomberg, M. (1994). Towards production-oriented techniques for speech recognition. Doctoral dissertation.
Blomberg, M. (1994). Towards production-oriented techniques for speech recognition. STL-QPSR, 35(4), 029-062. [pdf]
Blomberg, M. (1994). Training production parameters of context-dependent phones for speech recognition. STL-QPSR, 35(1), 059-090. [pdf]
Blomberg, M. (1994). Training speech synthesis parameters of allophones for speech recognition. In FONETIK '94, Working papers from the 8th Swedish Phonetics Conference (pp. 18-21). Lund, Sweden.
Blomberg, M., Elenius, K., & Ström, N. (1994). Speech recognition in the Waxholm dialog system. In FONETIK '94, Working papers from the 8th Swedish Phonetics Conference (pp. 22-23). Lund, Sweden. [pdf]

1993
Blomberg, M. (1993). Synthetic phoneme prototypes and dynamic voice source adaptation in speech recognition. STL-QPSR, 34(4), 097-140. [pdf]
Blomberg, M., & Carlson, R. (1993). Labeling of speech given its text representation. In Eurospeech '93, Berlin (pp. 1775-1778).
Blomberg, M., Carlson, R., Elenius, K., Granström, B., Hunnicutt, S., Lindell, R., & Neovius, L. (1993). An experimental dialog system: WAXHOLM. In Proceedings of Seventh Swedish Phonetics Conference, RUUL 23 (pp. 49-52). Uppsala.
Blomberg, M., Carlson, R., Elenius, K., Gustafson, J., Granström, B., Hunnicutt, S., Lindell, R., & Neovius, L. (1993). An experimental dialogue system: WAXHOLM. In Proceedings Eurospeech '93 (pp. 1867-1870). Berlin. [pdf]
Blomberg, M., Carlson, R., Elenius, K. O. E., Granström, B., Gustafson, J., Hunnicutt, S., Lindell, R., Neovius, L., & Nord, L. (1993). An experimental dialogue system: Waxholm. STL-QPSR, 34(2-3), 015-020. [pdf]
Elenius, K. O. E., & Blomberg, M. (1993). Experiments with artificial neural networks for phoneme and word recognition. STL-QPSR, 34(1), 047-056. [pdf]
Nordebo, S., Bengtsson, B., Claesson, I., Nordholm, S., Roxström, A., Blomberg, M., & Elenius, K. (1993). Noise Reduction Using an Adaptive Microphone Array for Speech Recognition in a Car. In RVK -93.

1992
Blomberg, M. (1992). Continuous speech recognition using synthetic word and triphone prototypes. In Huber, D. (Ed.), Fonetik '92, the Sixth Swedish Phonetics Conference held in Gothenburg, Technical Report no. 11 (pp. 19-22). Chalmers University of Technology, Göteborg.
Blomberg, M., & Elenius, K. (1992). Speech recognition using artificial neural networks and dynamic programming. In Huber, D. (Ed.), Fonetik '92, the Sixth Swedish Phonetics Conference held in Gothenburg, Technical Report no. 12 (pp. 57). Chalmers University of Technology, Göteborg.
Elenius, K., & Blomberg, M. (1992). Comparing phoneme and feature based speech recognition using artificial neural networks. In Ohala, J. J., Nearey, T. M., Derwing, B. L., Hodge, M. M., & Wiebe, G. E. (Eds.), Proceedings ICSLP 92 (pp. 1279-1282). Banff, Canada. [pdf]
Elenius, K., & Blomberg, M. (1992). Experiments with Artificial Neural Networks for Phoneme and Word recognition. In Proceedings of the First Swedish Conference on Connectionism (pp. 263-272). Skövde, Sweden. [pdf]

1990
Blomberg, M., & Elenius, K. O. E. (1990). Optimizing some parameters of a word recognizer used in car noise. STL-QPSR, 31(4), 043-052. [pdf]

1989
Blomberg, M. (1989). Synthetic phoneme prototypes and source adaptation in a speech recognition system. STL-QPSR, 30(1), 131-135. [pdf]
Blomberg, M., & Elenius, K. (1989). Testing some essential parameters of a word recogniser used in car noise. In Proceedings of ESCA Workshop on Speech Input/Output Assessment and Speech Databases (pp. 6.5.1-6.5.4). Noordwijkerhout.
Crosnier, S., Blomberg, M., & Elenius, K. (1989). Speech Recogniser Sensitivity to the Variation of Different Control Parameters in Synthetic Speech. In Proceedings of ESCA Workshop on Speech Input/Output Assessment and Speech Databases (pp. 6.5.1-6.5.4). Noordwijkerhout.

1988
Blomberg, M., Carlson, R., Elenius, K., Granström, B., & Hunnicutt, S. (1988). Word recognition using synthesized reference templates. In Proc. Second Symposium on Advanced Man-Machine Interface Through Spoken Language (pp. 27-1 - 27-12). Hawaii, USA.
Blomberg, M., Carlson, R., Elenius, K., Granström, B., & Hunnicutt, S. (1988). Word recognition using synthesized templates. In Proceedings of SPEECH '88, (pp. 1171-1178). Edinburgh.
Blomberg, M., Carlson, R., Elenius, K. O. E., Granström, B., & Hunnicutt, S. (1988). Word recognition using synthesized templates. STL-QPSR, 29(2-3), 069-081. [pdf]

1987
Blomberg, M., Carlson, R., Elenius, K., Granström, B., & Hunnicutt, S. (1987). Taligenkänning baserad på ett text-till-talsystem. In Proceedings of TLH-Lund (pp. 18-19). Lund.
Blomberg, M., Carlson, R., Elenius, K., Granström, B., Hunnicutt, S., Lindell, R., & Neovius, L. (1987). Speech recognition based on a text-to-speech synthesis system. In Laver, J., & Jack, M. A. (Eds.), European Conference on Speech Technology, Vol. II (pp. 369-372). Edinburgh.

1986
Blomberg, M., Carlson, R., Elenius, K., & Granström, B. (1986). Auditory models as front ends in speech-recognition systems. In Perkell, J. S., & Klatt, D. H. (Eds.), Invariance and Variability in Speech Processes. Cambridge, MA, USA: Lawrence Erlbaum Ass. and Hillsdale, NJ.
Blomberg, M., Carlson, R., Elenius, K., Granström, B., & Hunnicutt, S. (1986). Some current projects at KTH related to speech recognition. In International Workshop on Recent Advances and Applications of Speech Recognition. Rome.
Blomberg, M., Carlson, R., Elenius, K. O. E., Galyas, K., Granström, B., Hunnicutt, S., & Neovius, L. (1986). Speech synthesis and recognition in technical aids. STL-QPSR, 27(4), 045-056. [pdf]
Blomberg, M., Carlson, R., Elenius, K. O. E., Granström, B., & Hunnicutt, S. (1986). Some current projects at KTH related to speech recognition. STL-QPSR, 27(1), 031-040. [pdf]
Blomberg, M., & Elenius, K. (1986). Nonlinear Frequency Warp for Speech Recognition. In Proc. ICASSP 86, Vol. 4 (pp. 2631-2634). Tokyo.

1985
Blomberg, M., Carlson, R., Elenius, K., & Granström, B. (1985). Speech research at KTH - two projects and technology transfer. In Forsberg, H. G., & Peterson, A. (Eds.), Proc. Speech-based information systems, May 1984 (pp. 57-66). Stockholm: IVA.
Blomberg, M., & Elenius, K. O. E. (1985). Automatic time alignment of speech with a phonetic transcription. STL-QPSR, 26(1), 037-045. [pdf]

1984
Blomberg, M., Carlson, R., Elenius, K., & Granström, B. (1984). Auditory models in isolated word recognition. In Proceedings ICASSP 84. San Diego.
Magnusson, L., Blomberg, M., Carlson, R., Elenius, K., & Granström, B. (1984). Swedish Speech Researchers Team Up with Electronic Venture Capitalists. Speech Technology, 2(2), 15-24.

1983
Blomberg, M., Carlson, R., Elenius, K. O. E., & Granström, B. (1983). Auditory models and isolated word recognition. STL-QPSR, 24(4), 001-015. [pdf]
Blomberg, M., Elenius, K., Lundin, F., & Sundmalm, C. (1983). Let your voice do the dialing. Telephony, 68-74.
Lundin, F., Blomberg, M., & Elenius, K. (1983). Voice controlled dialing in an intercom system. In Proceedings of Voice Data Entry Systems Applications Conference. Chicago.

1982
Blomberg, M., Carlson, R., Elenius, K., & Granström, B. (1982). Experiments with auditory models in speech recognition. In Carlson, R., & Granström, B. (Eds.), The Representation of Speech in the Peripheral Auditory System (pp. 197-201). Amsterdam: Elsevier Biomedical.
Blomberg, M., & Elenius, K. (1982). A device for automatic speech recognition. In NAS, Nordiska Akustiska Sällskapet Föredrag (pp. 383-386). Stockholm. [pdf]
Blomberg, M., & Elenius, K. (1982). Effects of emphasizing transitional or stationary parts of the speech signal in a discrete utterance recognition system. In Proc. ICASSP 82 (pp. 535-538). Paris.

1979
Watanabe, A., Felicetti, S., Hedström, B., Surjadi, G., Tannergård, G., Tegerstedt, I., Wejnebring, B., Wetterling, M-B., Andersson, L., Hallsten, L., Kaunisto, M., Murray, T., Eriksson, H., Haapakorpi, M., Karlsson, I., Nord, L., Stålhammar, U., Elenius, K., Blomberg, M., Liljencrants, J., Carlson, R., Granström, B., Risberg, A., Spens, K-E., Agelförs, E., Boberg, G., Mártony, J., Tunblad, T., Öster, A-M., Galyas, K., Gauffin, J., de Serpa-Leitão, A., Askenfelt, A., Jansson, E., & Sundberg, J. (1979). Gunnar Fant 60 years. TMH-QPSR, 20(2), 1-45. [pdf]

1970
Blomberg, M., & Elenius, K. (1970). Statistisk analys av talsignaler. Master's thesis, KTH, TMH.
Blomberg, M., & Elenius, K. O. E. (1970). Statistical analysis of speech signals. STL-QPSR, 11(4), 001-008. [pdf]