CentLex is the central lexicon resource produced and maintained by the Centre for Speech Technology.

CentLex currently contains slightly more than 400,000 entries. Each entry consists of a full-form orthographic word and a grammatical analysis. Each entry also carries a list of phonological pronunciation representations, sorted by their estimated frequency of occurrence, along with various other information.
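The entry structure described above could be modelled roughly as follows. This is only an illustrative sketch, not the actual CentLex format: the class names, field names, tag format, and the numeric frequency estimate are all assumptions made for the example, and the sample values are placeholders rather than real lexicon data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Pronunciation:
    # Phonological representation; the notation is an assumption.
    transcription: str
    # Hypothetical numeric estimate used only to rank the variants.
    estimated_frequency: float


@dataclass
class LexiconEntry:
    # Full-form orthographic word.
    orthography: str
    # Grammatical analysis; the tag format here is an assumption.
    grammatical_analysis: str
    # Pronunciation variants, kept sorted from most to least frequent.
    pronunciations: List[Pronunciation] = field(default_factory=list)

    def add_pronunciation(self, pron: Pronunciation) -> None:
        """Add a variant and restore the descending frequency order."""
        self.pronunciations.append(pron)
        self.pronunciations.sort(
            key=lambda p: p.estimated_frequency, reverse=True
        )

    def most_frequent(self) -> Optional[Pronunciation]:
        """Return the top-ranked variant, or None if there are none."""
        return self.pronunciations[0] if self.pronunciations else None


# Placeholder values, not taken from CentLex.
entry = LexiconEntry("example", "NOUN-SG")
entry.add_pronunciation(Pronunciation("variant-b", 0.2))
entry.add_pronunciation(Pronunciation("variant-a", 0.7))
```

Re-sorting on insertion keeps the list ordered at all times, so the most likely pronunciation is always the first element.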

Group: Speech Communication and Technology

Per-Anders Jande
Jens Edlund
Kjell Gustafson
Rolf Carlson

Duration: 2000 - 2007

Related publications:


Jande, P-A. (2007). Spoken language annotation and data-driven modelling of phone-level pronunciation in discourse context. Speech Communication, 50(2), 126-141.


Jande, P-A. (2006). Integrating Linguistic Information from Multiple Sources in Lexicon Development and Spoken Language Annotation. In Proceedings of the LREC workshop on merging and layering linguistic information (pp. 1-8). Genoa, Italy.

Jande, P-A. (2006). Modelling Phone-Level Pronunciation in Discourse Context. Doctoral dissertation.

Jande, P-A. (2006). Modelling Pronunciation in Discourse Context. In Proceedings of Fonetik (pp. 7-9). Lund, Sweden.


Jande, P-A. (2005). Annotating Speech Data for Pronunciation Variation Modelling. In Proceedings of Fonetik (pp. 25-27). Göteborg, Sweden.

Jande, P-A. (2005). Inducing Decision Tree Pronunciation Variation Models from Annotated Speech Data. In Proceedings of Interspeech (pp. 4-8). Lisbon, Portugal.


Jande, P-A. (2004). Pronunciation variation modelling using decision tree induction from multiple linguistic parameters. In Proceedings of Fonetik (pp. 12-15). Stockholm, Sweden.

Published by: TMH, Speech, Music and Hearing

Last updated: 2012-11-09