CentLex is the central lexicon resource produced and maintained by the Centre for Speech Technology.
The lexicon currently contains slightly more than 400,000 entries. Each entry consists of a full-form orthographic word and a grammatical analysis. Associated with each entry are a list of phonological pronunciation representations, sorted by estimated frequency of occurrence, and various other information.
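The entry structure described above could be modelled roughly as follows. This is a minimal sketch, not the actual CentLex storage format: the class names, field names, the grammatical tag string, and the transcriptions and frequency values in the usage lines are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pronunciation:
    # Phonological representation; the notation used here is an assumption.
    phonemes: str
    # Estimated frequency of occurrence (the scale is an assumption).
    est_frequency: float


@dataclass
class LexiconEntry:
    # Full-form orthographic word.
    orthography: str
    # Grammatical analysis; the tag format is hypothetical.
    analysis: str
    # Pronunciations, kept sorted with the most frequent first.
    pronunciations: list = field(default_factory=list)
    # Placeholder for the "various other information" attached to an entry.
    other: dict = field(default_factory=dict)

    def add_pronunciation(self, phonemes: str, est_frequency: float) -> None:
        """Add a pronunciation and keep the list sorted by estimated frequency."""
        self.pronunciations.append(Pronunciation(phonemes, est_frequency))
        self.pronunciations.sort(key=lambda p: p.est_frequency, reverse=True)

    def most_likely(self):
        """Return the pronunciation with the highest estimated frequency, if any."""
        return self.pronunciations[0] if self.pronunciations else None


# Illustrative usage with made-up values (not taken from CentLex).
entry = LexiconEntry(orthography="och", analysis="conjunction")
entry.add_pronunciation("O k", 0.2)
entry.add_pronunciation("O", 0.8)
best = entry.most_likely()
```

Keeping the list sorted on insertion means that a lookup of the most likely pronunciation is a simple read of the first element.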
Group: Speech Communication and Technology
Duration: 2000 - 2007
(2007). Spoken language annotation and data-driven modelling of phone-level pronunciation in discourse context. Speech Communication, 50(2), 126-141.
(2006). Integrating Linguistic Information from Multiple Sources in Lexicon Development and Spoken Language Annotation. In Proceedings of the LREC workshop on merging and layering linguistic information (pp. 1-8). Genoa, Italy.
(2006). Modelling Phone-Level Pronunciation in Discourse Context. Doctoral dissertation.
(2006). Modelling Pronunciation in Discourse Context. In Proceedings of Fonetik (pp. 7-9). Lund, Sweden.
(2005). Annotating Speech Data for Pronunciation Variation Modelling. In Proceedings of Fonetik (pp. 25-27). Göteborg, Sweden.
(2005). Inducing Decision Tree Pronunciation Variation Models from Annotated Speech Data. In Proceedings of Interspeech (pp. 4-8). Lisbon, Portugal.
(2004). Pronunciation variation modelling using decision tree induction from multiple linguistic parameters. In Proceedings of Fonetik (pp. 12-15). Stockholm, Sweden.