Keywords: Speech recognition database, telephone services
Project description
Owing to progress in speech processing technology, increasingly powerful
voice-driven teleservices can be implemented. These allow easy access via
the telephone network to information services (train timetable
information), transaction services (home shopping) and call processing
services (voice mail handling).
To implement these services, language-specific spoken language resources
(speech databases, lexica and related tools) are needed. American
companies start from a large monolingual market; the SpeechDat consortium
will lay the groundwork for European companies to be competitive when
starting from a multilingual environment. The project aims to produce
speech databases covering a wide range of languages and applications.
The main features are: coverage of applications (application-oriented
words, phonetically rich sentences, spontaneous utterances, speaker
verification), coverage of the 11 official European languages and
variants, coverage of speaking styles (commands, carefully pronounced and
spontaneous speech), coverage of environmental influences (mobile and
fixed telephone network).
Around 5000 speakers will be recorded for the official languages over the
fixed network, while 1000 speakers will be recorded for the language
variants, the mobile recordings (5 languages) and the speaker
verification recordings (3 languages). The European Language Resources
Association (ELRA) will be involved in the project for validation and
distribution of the databases.
The following languages (and variants) will be covered: Danish, Dutch,
Flemish, British English, Welsh, Finnish, French, Belgian French, Swiss
French, Luxembourgish French, German, Swiss German, Luxembourgish German,
Greek, Italian, Portuguese, Slovenian, Spanish, Swedish and Finnish
Swedish.