Best practices for speech and multimodal databases

CLARIN and FLaReNet workshop at KTH Stockholm, 25 and 26 Nov 2009

Our research community is large and varied, and there is a great need to define the requirements for infrastructures that can be shared easily and fruitfully. The annotation of corpora of interest in this field is, however, divergent and would benefit from harmonization and interoperability. We therefore consider it important to look into standards and best practices for encoding the relevant features of these corpora, so that a wide range of researchers can use them. Such features include intonation, facial expressions, gestures, turn-taking and emotions.

In addition, we would like to have a general discussion about the best way to get the speech community more actively involved in the issues discussed within the CLARIN project. Speech researchers have already broken new ground in the collection and distribution of spoken data under the LDC and ELRA umbrellas. A driving force has been the need for large databases for training automatic speech recognition systems. However, the problem of providing efficient and robust access to a multitude of spoken corpora for general research within the humanities is far from solved.

We would like to hear about your experiences, needs and potential ideas for standardization within the subject of the workshop.

Rolf Carlson, Kjell Elenius and David House