The COST250 Speaker Recognition Reference System

H. Melin, A.M. Ariyaeeinia, M. Falcone

Version: 1.0, Date: 20/12/1999


The most recent version of this documentation can be found at http://www.speech.kth.se/cost250/refsys/latest/doc.



1. Introduction

The present Reference System is the result of work within Working Group (WG) 4 of COST250 [1]. The topic of WG4 is Assessment and Dissemination.

The main reason for researchers to agree on a common reference system is to make experimental results more comparable when they are produced at different sites, with different recognition systems, and perhaps with different databases [2]. If a common recognition system is available at all sites and can be used with any database, then results produced with it provide a shared baseline that helps explain differences and similarities between results that are otherwise difficult to compare. The speaker recognition system presented here is intended to become such a reference system. It is small and portable, can be shared by everybody, and can easily be applied to any recognition task with any speech data.

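Once a recognition task has been defined and data has been recorded, three components are needed to run an experiment: a recognition engine, a "machinery" that executes the experiment on that engine, and a scoring component that turns the resulting scores into performance figures.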

This Reference System currently contains the first two components, and a preliminary implementation of the third. The recognition engine contains an LPC cepstrum feature extractor and a VQ classifier [3], and uses a non-client world model for score normalization [4]. The "machinery" takes a list of operations (an experiment definition file) as input and executes the operations in the recognition engine. The output from the first two components of the Reference System is a score file. The system is built in a modular fashion with three classes of objects: a database class, a recognizer class and an experiment class. Classes have so far been created for interfacing the reference recognition engine and the Polycost database, but new classes could easily be created to interface other engines and databases.
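To illustrate how a VQ classifier can be combined with a world model for score normalization, here is a minimal Python sketch. It is not taken from the Reference System, whose engine is written in ANSI-C. The squared Euclidean distance, the subtraction-based normalization, and the function names `avg_distortion` and `normalized_score` are all assumptions made for illustration.

```python
def avg_distortion(frames, codebook):
    """Average distance from each feature vector to its nearest code vector.

    Assumption: squared Euclidean distance; the distance measure used by
    the Reference System is not specified in this section.
    """
    total = 0.0
    for frame in frames:
        total += min(
            sum((x - c) ** 2 for x, c in zip(frame, code))
            for code in codebook
        )
    return total / len(frames)


def normalized_score(frames, client_codebook, world_codebook):
    """World-model normalized score (hypothetical form).

    A higher value means the test frames fit the claimed client's
    codebook better than the non-client world codebook.
    """
    return avg_distortion(frames, world_codebook) - avg_distortion(frames, client_codebook)


if __name__ == "__main__":
    # Toy two-dimensional "cepstral" vectors, for illustration only.
    test_frames = [[0.0, 0.0], [1.0, 1.0]]
    client = [[0.0, 0.0], [1.0, 1.0]]
    world = [[2.0, 2.0]]
    print(normalized_score(test_frames, client, world))
```

In this sketch a claim would be accepted when the normalized score exceeds a threshold, so the world model acts as a reference against which the client distortion is judged.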

The score file from the first two components should be processed by the third component, a scoring component, to extract performance figures and/or visualize performance. A preliminary specification and implementation of a scoring component has recently been added to the Reference System. It calculates a test set EER and a test set ROC curve based on speaker-independent thresholds, as defined in the EAGLES handbook [2]. The ROC data can be used as input to DET-plotting software [5], which is not yet included in the Reference System.
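The following Python sketch shows one way ROC points and an EER could be computed with a single speaker-independent threshold, that is, with the scores of all speakers pooled. It is an illustration only and not the Reference System's scoring component. It assumes that a higher score means a better match and that the EER is approximated at the threshold where the false rejection and false acceptance rates are closest; the function names are hypothetical.

```python
def roc_points(true_scores, impostor_scores):
    """Return (threshold, false_rejection_rate, false_acceptance_rate) tuples.

    Scores from all speakers are pooled, so each threshold is
    speaker-independent. Assumption: a claim is accepted when
    score >= threshold.
    """
    thresholds = sorted(set(true_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        fr = sum(1 for s in true_scores if s < t) / len(true_scores)
        fa = sum(1 for s in impostor_scores if s >= t) / len(impostor_scores)
        points.append((t, fr, fa))
    return points


def equal_error_rate(true_scores, impostor_scores):
    """Approximate the EER as the mean of FR and FA where they are closest."""
    t, fr, fa = min(
        roc_points(true_scores, impostor_scores),
        key=lambda p: abs(p[1] - p[2]),
    )
    return (fr + fa) / 2, t


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    true_speaker = [2.0, 1.5, 0.5, 1.0]
    impostor = [0.0, 0.2, 0.7, -0.5]
    eer, threshold = equal_error_rate(true_speaker, impostor)
    print(eer, threshold)
```

The list returned by `roc_points` holds the kind of data a DET plot would be drawn from, after mapping both error rates onto a normal deviate scale.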

This is the documentation for release 1.0 of the COST250 Speaker Recognition Reference System. It describes how to install the system on a unix platform and how to run a simple calibration test to verify that the installation is correct. It further describes the operation of the Reference System and how it can be used.

Comments, suggestions or questions about the Reference System may be sent to melin@speech.kth.se.

2. Portability Issues

The Reference System is coded partly in ANSI-C and partly in the Tcl scripting language. The actual speaker recognition system parts, such as the feature extractor (LPC cepstrum) and the classifier (VQ), are coded in ANSI-C while system-level parts are coded in Tcl.

The ANSI-C source code should compile and execute equivalently on any platform. Tcl interpreters exist for most platforms and can be downloaded for free. Hence, all program code should be portable to most platforms. So far the system has been tested successfully on the following types of unix systems:

The system has also undergone preliminary testing on Windows platforms (95/98/NT). However, installation on a Windows-based system is not covered by this documentation; the installation notes are written with a unix system in mind. In particular, the make program and the makefile are unlikely to be directly useful on a non-unix system. If you try to install the system on a Windows machine, please send your notes to the authors so that we can improve the documentation.

A calibration test is provided with the distribution. It should be run immediately after installation to make sure that the system produces the same results on the same experiment at all sites. So far, identical score output files have been produced on all tested unix systems for this calibration test.
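As a rough illustration of what "identical score output files" means in practice, a byte-for-byte comparison between a locally produced score file and a reference copy could look like the Python sketch below. The function name and the file names are hypothetical, and the procedure actually used by the calibration test is described in the installation instructions, not here.

```python
import filecmp


def score_files_identical(local_path, reference_path):
    """Return True if the two score files have exactly the same contents.

    shallow=False makes filecmp read and compare the file contents
    instead of relying on file size and modification time alone.
    """
    return filecmp.cmp(local_path, reference_path, shallow=False)


if __name__ == "__main__":
    # Hypothetical file names; substitute the paths used at your site.
    if score_files_identical("calibration.scores", "reference.scores"):
        print("Calibration OK: score files are identical")
    else:
        print("Calibration FAILED: score files differ")
```

A strict comparison of this kind is appropriate only because the calibration test is expected to give identical output at every site; any difference at all indicates that the installation should be checked.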

3. Future Extensions

One of the reasons for choosing Tcl to implement the system-level parts of the Reference System is its potential for future extensions. Given the modular structure of the system, where the recognition engine is a well-contained object, interactive speaker recognition applications could easily be created by adding other Tcl extension packages, such as Snack for audio and Tk for graphical user interfaces.



References

[1] Falcone M. (1999). "COST250 Working Group 4: Speaker Recognition Assessment and Dissemination", In: COST250 Final Report.

[2] Bimbot F., Chollet G. (1997). "Assessment of speaker verification systems", In: Handbook of Standards and Resources for Spoken Language Systems, Gibbon D., Moore R., Winski R. (Eds.), Mouton de Gruyter, ISBN 3-11-015366-1.

[3] Ariyaeeinia A., Sivakumaran P. (1997). "Comparison of VQ and DTW Classifiers for Speaker Verification", Proc. IEE European Conference on Security and Detection (ECOS'97), London, UK, April, pp. 142-146.

[4] Carey M., Parris E., Bridle J. (1991). "A Speaker Verification System Using Alphanets", Proc. IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, Canada, May 14-17, pp. 397-400.

[5] Martin A., Doddington G., Kamm T., Ordowski M., Przybocki M. (1997). "The DET Curve in Assessment of Detection Task Performance", Proc. Eurospeech-97, Rhodes, Greece, September, pp. 1895-1898.

