Version: 1.0, Date: 20/12/1999
Here is a brief history of changes (including new features) from previous versions.
The most recent version of this documentation can be found at http://www.speech.kth.se/cost250/refsys/latest/doc.
The main purpose of agreeing on a common reference system is to increase comparability between experimental results produced at different sites, with different recognition systems and perhaps with different databases [2]. If a common recognition system is available at all sites and can be used with any database, then comparing results produced with this system can help explain differences and similarities between results that are otherwise difficult to compare. The presented speaker recognition system is intended to become such a reference system. It is small and portable, can be shared by everybody, and can easily be applied to any recognition task with any speech data.
Once a recognition task has been defined and data has been recorded, three components are needed to run an experiment:
The score file from the first two components should be processed by the third component, a scoring component, to extract performance figures and/or visualize performance. A preliminary specification and implementation of a scoring component have recently been added to the Reference System. They include the calculation of a test set EER and a test set ROC curve based on speaker-independent thresholds, as defined in the EAGLES handbook [2]. The ROC data can be used as input to DET-plotting software, which is not yet included in the Reference System.
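As an illustration of what such a scoring step computes, the following is a minimal Python sketch of a test set ROC curve and EER using a single speaker-independent threshold applied to pooled scores. It is not the scoring component of the Reference System. The function names are hypothetical, and the sketch assumes that a higher score means a better match to the claimed speaker; for a distortion-like score where lower is better, the comparisons would be reversed.

```python
def roc_points(target_scores, impostor_scores):
    """Return (threshold, false_reject_rate, false_accept_rate) triples.

    One speaker-independent threshold is swept over all observed scores.
    Assumption: a claim is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        # True-speaker trials rejected at this threshold.
        fr = sum(s < t for s in target_scores) / len(target_scores)
        # Impostor trials accepted at this threshold.
        fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, fr, fa))
    return points


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER as the mean of FR and FA where they are closest."""
    points = roc_points(target_scores, impostor_scores)
    _, fr, fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (fr + fa) / 2
```

The list returned by `roc_points` holds the kind of data a DET plot is drawn from: plotting the false reject rate against the false accept rate on normal-deviate axes gives a DET curve [5].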
This is the documentation for release 1.0 of the COST250 Speaker Recognition Reference System. It describes how to install the system on a unix platform and how to run a simple calibration test to verify that the installation is correct. It further describes the operation of the Reference System and how it can be used.
Comments, suggestions or questions about the Reference System may be sent to melin@speech.kth.se.
2. Portability Issues
The Reference System is coded partly in ANSI-C and partly in the Tcl scripting language. The core
speaker recognition components, such as the feature extractor (LPC
cepstrum) and the classifier (VQ), are coded in ANSI-C, while
system-level parts are coded in Tcl.
The ANSI-C source code should compile and execute equivalently on any platform. Tcl interpreters exist for most platforms and can be downloaded for free. Hence, all program code should be portable to most platforms. So far it has been tested successfully on the following types of unix systems:
A calibration test is provided with the distribution. This calibration
test should be run immediately after installation to make sure the
system produces the same results on the same experiment at all sites.
So far, identical score output files have been produced on all tested
unix systems for this calibration test.
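Checking that a score output file is identical to a reference file could be sketched as below. This is only an illustrative Python sketch, not the calibration procedure shipped with the distribution. The function names and the idea of a line-by-line comparison are assumptions, and the file paths are supplied by the caller.

```python
from itertools import zip_longest


def first_difference(produced_lines, reference_lines):
    """Return the 1-based number of the first differing line, or None if identical.

    A missing line in either list counts as a difference.
    """
    pairs = zip_longest(produced_lines, reference_lines)
    for number, (got, expected) in enumerate(pairs, start=1):
        if got != expected:
            return number
    return None


def compare_score_files(produced_path, reference_path):
    """Compare two score files line by line (paths are caller-supplied)."""
    with open(produced_path) as produced, open(reference_path) as reference:
        return first_difference(produced.read().splitlines(),
                                reference.read().splitlines())
```

A return value of `None` means the two files agree line for line; any other value points at the first line to inspect.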
3. Future Extensions
One of the reasons for choosing Tcl to implement the system-level parts
of the Reference System is its potential for future extensions. Given the
modular structure of the system, where the recognition engine is a well-contained object,
interactive speaker recognition applications could easily be created with
the addition of other Tcl extension packages, such as Snack for audio
extensions and Tk for graphical user interfaces.
[2] Bimbot F., Chollet G. (1997). "Assessment of speaker verification systems", In: Handbook of Standards and Resources for Spoken Language Systems, Gibbon D., Moore R., Winski R. (Eds.), Mouton de Gruyter, ISBN 3-11-015366-1.
[3] Ariyaeeinia A., Sivakumaran P. (1997). "Comparison of VQ and DTW Classifiers for Speaker Verification", Proc. IEE European Conference on Security and Detection (ECOS'97), London, UK, April, pp. 142-146.
[4] Carey M., Parris E., Bridle J. (1991). "A Speaker Verification System Using Alphanets", Proc. IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, Canada, May 14-17, pp. 397-400.
[5] Martin A., Doddington G., Kamm T., Ordowski M., Przybocki M. (1997). "The DET Curve in Assessment of Detection Task Performance", Proc. Eurospeech-97, Rhodes, Greece, September, pp. 1895-1898.