COST250 Speaker Recognition Reference System
6. The cost250 Tcl package
This chapter describes the commands provided by the cost250 Tcl
package and how to use them. The command package require
cost250 gives you access to all commands in the package.
6.1 Using the package
For an example of how to use the commands in this package, see the
file work/runtest.tcl. It contains a program that runs a complete experiment
with the Reference System. The commands that belong to the cost250
package are printed in red, bold characters in the example; the remainder of
the example program is ordinary Tcl code and comments. The program starts by
creating a database object and an engine object and then prints the properties
of both objects. It then executes the operations defined in three separate
experiment definition files. These files are expected to define 1) training
of a non-client model, 2) enrollment of clients, and 3) verification
attempts. Finally, the program deletes the created database and
engine objects and terminates.
The example program assumes that the experiment definition files
are organized according to the following convention. Each set of three
experiment definition files is identified by a tag and is expected
to be stored in the directory ../experiment/tag.
The three files must be named:
- os_wld.exp - for training of a world model (or any set of
non-client models used by the recognition engine).
- es.exp - for enrollment
- ts.exp - for verification attempts
The tag has the format database/experiment. For instance, the
calibration test on Polycost is identified by the tag
polycost/test, and the first baseline experiment has the
tag polycost/be1. The example program takes the
experiment tag as its only command line argument, so it can be used
with any new experiment as long as the experiment definition files
follow this convention.
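The naming convention can be sketched in plain Tcl. The helper name experimentFiles and its baseDir argument are hypothetical and not part of the cost250 package; the sketch only assembles the three paths that the convention implies for a given tag.

```tcl
# Hypothetical helper (not part of cost250): map an experiment tag such as
# "polycost/be1" to the three experiment definition files that the
# convention expects to find in ../experiment/<tag>.
proc experimentFiles {tag {baseDir ../experiment}} {
    set dir [file join $baseDir $tag]
    return [dict create \
        world      [file join $dir os_wld.exp] \
        enrollment [file join $dir es.exp] \
        test       [file join $dir ts.exp]]
}

# Print the world-model training file for the first baseline experiment.
set files [experimentFiles polycost/be1]
puts [dict get $files world]
```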
The commands are described below from an object-oriented point of view. The
object-oriented programming style we have tried to follow with Tcl is
described elsewhere in this documentation.
6.2 Experiment class
The single command in the Experiment class has the following
syntax:
Experiment::run experimentFileName -database databaseName -system engineName ?-outdir modelDir? ?-resultfile outfile?
This command will perform all operations listed in the file
experimentFileName. The given database and engine object will
be used to implement the operations. Enrolled models are saved in
directory modelDir and results are saved in outfile.
The default value for modelDir is the current directory, and the
default value for outfile is results.llk in the
current directory.
The required format for the experiment definition file is described in
section File Format Description.
The source code for the Experiment class can be found in tcl/runexp.tcl in the source
distribution.
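The following is a minimal sketch of a complete run for one experiment tag, using only the commands and options documented in this chapter. It is not a copy of work/runtest.tcl. The procedure name runExperiment, its dataDir argument, and the -outdir values are assumptions: the values "ref" and "cli" were chosen to match the default refDir and cliDir properties of the engine, on the assumption that the engine later looks for models there.

```tcl
# Sketch only: runs the three experiment definition files for one tag.
# Requires the cost250 package at call time; nothing is executed until
# runExperiment is called.
proc runExperiment {tag dataDir} {
    package require cost250

    # Create and initialize the database and engine objects.
    set db [Database::new $dataDir]
    $db init
    set engine [ReferenceSystem::new $db]
    $engine init

    set expDir [file join .. experiment $tag]

    # Assumption: world models go to "ref" and client models to "cli",
    # matching the engine's default refDir and cliDir properties.
    Experiment::run [file join $expDir os_wld.exp] \
        -database $db -system $engine -outdir ref
    Experiment::run [file join $expDir es.exp] \
        -database $db -system $engine -outdir cli
    Experiment::run [file join $expDir ts.exp] \
        -database $db -system $engine -resultfile results.llk

    # Release both objects when the experiment is finished.
    $engine destroy
    $db destroy
}
```

A call such as `runExperiment polycost/test /data/polycost` would then run the calibration test, where the data directory shown is only a placeholder.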
6.3 Database object
The purpose of the database class is to provide the recognition
engine with speech data. The current database class is very simple
and is suitable for databases where all speech files are stored
in any subdirectory of a single base directory, such that
each speech file in the database has a filename
datadir/fileTag.alwExt
where datadir is the base directory, fileTag uniquely
identifies a single speech file, and alwExt is the filename
extension. Polycost satisfies this condition if all subject
directories are copied to disk into the same directory. It is not
satisfied when the speech data is stored on two separate
CD-ROMs. Furthermore, the Reference System recognition engine
currently assumes that the speech file supplied by the database object
is a headerless, 8 kHz A-law file. The current task of the database
class is to map a file tag onto a complete speech filename. This should
be changed in future versions of the Reference System to enable the
use of other speech file formats and of databases other than
Polycost: the database object should return the actual, decoded
speech data instead of the name of a file.
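The mapping from file tag to filename can be illustrated with a few lines of plain Tcl. This is a hypothetical stand-in, not the code in tcl/polycost.tcl, which may differ; the procedure name speechFileName, the data directory, and the file tag are all made up for illustration.

```tcl
# Hypothetical stand-in for the tag-to-filename mapping: the result is
# the base directory, the file tag, and the extension joined together.
proc speechFileName {datadir fileTag {alwExt .alw}} {
    return [file join $datadir $fileTag]$alwExt
}

# Both the directory and the file tag below are illustrative only.
puts [speechFileName /data/polycost F01/session2/utt05]
```

Because the file tag may itself contain directory separators, the speech files can live in any subdirectory of the base directory.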
A database object is created with the command
Database::new datadir
It returns the name of the created object, databaseName,
defined in the global namespace. The name will be something like
"::db01". Multiple database objects can be created and they
will all have unique names.
This is the typical life cycle of a database object:
- The object is created by the Database::new command.
- Default property values are optionally overridden with
the set method.
- The init method is called after all properties have been
set.
- The database object is passed as an argument to one or more Experiment::run commands and subsequently
to a recognition engine object. The engine object will call the
database object's filename method to map a speech file tag to
a filename.
- The destroy method is called when the object is no longer needed.
The following commands can be used to operate on a database object.
The command name is the name of the object itself (as returned by Database::new) and the first argument is
the method name. Any remaining arguments are arguments to the
method.
databaseName set propertyName value
Sets the value of a property. A property always has a default
value which can be overridden by calling this method. Table 1 lists the properties that are defined for
a database object and their default values.
databaseName init
Initializes the object. This method should always be called after
set has been called and before any other methods are called
for this object.
databaseName print ?channel?
Prints the property values of the object. An optional output
channel can be given. The default output channel is stdout.
databaseName getdir
Returns the name of the database root directory.
databaseName filename fileTag
Returns the full file name for the speech file indicated by
fileTag.
databaseName destroy
Destroys the object. This method should always be called when the
object is no longer needed.
The source code for the Database class can be found in tcl/polycost.tcl in the source
distribution.
| property | default | description |
| alwExt | .alw | the filename extension for speech files in the database |
| trace | 0 | currently not used |
Table 1. Properties in a Database object.
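The life cycle of a database object can be sketched as follows, using only the methods listed above. The procedure name showDatabase and its arguments are hypothetical; the sketch needs the cost250 package when it is called.

```tcl
# Sketch of the database object life cycle: new, set, init, use, destroy.
# showDatabase is a hypothetical wrapper, not part of cost250.
proc showDatabase {dataDir fileTag} {
    package require cost250

    set db [Database::new $dataDir]   ;# returns a name such as ::db01
    $db set alwExt .alw               ;# optional here: .alw is the default
    $db init                          ;# must follow all set calls

    $db print                         ;# property values go to stdout
    set path [$db filename $fileTag]  ;# full name of one speech file

    $db destroy                       ;# release the object when done
    return $path
}
```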
6.4 Recognition engine object
The purpose of the recognition engine is to perform enrollment
and verification operations. The engine class included in this
package is called ReferenceSystem. It is the heart
of the COST250 Speaker Recognition Reference System.
With the default setup, the engine implements a recognizer with the
following characteristics (note that the system is truly a reference
system only when the default setup is used!):
- The speech signal is pre-emphasized using a factor of 0.97 and
divided into 100 overlapping frames per second. The analysis window
for each frame is 25 ms and, hence, we have a 60% overlap between
consecutive frames. For each frame a 12th order LPC-cepstrum vector is
computed.
- A 12-dimensional VQ codebook with 64 codewords constitutes a
client speaker model. Another VQ codebook, also with 64 codewords,
constitutes a non-client world model and is used for score
normalization. Codebooks are trained using the LBG-algorithm.
- The output from a verification attempt is a score value computed
as the non-client VQ distortion divided by the client VQ distortion.
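The frame figures above follow from simple arithmetic on the 8 kHz sample rate, and they agree with the -f200 and -o60 options in the default lpcConfig. The sketch below reproduces that arithmetic and the score ratio in plain Tcl; the procedure names frameLayout and normalizedScore are hypothetical, and the distortion values in the last line are invented inputs, not measured results.

```tcl
# Frame layout implied by a sample rate (Hz), a frame rate (frames per
# second) and an analysis window length (ms).
proc frameLayout {sampleRate framesPerSec windowMs} {
    set shift   [expr {$sampleRate / $framesPerSec}]       ;# samples between frame starts
    set size    [expr {$sampleRate * $windowMs / 1000}]    ;# samples per window
    set overlap [expr {100.0 * ($size - $shift) / $size}]  ;# shared samples in percent
    return [dict create size $size shift $shift overlapPercent $overlap]
}

# Score as described above: non-client distortion divided by client
# distortion. A value above 1 means the client codebook fits better.
proc normalizedScore {nonClientDistortion clientDistortion} {
    return [expr {double($nonClientDistortion) / $clientDistortion}]
}

puts [frameLayout 8000 100 25]
puts [normalizedScore 3.0 1.5]
```

At 8000 samples per second, 100 frames per second gives a shift of 80 samples, a 25 ms window spans 200 samples, and consecutive windows therefore share 120 of 200 samples, which is the 60% overlap stated above.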
A reference recognizer engine object is created
with the command
ReferenceSystem::new databaseName
where databaseName is the name of a database object
from which speech files can be retrieved.
The command returns the name of the created object, engineName,
defined in the global namespace. The name will be something like
"::rec01". Multiple engine objects can be created and they
will all have unique names.
This is the typical life cycle of an engine object:
- The object is created by the ReferenceSystem::new command.
- Default property values are optionally overridden with the
set method.
- The init method is called after all properties have been
set.
- The engine object is passed as an argument to the
Experiment::run command. This command will call the engine
object's enrollment and verification methods to perform
a list of operations.
- The destroy method is called when the object is no longer needed.
The following commands can be used to operate on an engine object.
The command name is the name of the object itself (as returned by
ReferenceSystem::new) and the first argument is the method
name. Any remaining arguments are arguments to the method.
engineName set propertyName value
Sets the value of a property. A property always has a default
value which can be overridden by calling this method. Table 2 lists the properties that are defined for
an engine object and their default values.
engineName init
Initializes the engine. This method should always be called after
set has been called and before any other methods are called
for this object.
engineName print ?channel?
Prints the property values of the object. An optional output channel
can be given. The default output channel is stdout.
engineName enrollment identity fileTags outDir
Performs one enrollment operation, where a model is trained for
the given identity based on the speech files indicated by
fileTags (a Tcl list). The produced model is saved in directory
outDir.
engineName verification speaker identity fileTags resultChannel
Performs one verification trial, where a speaker claims to have a
given identity using the speech files indicated by
fileTags (a Tcl list) to support the claim. The actual identity of the
speaker is speaker. The result is printed to
resultChannel.
engineName destroy
Destroys the object. This method should always be called when the
object is no longer needed.
The source code for the ReferenceSystem class can be found in tcl/vqst.tcl in the source
distribution.
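Although the enrollment and verification methods are normally called by Experiment::run, they can also be called directly. The sketch below shows one enrollment followed by one verification trial on an engine object that has already been created and initialized. The procedure name enrollAndVerify is hypothetical, and saving the model in "cli" assumes that verification reads client models from the default cliDir.

```tcl
# Sketch: enroll one client and run one verification trial against it.
# engine must be the name returned by ReferenceSystem::new, after init.
proc enrollAndVerify {engine identity enrollTags speaker testTags} {
    # Train a model for the identity from a Tcl list of file tags.
    # Assumption: "cli" matches the engine's default cliDir property.
    $engine enrollment $identity $enrollTags cli

    # The speaker claims the identity; the result is written to stdout.
    $engine verification $speaker $identity $testTags stdout
}
```

Passing the same name for speaker and identity corresponds to a true-speaker trial, and different names correspond to an impostor trial.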
| property* | default | description |
| lpcConfig* | -f200 -o60 -p12 -a0.97 -s1 | configuration options to lin2parProgram. The available configuration options are described in Table 3. |
| cbSize* | 64 | size of codebook to create during an enrollment operation |
| normalization* | 1 | 1 = use score normalization; 0 = no score normalization |
| nonClientModels* | W | a list of non-client model names separated with space (for example: "W", or "F M") |
| binDir | "" | where executable VQST files reside. May be empty ("") if they are in the current search path. |
| alw2linProgram* | alw2lin | binary program that decodes A-law samples to linear scale |
| lin2parProgram* | lpccep | binary program that parameterizes a speech file |
| concatProgram* | concat | binary program that concatenates parameterized files |
| gencbProgram* | gencb | binary program that trains a speaker model (codebook) |
| vqtestProgram* | vqtest | binary program that computes a score for a test utterance against a speaker model |
| tmpDir | /tmp | where to create temporary files |
| cliDir | cli | where to find client model files (codebooks) |
| refDir | ref | where to find non-client model files (codebooks) |
| linearExt | .lin | filename extension for sample files with linear scale |
| paramExt | .lpc | filename extension for parameter files |
| modelExt | .cb | filename extension for client and non-client model files (codebooks) |
| uniqTag | | set automatically by the new command |
| lpcConfigFileName | | set automatically by the new command |
| trace | 0 | Set to 1 for trace output; 0 for no trace output. |
Table 2. Properties in a ReferenceSystem
object. *These properties must keep their default values
for the system to perform as a reference system, in the sense that it
produces the same results at all sites.
| option | default | description |
| -f | 256 | the speech frame size in samples |
| -o | 50% | the overlap between neighbouring speech frames, in percent |
| -p | 12 | the LPC-cepstrum vector size |
| -t | 0.01 | the absolute energy threshold: this parameter is used to discard any speech frame whose absolute average energy is less than the given value |
| -a | 0.95 | the pre-emphasis coefficient |
| -h | 0 | header size (in number of integers): its default value is "0", meaning use all the speech samples in the input file. If it is set to any non-zero value, that many samples at the top of the speech file will be ignored. Note that when the alw2lin program is used to produce the input to the LPC-cepstrum program, header size must be 0. |
| -s | 0 | a trace information parameter: if set to zero, info on various parameter values will be printed. If set to "1" no parameter values are printed (silent mode). |
Table 3. Configuration options for the LPC-cepstrum program.
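To see which settings the default lpcConfig string changes relative to the program defaults in Table 3, the two can be merged in plain Tcl. This is an illustrative sketch only: the procedure name parseLpcConfig is hypothetical, and it assumes that each option is a single letter immediately followed by its value (as in the default lpcConfig) and that options not given fall back to the Table 3 defaults.

```tcl
# Defaults of the LPC-cepstrum program, copied from Table 3.
set programDefaults {-f 256 -o 50 -p 12 -t 0.01 -a 0.95 -h 0 -s 0}

# Split a configuration string such as "-f200 -o60" into a dict that
# maps each option letter to its value.
proc parseLpcConfig {config} {
    set result [dict create]
    foreach word $config {
        if {![regexp -- {^(-[a-z])(.+)$} $word -> option value]} {
            error "unrecognized option: $word"
        }
        dict set result $option $value
    }
    return $result
}

# Effective settings: program defaults overridden by the engine default.
set engineDefault "-f200 -o60 -p12 -a0.97 -s1"
set effective [dict merge $programDefaults [parseLpcConfig $engineDefault]]
puts $effective
```

Under these assumptions, the engine default overrides the frame size, overlap, pre-emphasis coefficient and trace setting, and leaves the energy threshold and header size at their program defaults.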