Seminar at Speech, Music and Hearing:

Thesis defense:

Automatic speaker verification on site and by telephone: methods, applications and assessment

Håkan Melin


Speaker verification is the biometric task of authenticating a claimed identity by means of analyzing a spoken sample of the claimant's voice. The present thesis deals with various topics related to automatic speaker verification (ASV) in the context of its commercial applications, characterized by co-operative users, user-friendly interfaces, and requirements for small amounts of enrollment and test data.
A text-dependent system based on hidden Markov models (HMM) was developed and used to conduct experiments, including a comparison between visual and aural strategies for prompting claimants for randomized digit strings. It was found that aural prompts lead to more errors in spoken responses and that visually prompted utterances performed marginally better in ASV, given that enrollment data were visually prompted. High-resolution flooring techniques were proposed for variance estimation in the HMMs, but results showed no improvement over the standard method of using target-independent variances copied from a background model. These experiments were performed on Gandalf, a Swedish speaker verification telephone corpus with 86 client speakers.
A complete on-site application (PER), a physical access control system securing a gate in a reverberant stairway, was implemented based on a combination of the HMM and a Gaussian mixture model based system. Users were authenticated by saying their proper name and a visually prompted, random sequence of digits after having enrolled by speaking ten utterances of the same type. An evaluation was conducted with 54 out of 56 clients who succeeded in enrolling. Semi-dedicated impostor attempts were also collected. An equal error rate (EER) of 2.4% was found for this system based on a single attempt per session and after retraining the system on PER-specific development data. On parallel telephone data collected using a telephone version of PER, 3.5% EER was found with landline and around 5% with mobile telephones. Impostor attempts in this case were same-handset attempts. Results also indicate that the distributions of false reject and false accept rates over target speakers are well described by Beta distributions. A state-of-the-art commercial system was also tested on PER data with similar performance as the baseline

14:00 - 17:00
Tuesday December 19, 2006

The seminar is held in Sal F3, Lindstedtsvägen 26.

