Thursday, June 21, 2012

Voice Biometrics: How it works!



Enrollment Process


Creating a Voiceprint.  The process of creating a voiceprint is called "enrollment". In most commercial speaker verification and identification systems there is a formal, one-time process where speech samples are collected from an individual. The basic steps of this process are outlined below:


Pre-Process.  Before being sent to the voice biometric engine, speech samples are evaluated using a custom audio pre-processing library. We determine whether there are any issues that would prevent the engine from doing its job (e.g. background noise, distortion, or low signal power). Interactive feedback is provided if any issues are found, which greatly reduces "failure to enroll" errors.
Feature Extraction.  Once we are sure the speech samples have sufficient content and quality, the VMM-1™ engine extracts unique vocal features from the samples. VBG uses standard acoustic features (mel-frequency cepstral coefficients, or MFCC, and linear predictive coding, or LPC), as well as our own custom feature set. The use of multiple feature sets allows us to build more accurate voice models.
Template Generation.  Extracted features are benchmarked relative to universal background models or cohort models, and are then further refined into a mathematical model that uniquely represents a user's speech patterns. This unique model is called a "template" or a "voiceprint".
Voiceprint Storage.  Voiceprints are not WAV or other audio files. They are statistical representations of speech and thus cannot be stolen or used anywhere except within our system. Our voiceprints use a proprietary storage format that is further encrypted within our database system. Finally, all databases are housed and managed 24x365 in our secure data centers.
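
The kind of audio quality test described in the Pre-Process step can be illustrated with a minimal sketch. This is not the VBG pre-processing library: the function names, the two tests chosen (signal power and clipping as a simple proxy for distortion), and every threshold value are assumptions made for illustration only. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math

# Hypothetical thresholds, chosen only for illustration.
MIN_RMS_DBFS = -40.0      # below this the signal is considered too quiet
MAX_CLIPPED_RATIO = 0.01  # more than 1% clipped samples suggests distortion
CLIP_LEVEL = 0.999        # magnitude treated as clipped


def rms_dbfs(samples):
    """Return the RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def check_sample(samples):
    """Return a list of human-readable issues; an empty list means the sample passed."""
    issues = []
    if rms_dbfs(samples) < MIN_RMS_DBFS:
        issues.append("low signal power")
    clipped = sum(1 for s in samples if abs(s) >= CLIP_LEVEL)
    if samples and clipped / len(samples) > MAX_CLIPPED_RATIO:
        issues.append("distortion (clipping)")
    return issues


# A 440 Hz tone at half amplitude, sampled at 8 kHz for one second.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
quiet = [0.001 * s for s in tone]                            # far too quiet
clipped_tone = [max(-1.0, min(1.0, 4.0 * s)) for s in tone]  # overdriven

for name, audio in [("tone", tone), ("quiet", quiet), ("clipped", clipped_tone)]:
    print(name, check_sample(audio) or "ok")
```

Returning a list of issues rather than a single pass/fail flag mirrors the interactive feedback idea: the caller can tell the user what to fix before asking for another sample.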

Verification Process


Verification.  During a verification process, an individual makes a claim of identity (typically by saying or entering some kind of user ID) and is then prompted to submit a speech sample. A temporary voiceprint is made from the speech sample and is then compared to the stored reference voiceprint for that individual.


Pre-Processing and Feature Extraction.  As with the enrollment process, the verification sample is evaluated with a number of audio quality tests. Once these tests have passed, features are extracted.
Comparison.  A temporary template (or voiceprint) is created. This temporary voiceprint is then compared to both the reference voiceprint for the user and "generic" voiceprint information from universal or cohort models. Two scores are derived: how closely the temporary voiceprint matches the reference voiceprint, and how closely it matches the universal or cohort models. This dual scoring system provides greater matching accuracy.
Thresholds and Scoring.  The VMM-1™ engine is fully tunable and can operate at any desired confidence level. For instance, a client may wish to have a 99% match confidence for a particular group of users. In that case, VBG configures the engine with the threshold that corresponds to a 99% match confidence. If the score meets or exceeds this threshold value, a "pass" result is returned; otherwise, a "fail" result is returned.
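
The dual scoring and threshold steps can be sketched as follows. How the VMM-1™ engine actually combines its two scores is not described here, so this example swaps in a common textbook approach: an average log-likelihood ratio between a speaker model and a background model. The single diagonal Gaussians standing in for the models, the function names, the toy feature values, and the threshold of 0.0 are all assumptions for illustration.

```python
import math


def gaussian_log_likelihood(frame, mean, var):
    """Log density of one feature frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def verification_score(frames, speaker_model, background_model):
    """Average per-frame log-likelihood ratio: speaker model versus background model."""
    total = 0.0
    for frame in frames:
        total += gaussian_log_likelihood(frame, *speaker_model)
        total -= gaussian_log_likelihood(frame, *background_model)
    return total / len(frames)


def verify(frames, speaker_model, background_model, threshold):
    """Return "pass" when the score meets or exceeds the threshold, else "fail"."""
    score = verification_score(frames, speaker_model, background_model)
    return "pass" if score >= threshold else "fail"


# Toy 2-dimensional models given as (mean, variance) pairs; all values are made up.
speaker = ([1.0, -1.0], [0.5, 0.5])
background = ([0.0, 0.0], [2.0, 2.0])

genuine_frames = [[1.1, -0.9], [0.8, -1.2], [1.0, -1.0]]
impostor_frames = [[-1.5, 1.0], [-0.5, 2.0], [-2.0, 0.5]]

print(verify(genuine_frames, speaker, background, threshold=0.0))
print(verify(impostor_frames, speaker, background, threshold=0.0))
```

Subtracting the background score means a sample scores well only if it fits the claimed speaker better than it fits speakers in general. Raising the threshold trades fewer false accepts for more false rejects, which is the tuning decision described above.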

Identification Process

Identification.  During an identification process, there is no initial claim of identity made. Instead, an individual is simply prompted to submit a speech sample. A temporary voiceprint is made from the speech sample and is then compared to all reference voiceprints stored in the database in order to find the best match.

Pre-Processing and Feature Extraction.  As with the enrollment process, the identification sample is evaluated with a number of audio quality tests. Once these tests have passed, features are extracted.
Comparison.  A temporary template (or voiceprint) is created. This temporary voiceprint is then compared to all reference voiceprints stored in the database. Various classification techniques are used to speed up this process (for instance, first narrowing the candidates by male versus female vocal characteristics).
Maximum Likelihood.  The identification process does not use a score threshold. Instead, all results are ranked from best to worst match and are returned with the match probability and raw score. Our system suggests the best match, but our service API returns all results so that client systems can make the final decision or perform additional calculations if desired.
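
The ranking step can be sketched as below. The result fields (`user_id`, `raw_score`, `probability`) and the function name are hypothetical and do not describe the actual service API. Turning raw scores into match probabilities with a softmax is also an assumption, since the method used to derive the probability is not stated here.

```python
import math


def rank_candidates(raw_scores):
    """Rank enrolled users by raw score, best first, with a softmax match probability."""
    if not raw_scores:
        return []
    top = max(raw_scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {user: math.exp(score - top) for user, score in raw_scores.items()}
    total = sum(weights.values())
    ranked = sorted(raw_scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {"user_id": user, "raw_score": score, "probability": weights[user] / total}
        for user, score in ranked
    ]


# Hypothetical raw scores for three enrolled users.
results = rank_candidates({"alice": 1.9, "bob": -0.4, "carol": 0.7})
best_match = results[0]["user_id"]  # suggested match; the full list is still returned

for entry in results:
    print(entry["user_id"], entry["raw_score"], round(entry["probability"], 3))
```

Because no threshold is applied, the caller receives every candidate and can apply its own rule, for example rejecting the suggestion when the top two probabilities are too close.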









