PASSIVE voice biometrics in an ACTIVE channel

by Ravin Sanjith

July 17, 2017

Speech processing has enjoyed incremental improvements for quite a while, most recently in the evolution from Automated Speech Recognition ("ASR") to Natural Language Understanding ("NLU"). As scientists continually devise methods to enhance biometric template creation and pattern matching, recent innovations in machine learning and artificial intelligence are yielding steadily improving classification metrics, especially in Voice Biometrics (VB), through a virtuous cycle in which more data feeds ongoing machine learning.

Initial VB implementations mainly involved text-dependent speaker authentication, referred to as "active voice biometrics", which, when combined with ASR, yielded better results than text-independent (PASSIVE) VB. Today, thanks to significant technology advancements, PASSIVE authentication delivers high accuracy within a more seamless customer experience.

Traditionally, text-dependent and text-independent solutions were quite distinctly aligned with self-service and human-assisted channels, respectively. Hence, in most deployments, ACTIVE VB takes place in IVR and/or digital (e.g. mobile) channels, while PASSIVE VB applies to agent-on-phone interactions. But that is changing.

ACTIVE voice biometrics calls for user-specific behavior

Text-dependent VB, as the name suggests, depends on a specific or anticipated utterance, known as a 'passphrase', for which a wide range of options exists (see examples below).

Examples of Text-Dependent Passphrases

Common – where all callers are prompted for the same response (e.g. "I am using voice biometrics")
Random – a fixed set of phrases from which one is randomly prompted, also used to detect 'liveness' (e.g. "The best things in life are free")
Dynamic – similar to 'random' but generally uses short number sequences of 4 to 6 digits (e.g. "3,7,4,8")
Individual – where a user is prompted to say their name or social security number, which is generally static per user (e.g. "AAA345CZ5")
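The four passphrase styles above differ mainly in how the prompt is chosen for each call. The Python sketch below illustrates that difference; the phrase bank (apart from the two example phrases quoted above), the function names and the user-record field `enrolled_phrase` are hypothetical. A production system would draw random and dynamic prompts from an unpredictable source such as `random.SystemRandom` rather than a seeded generator.

```python
import random

# Hypothetical prompt material; a real deployment would curate its own.
COMMON_PHRASE = "I am using voice biometrics"
RANDOM_PHRASES = [
    "The best things in life are free",
    "Blue skies bring warm summer days",
    "Fresh bread smells best in the morning",
]


def common_prompt() -> str:
    """Common: every caller is asked for the same phrase."""
    return COMMON_PHRASE


def random_prompt(rng: random.Random) -> str:
    """Random: pick one phrase from a fixed set, so a single replayed
    recording is less likely to match (a basic liveness check)."""
    return rng.choice(RANDOM_PHRASES)


def dynamic_prompt(rng: random.Random, min_len: int = 4, max_len: int = 6) -> str:
    """Dynamic: a fresh comma-separated digit sequence of 4 to 6 digits."""
    length = rng.randint(min_len, max_len)
    return ",".join(str(rng.randint(0, 9)) for _ in range(length))


def individual_prompt(user_record: dict) -> str:
    """Individual: a per-user value that stays the same across calls.
    'enrolled_phrase' is a hypothetical field name."""
    return user_record["enrolled_phrase"]
```

Passing the generator in as a parameter keeps the functions testable with a seeded `random.Random`, while production code can pass `random.SystemRandom()` instead.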

As with all biometrics, users need to enroll first in order to verify as and when needed.

ENROLLMENT involves 2 components:

Identity assertion, also referred to as 'Ground Truth', is an essential part of the enrollment process; the most common approach for contact centres is to reuse existing authentication mechanisms, mainly security questions, more commonly known as knowledge-based authentication ("KBA").
Voiceprint creation is achieved by collecting a number of utterances, usually 3, to accommodate natural variation in a speaker's responses.

VERIFICATION is performed when the caller repeats the relevant phrase at least once; the utterance is then compared with the enrolled voiceprint for a match/mismatch decision.
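Enrollment and verification can be sketched as follows, assuming a speaker-embedding model (not shown) that turns each utterance into a fixed-length vector. Averaging the enrollment vectors and scoring with cosine similarity against a threshold is one common approach, not necessarily what any particular vendor uses; the function names, the 0.7 threshold and the `identity_verified` flag standing in for the KBA ground-truth step are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (1.0 = same direction)."""
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def enroll(identity_verified: bool, utterances: List[Vector], required: int = 3) -> List[float]:
    """Build a voiceprint by averaging several normalised utterance embeddings.

    identity_verified stands in for the ground-truth step (e.g. KBA passed).
    Averaging smooths the natural variation between repetitions."""
    if not identity_verified:
        raise PermissionError("identity must be asserted before enrollment")
    if len(utterances) < required:
        raise ValueError(f"need at least {required} utterances")
    normed = [_normalize(u) for u in utterances]
    dim = len(normed[0])
    return [sum(u[i] for u in normed) / len(normed) for i in range(dim)]


def verify(voiceprint: Vector, utterance: Vector, threshold: float = 0.7) -> bool:
    """Match/mismatch decision: accept if similarity clears the threshold."""
    return cosine(voiceprint, utterance) >= threshold
```

Raising the threshold trades more false rejections of genuine callers for fewer false acceptances of impostors; where to set it is a deployment decision.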

The VB process for ACTIVE enrollment and verification requires very specific user behaviors and has often been cited as the root cause of callers abandoning voice authentication, purportedly due to the cognitive load: the demand on callers to concentrate on unfamiliar IVR instructions, especially if they are in a hurry or multi-tasking.

Machine Learning is Blurring the lines between ACTIVE and PASSIVE voice biometrics

The application of machine learning methods has yielded massive improvements in both the accuracy and the speed of response of biometrics in general, but for voice biometrics it is a quantum leap! Less net caller audio (speech with silence and pauses removed) is required to yield similar, if not better, accuracy. More specifically, where older-generation text-independent solutions typically required at least 6-8 seconds of net audio, they are now capable of providing adequate results within 3-5 seconds, which is generally within the length of text-dependent vocal responses.
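A crude estimate of net audio, and a check against the 3-second lower bound quoted above, might look like the sketch below. It counts short frames whose RMS energy exceeds a threshold; the frame size, energy threshold, 8 kHz telephony sample rate and function names are assumptions, and real engines use trained voice activity detectors rather than a fixed energy cut-off.

```python
import math
from typing import Sequence


def net_speech_seconds(
    samples: Sequence[float],
    sample_rate: int = 8000,
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
) -> float:
    """Estimate net audio: total duration of frames whose RMS energy exceeds
    a threshold, i.e. the recording with silent stretches discarded."""
    frame_len = sample_rate * frame_ms // 1000
    voiced_frames = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms > energy_threshold:
            voiced_frames += 1
    return voiced_frames * frame_ms / 1000


def enough_for_text_independent(net_seconds: float, minimum: float = 3.0) -> bool:
    """Newer text-independent engines reportedly need about 3-5 s of net
    audio (older ones 6-8 s); the 3.0 s default is an assumption."""
    return net_seconds >= minimum
```

For example, a 6-second recording containing 2 seconds of silence yields roughly 4 seconds of net audio, which would clear a 3-second minimum but not an older 6-second one.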

In practice, this means that a spoken response to a speech-enabled IVR prompt (or even a mobile screen prompt) may be long enough to gather adequate net audio to verify a caller without a specific passphrase. The real game changer, and this really is a game changer, is that a single voiceprint may be used for both ACTIVE and PASSIVE verification.
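The idea of one voiceprint serving both modes can be sketched as a single scoring routine: a prompted passphrase is first checked for content (as ASR does in text-dependent VB) and then scored, while a free-form conversational response is scored only once it carries enough net audio. The `Response` type, the decision labels, the 0.7 threshold and the 3-second minimum are all hypothetical, and the embeddings are assumed to come from a speaker model not shown here.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Response:
    """One caller utterance, already converted to a speaker embedding."""
    embedding: Sequence[float]
    net_seconds: float
    expected_passphrase: Optional[str] = None  # set only for ACTIVE prompts
    recognized_text: Optional[str] = None      # ASR transcript, if any


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def verify_response(voiceprint: Sequence[float], response: Response,
                    threshold: float = 0.7, min_net_seconds: float = 3.0) -> str:
    """Score any response against the same enrolled voiceprint.
    Returns 'accept', 'reject' or 'need_more_audio'."""
    if response.expected_passphrase is not None:
        # ACTIVE: the spoken content must match the prompted passphrase.
        spoken = (response.recognized_text or "").strip().lower()
        if spoken != response.expected_passphrase.strip().lower():
            return "reject"
    elif response.net_seconds < min_net_seconds:
        # PASSIVE/conversational: not enough speech yet to decide.
        return "need_more_audio"
    return "accept" if cosine(voiceprint, response.embedding) >= threshold else "reject"
```

Returning `need_more_audio` rather than rejecting lets a conversational IVR simply keep the caller talking until enough net audio has accumulated.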

Conversational Responses in IVR and Digital Channels will Spearhead Large Scale Adoption

ACTIVE/text-dependent and PASSIVE/text-independent technologies are still regarded as distinct and independent of one another. However, we are on the cusp of a revolution. As text-dependent and text-independent technologies meld into a single CONVERSATIONAL RESPONSE, the terms ACTIVE and PASSIVE will no longer refer to the underlying technology. They may remain in use, but will apply only to user behaviour, and even that usage may disappear as consumers voraciously increase their demand for seamless, frictionless and continuous experiences…but that topic is for a different note.

The promise of a single voiceprint will also drive further innovation in voiceprint sharing and federation, a topic receiving growing attention in Identification & Verification (ID&V) regulatory circles and gaining immense momentum in collaborative fraud mitigation.

Opus Research has always been a strong proponent of Voice Biometrics and since early 2000 has witnessed immense technology improvements, yet adoption has lagged somewhat. One of the main causes is the trade-off between text-dependent and text-independent capabilities, and its negative impact on enterprise integrations as well as user experience. This is especially evident in large contact centers where both self-service and agent-assisted authentication are required. Blending the two into a single function brings enormous infrastructure and data efficiencies and, most importantly, a more seamless and frictionless customer experience, which will be the tipping point for unlocking truly large-scale adoption globally.

About the Author

Ravin Sanjith

As an analyst with Opus Research, Ravin Sanjith oversees coverage dedicated to building awareness, understanding and appreciation of accurate, affordable, usable, frictionless and scalable platforms for identity verification and intelligent authentication. Most recently, Ravin was co-founder and CEO of OneVault, a global pioneer in delivering voice biometrics solutions across the financial and telecoms sectors of Africa. Before OneVault, Ravin spent five years as an intrapreneur in a range of asset finance, commercial and private leasing companies, where he held various C-level positions, consistently driving business process and technological innovation. He started his career in 1992 as an engineer, applying machine sensor technologies and disparate data sources to optimize the operational performance of electricity distribution networks. He then moved to two of South Africa's largest retail banks, where he led the blueprinting of large-scale ERP systems, including SAP. Ravin has also worked with smaller Internet startups, most notably creating e-procurement hubs in the automotive sector as COO of MotorOnLine, which was acquired by TransUnion in 2005. Ravin combines technical and commercial acumen accumulated over 25 years across a multitude of industries and disciplines. Inspired by disruptive thinking and emerging technologies, he has pioneered many innovations in large enterprises as well as startups, with a specific interest in using biometrics to address ID&V security and customer experience challenges in enterprise and government.
