Perspective

Prepare to Protect Your Customers’ Voices

Digital assistants are always listening, creating a significant security risk as the threat of voice-based cybercrime grows.


By: Paul Mee and Gokhanedge Ozturk

This article was first published in MIT Sloan Management Review on May 05, 2020.

The threat of voice-based cybercrime is growing along with the explosion of voice-directed digital assistants, billions of which are already embedded in our mobile phones, computers, cars, and homes. Digital assistants are always listening, creating a significant security risk, especially as millions of people work from home during the pandemic. It’s estimated that in the next two years, there will be more virtual digital assistants than people in the world. Nearly two-thirds of businesses plan to use voice assistants for their customer interactions, according to a 2018 survey conducted by Pindrop Security.

Already, the number of cyberattacks on voice-recognition systems is rising as people converse with bots to play music, pay their bills, book reservations at restaurants, and perform other everyday tasks. It now takes less than four seconds of original audio to create a deepfake of someone’s voice. Recently, hackers used machine learning software to impersonate a CEO and order subordinates to wire hundreds of thousands of dollars to a fraudulent account.

Much of today’s voice fraud, known as “vishing,” involves controlling voice assistants by methods such as embedding undetectable audio commands, replaying voice recordings, and modifying fraudsters’ voices to match the pitch of their victims’ voices. As hackers become better at impersonating people, they will be able to deploy voice deepfakes that are far harder to detect.

The damage could be catastrophic unless companies take appropriate cybersecurity precautions. Financial services companies send millions of customers new credit cards after criminals steal information, but they can’t send them new voices. Unless voice activation is made secure, the rapid growth of machines that recognize voice commands could grind to a halt, damaging customers’ trust in the privacy safeguards of the many companies that use voice systems.

Pindrop’s survey found that 80% of businesses worldwide considered the security of their voice-directed services to be a concern. So how can managers make their customers’ voices safe?

As a first principle, companies should roll out voice-directed services only when they are confident of their ability to mitigate the accompanying cybersecurity risks. For example, at first, financial companies may want to offer customers only the ability to check basic facts by voice — such as account balances and stock quotes — and have them continue to use manual means or biometrics like the person’s face or fingerprint to execute transactions.
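One way to picture this staged rollout is as a simple permission rule: voice alone unlocks read-only requests, while anything that moves money needs a stronger factor. The sketch below is illustrative only. The action names, the factor labels, and the `is_allowed` helper are assumptions made for this example, not part of any real banking or voice-assistant API.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical actions a voice-directed financial service might expose.
    CHECK_BALANCE = "check_balance"
    GET_STOCK_QUOTE = "get_stock_quote"
    TRANSFER_FUNDS = "transfer_funds"
    PAY_BILL = "pay_bill"


# Assumption: only fact-checking requests are treated as low risk.
READ_ONLY_ACTIONS = {Action.CHECK_BALANCE, Action.GET_STOCK_QUOTE}

# Assumption: these labels stand for factors already verified in the session.
STRONG_FACTORS = {"fingerprint", "face", "manual"}


def is_allowed(action, verified_factors):
    """Return True if the action may proceed with the factors verified so far."""
    has_strong_factor = bool(set(verified_factors) & STRONG_FACTORS)
    if action in READ_ONLY_ACTIONS:
        # Voice alone is enough to read basic facts.
        return "voice" in verified_factors or has_strong_factor
    # Transactions always need something stronger than voice.
    return has_strong_factor


if __name__ == "__main__":
    print(is_allowed(Action.CHECK_BALANCE, {"voice"}))
    print(is_allowed(Action.TRANSFER_FUNDS, {"voice"}))
    print(is_allowed(Action.TRANSFER_FUNDS, {"voice", "fingerprint"}))
```

Keeping the list of read-only actions explicit means that any new voice feature is treated as a transaction until someone deliberately classifies it as low risk.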

As the range of voice-activated services extends and becomes more sophisticated, here are some other measures businesses can take.

Strengthen Customer Authentications

Companies should introduce screening protocols for voice-controlled services that are at least as robust as those used for other digital services. Customers should receive alerts if their orders exceed a certain threshold or appear to deviate from their typical purchasing patterns.
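The alert rule described above can be sketched in a few lines. This is a minimal illustration, assuming a fixed spending limit and a basic statistical test against a customer's past order amounts. The function name `should_alert`, the 500.0 limit, and the cutoff of three standard deviations are hypothetical choices for the example, not recommended values or an established method.

```python
from statistics import mean, stdev


def should_alert(amount, history, hard_limit=500.0, z_cutoff=3.0):
    """Decide whether a voice order should trigger a customer alert.

    amount: value of the new order.
    history: the customer's past order amounts.
    hard_limit, z_cutoff: illustrative thresholds (assumptions).
    """
    # Rule 1: any order above the fixed threshold triggers an alert.
    if amount > hard_limit:
        return True
    # Rule 2 needs at least two past orders to estimate a typical range.
    if len(history) < 2:
        return False
    spread = stdev(history)
    if spread == 0:
        # Every past order was identical, so any different amount is unusual.
        return amount != history[0]
    # Flag orders far from the customer's usual spending.
    return abs(amount - mean(history)) / spread > z_cutoff


if __name__ == "__main__":
    past_orders = [20.0, 25.0, 30.0]
    print(should_alert(26.0, past_orders))   # close to the usual pattern
    print(should_alert(120.0, past_orders))  # far from the usual pattern
    print(should_alert(600.0, past_orders))  # above the fixed limit
```

A production system would likely weigh more signals than order size, such as the merchant, the time of day, or the device used, but the same two-part structure of a hard limit plus a deviation check still applies.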

Companies can increase awareness of potential scams by distributing checklists to help customers gauge whether a third party’s approach or a request for information could be fraudulent. For example, a company could advise customers to hang up if a caller doesn’t know their name and relationship to the company, or if the caller’s phone number seems suspicious. Recently, scammers have been tricking people who use voice searches to find customer service numbers into giving away sensitive information. Whenever a company launches a new voice-directed service, it should also tell customers the extent to which they are insured against fraud.

At the same time, voice-directed services should ask for additional forms of authentication. These could consist of biometrics such as a customer’s fingerprint or face. Or they could be qualitative verbal authentications that can’t be found in the public domain — personal preferences, for instance, or the relative a customer visited with most often as a child, or both.

As they become available, companies will also have to invest in filtering technologies that detect whether a voice is real or synthesized. Some companies are already trying out technologies that can detect clues that human hearing normally misses, such as the sound of breathing, which may be present in a genuine voice but absent in a synthesized impersonation. Systems are also being designed to block inaudible commands by identifying and canceling ultrasonic signals, which researchers have found can take control of voice-recognition systems.

Conduct Cyber Exercises

Hackers will continue to develop new methods to exploit the weakest links in systems. Companies offering voice-activated services need to test their security constantly, conducting cyber exercises that identify vulnerabilities and determine ways to plug the gaps. They should also prepare responses to deploy in the event of a successful cyberattack.

As a training exercise, some of a company’s cybersecurity experts could try to exploit a voice assistant’s security gaps while others guard against the attackers. Alternatively, companies could engage ethical hackers to conduct surprise attacks on voice-assistant services — either on their own or in collaboration with other businesses or industries. The defense and payment industries already hold cross-industry cyber war games of this kind.

Cybersecurity teams should simultaneously explore alternate ways of operating should a voice-related cyber crisis arise. Experts should puzzle out in advance how to react to scenarios by considering a series of questions: If withdrawals are suddenly made from a bank using deepfaked customer voices, how should it react? How would it detect an attack in the first place? And what alternatives should be made available in the event of an emergency?

Communicate Across Industry

Some regulations already apply to voice-directed services. For example, the California Consumer Privacy Act limits the sale and mining of consumer voice data collected through smart televisions. Europe’s General Data Protection Regulation requires companies to report personal data breaches to their relevant local regulatory authority, though it does not currently address voice compromises directly.

Whatever the rules, cybersecurity officers should maintain regular contact with governments and others in their industries in order to stay ahead of regulations and potential new threats. Voice operations, and the convenience and efficiency they bring, will spread only so long as the companies offering them show that they can safeguard customers’ voices.

Companies should establish forums and other methods to share data about voice-assistant breaches so that whole industries can stay ahead of their adversaries. Once voice assistants become a common method for transferring money, new security protocols may also be needed.

The Potential of the Conversational Economy

If voice-directed services were made secure, they could deliver services that would improve — and possibly transform — consumers’ daily lives. People could tell cars to take them to appointments. They could turn to mobile phones to arrange their vacations. One day, they might even ask virtual assistants for financial advice.

But delivering this conversational future will require cybersecurity to stay ahead of hackers’ ability to abuse voice systems. Businesses should prioritize exploring now what it will take to keep their customers’ voices safe — and prepare to continue the battle indefinitely.