VOICE DEEPFAKES

  • On January 29, several users of the online message board 4chan used the “speech synthesis” and “voice cloning” service provider ElevenLabs to make voice deepfakes of celebrities such as Emma Watson, Joe Rogan, and Ben Shapiro.
  • These deepfake audio clips contained racist, abusive, and violent remarks. Creating deepfake voices to impersonate others without their consent is a serious concern that could have devastating consequences.
  • In response to this misuse of its software, ElevenLabs tweeted: “While we see our tech being overwhelmingly applied to positive use, we also see an increasing number of voice cloning misuse cases.”

What are voice deepfakes?

A voice deepfake is synthetic audio that closely mimics a real person’s voice, accurately replicating the tonality, accent, cadence, and other unique characteristics of the target person. Such voice clones are generated using AI models and substantial computing power, and producing a convincing clone can take weeks, according to Speechify, a text-to-speech conversion app.

How are voice deepfakes created?

Creating deepfakes requires high-end computers with powerful graphics cards, or cloud computing power. Powerful hardware accelerates rendering, which can take hours, days, or even weeks depending on the process. Besides specialised tools and software, generating a deepfake requires training data to feed the AI model. This data typically consists of original recordings of the target person’s voice. The AI uses the data to render an authentic-sounding voice, which can then be made to say anything.
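To make the workflow concrete, here is a minimal, hedged sketch using the open-source Coqui TTS library, which supports zero-shot voice cloning from a short reference recording. The model name is Coqui’s published XTTS v2 checkpoint; the file names are placeholders, and the commercial services discussed in this article use their own proprietary pipelines.

```python
# Minimal voice-cloning sketch using the open-source Coqui TTS library
# (pip install TTS). Illustrative only: commercial services use their
# own proprietary pipelines.
from TTS.api import TTS

# Load a pretrained multilingual model; XTTS v2 can clone a voice
# zero-shot from a short reference clip.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# "reference.wav" stands in for the training data described above: a
# clean original recording of the target speaker's voice.
tts.tts_to_file(
    text="Any sentence can be rendered in the cloned voice.",
    speaker_wav="reference.wav",     # placeholder reference recording
    language="en",
    file_path="cloned_output.wav",   # placeholder output file
)
```

The quality of the clone depends heavily on how clean the reference recording is, which is why the easy availability of clear source audio (discussed below) raises the risk.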

What are the threats arising from the use of voice deepfakes?

  • Attackers are using the technology to defraud users, steal identities, and engage in other illegal activities such as phone scams and posting fake videos on social media platforms.
  • According to one of Speechify’s blog posts, in 2020 a manager at a bank in the UAE received a phone call from someone he believed was a company director.
  • The manager recognised the voice and authorised a transfer of $35 million, unaware that the director’s voice had been cloned.
  • In another instance, fraudsters used AI to mimic a senior executive’s voice, directing the CEO of a UK-based energy firm to immediately transfer around $243,000 to the bank account of a Hungarian supplier of the company.
  • The voice belonged to a fraudster spoofing the chief executive of the firm’s German parent company, The Wall Street Journal reported in 2019.
  • Voice deepfakes used in filmmaking have also raised ethical concerns. Morgan Neville’s documentary on the late legendary chef Anthony Bourdain used voice-cloning software to make Bourdain appear to say words he never actually spoke, which sparked criticism.
  • Clear recordings of people’s voices are becoming easier to gather, whether from voice recorders, online interviews, or press conferences.
  • Voice capture technology is also improving, making the data fed to AI models more accurate and the resulting deepfake voices more believable. This could lead to scarier situations, Speechify highlighted in its blog.

What tools are used for voice cloning?

Microsoft’s VALL-E, My Own Voice, Resemble, Descript, Respeecher, and iSpeech are some of the tools that can be used for voice cloning. Respeecher is the software Lucasfilm used to recreate Luke Skywalker’s voice in The Mandalorian.

What are the ways to detect voice deepfakes?

  • Detecting voice deepfakes requires highly advanced technologies, software, and hardware to break down speech patterns, background noise, and other elements; a minimal feature-analysis sketch appears after this list. Cybersecurity tools have yet to create foolproof ways to detect audio deepfakes, Speechify noted.
  • Research labs use watermarks and blockchain technologies to detect deepfakes, but the tech designed to outsmart deepfake detectors is constantly evolving, Norton said in a blog post.
  • Programmes like Deeptrace are helping to provide protection. Deeptrace uses a combination of antivirus and spam filters that monitor incoming media and quarantine suspicious content, Norton noted.
  • Last year, researchers at the University of Florida developed a technique to measure acoustic and fluid dynamic differences between original voice samples of humans and those generated synthetically by computers.
  • They estimated the arrangement of the human vocal tract during speech generation and showed that deepfakes often model impossible or highly unlikely anatomical arrangements.
  • Call centres can also take steps to mitigate the threat from voice deepfakes, according to voice recognition engineers at Pindrop. Callback functions can end suspicious calls and request an outbound call to the account owner for direct confirmation.
  • Multifactor authentication (MFA) and anti-fraud solutions can also reduce deepfake risks. Pindrop mentioned factors such as call metadata for ID verification, digital tone analysis, and key-press analysis for behavioural biometrics.
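As a concrete illustration of the speech-pattern analysis mentioned in the list above, the following is a minimal sketch assuming the open-source librosa and scikit-learn libraries: summarise labelled real and synthetic clips as spectral features (MFCCs) and train a simple classifier on them. This is a generic approach, not the University of Florida technique or any commercial detector, and the file lists are hypothetical placeholders.

```python
# Minimal sketch of feature-based deepfake audio detection using librosa
# and scikit-learn. Generic illustration only; production detectors use
# far more elaborate pipelines.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def mfcc_features(path: str) -> np.ndarray:
    """Summarise a clip as the mean and standard deviation of its MFCCs."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical labelled training clips: 0 = genuine, 1 = synthetic.
real_clips = ["real_01.wav", "real_02.wav"]
fake_clips = ["fake_01.wav", "fake_02.wav"]

X = np.array([mfcc_features(p) for p in real_clips + fake_clips])
y = np.array([0] * len(real_clips) + [1] * len(fake_clips))

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# Estimated probability that an unseen clip is synthetic.
suspect = mfcc_features("suspect.wav").reshape(1, -1)
print(clf.predict_proba(suspect)[0, 1])
```

Real detectors combine many more signals (phase artefacts, prosody, call metadata), but the basic structure, extracting acoustic features and scoring them against known examples, is the same.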

SOURCE: THE HINDU, THE ECONOMIC TIMES, PIB
