
Balasubramanian says voice AI services need to provide security on par with other companies that store personal data, such as financial or medical information.
“You have to ask the company, ‘How will my AI voice be stored? Do you actually store my recordings? Do you store them encrypted? Who has access to them?’ It’s a part of me. It’s very intimate. I need to protect it too.”
Podcastle says that the voice recordings are end-to-end encrypted and that the company does not keep any recordings after the voice clone is created. Only the account holder who recorded the audio clips can access them. Podcastle also does not allow outside audio files to be uploaded or analyzed on Revoice. A person creating a clone of their own voice has to record the pre-written lines of text directly into the Revoice app; they can’t just upload a previously recorded file.
“You give permission and you create the content,” says Podcastle’s Yeritsyan. “Whether it’s synthetic or authentic, if it’s not a deepfaked voice, it’s this person’s voice and they put it out there. I don’t see a problem.”
Podcastle hopes that restricting voice replication to the consenting person’s own voice will dissuade people from making the clones say anything too outrageous. At the moment, the service does not have any content moderation or restrictions on specific words or phrases. Yeritsyan says it’s up to whatever service or outlet publishes the audio — such as Spotify, Apple Podcasts or YouTube — to monitor what content is pushed to their platforms.
“There are huge moderation teams on any social platform or any streaming platform,” says Yeritsyan. “So it’s their job to not let anyone use fake audio to create something stupid or something immoral and put it out there.”
Even if the very thorny problem of deepfakes and nonconsensual AI cloning is addressed, it remains unclear whether people will accept a computerized clone as an acceptable substitute for a human.
At the end of March, comedian Drew Carey used another voice AI service, ElevenLabs, to release an entire episode of his radio show read by a clone of his voice. For the most part, people hated it. Podcasting is an intimate medium, and the distinct human connection you feel when listening to people having a conversation or telling stories is easily lost when the bots take over the microphone.
But what happens when the technology has advanced so far that you can’t tell the difference? Does it matter that your favorite podcaster isn’t really in your ear? AI-reproduced speech has a ways to go before it becomes indistinguishable from human speech, but it is catching up quickly. Just a year ago, AI-generated images looked cartoonish; now they’re realistic enough to fool millions into thinking the pope is wearing a stylish puffer coat. It is easy to imagine AI-generated voices following a similar path.
There is also another human trait that drives interest in these AI tools: laziness. Voice AI technology, assuming it gets to the point where it can accurately simulate real voices, will make it easy to make quick edits or retakes without having to bring the host back into the studio.
“In the end, the creator economy will win out,” says Balasubramanian. “No matter how much we think about the moral ramifications, it will prevail because it makes people’s lives easier.”