A tool that clones spoken language is generating debate. Its maker emphasizes the potential but is holding off on a market launch.

The mouth of an artificial female figure, overlaid with a computer circuit board.

Cloning voices without consent is illegal. Photo: Christian Ohde/imago

The call came while her 15-year-old daughter was on a skiing trip. An unknown number appeared on the screen, but when she answered, Jennifer DeStefano heard her daughter's voice, crying and pleading for help, DeStefano told the broadcaster WKYT. Then a man's voice came on, demanding a ransom and threatening to harm her daughter.

Except: there was no kidnapping. Scammers had used software to clone the daughter's voice. “It was her voice, one to one. It was her tone. It was exactly as if she were crying,” the mother said. The case, which fortunately was resolved quickly, dates back almost a year, but it has gained new relevance through the latest release by the American company OpenAI. The company, which specializes in artificial intelligence, presented its newest tool at the end of last week: Voice Engine, a program that clones voices much faster than previous programs.

Artificial intelligence (AI) is among the technologies currently advancing fastest, and OpenAI is one of the leading companies in the field. It started out as a non-profit with the stated aim of developing AI systems that benefit humanity; today Microsoft is a major investor, and the company's products are not without controversy. The same goes for Voice Engine. In its examples, OpenAI shows how the program takes a 15-second audio recording and a text input and generates a new audio sequence that speaks the entered text in a voice very close to the speaker's in the 15-second sample. Until now, voice samples of at least one minute were needed as the basis for such voice cloning.
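Voice Engine itself has no public interface (more on that below), so its use cannot be demonstrated directly. Purely as an illustration of what programmatic speech generation looks like, here is a minimal sketch against OpenAI's already released text-to-speech endpoint, which speaks with preset voices rather than a voice cloned from a 15-second sample; the model name, voice name and output file are taken from OpenAI's public documentation, not from Voice Engine.

```python
# Minimal sketch, assuming the openai Python package (pip install openai)
# and an API key in the OPENAI_API_KEY environment variable.
# Note: this is OpenAI's released TTS endpoint with a preset voice --
# Voice Engine's cloning capability is not publicly exposed.
from openai import OpenAI

client = OpenAI()

response = client.audio.speech.create(
    model="tts-1",   # released text-to-speech model
    voice="alloy",   # one of the preset voices; no cloning sample involved
    input="Hello, this sentence is spoken by a generated voice.",
)

# Write the returned audio to an MP3 file.
response.stream_to_file("speech.mp3")
```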

OpenAI highlights positive possibilities: people who can no longer speak due to illness, for example, could use their own voice again, at least if a 15-second recording of them exists, which in the age of voice messages is probably true for many people. Another area could be international communication. OpenAI presented audio samples generated in multiple languages, from English to Japanese and Swahili. Here too, the basis is a 15-second reference recording and a text input, which the AI turns into speech. Texts can already be translated quickly, and generally to a high standard, with AI tools such as Google Translate or DeepL.
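The translate-then-speak pipeline described here can be sketched with existing tools. The following sketch assumes the official deepl Python package and placeholder API keys; since voice cloning is not publicly available, the speech step reuses the preset-voice endpoint from the previous example.

```python
# Sketch of a translate-then-speak pipeline, assuming the official
# deepl package (pip install deepl) and API keys for DeepL and OpenAI.
import deepl
from openai import OpenAI

translator = deepl.Translator("YOUR_DEEPL_AUTH_KEY")  # placeholder key
client = OpenAI()

# Step 1: translate the text, e.g. German to Japanese.
result = translator.translate_text(
    "Künstliche Intelligenz verändert die Kommunikation.",
    target_lang="JA",
)

# Step 2: speak the translation -- here with a preset voice; Voice Engine
# would instead reproduce the original speaker's voice in the new language.
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=result.text
)
speech.stream_to_file("translated.mp3")
```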


However, OpenAI has not released the model for general use; it has only presented results. “We recognize that generating speech that resembles people's voices carries serious risks,” the company wrote in a blog post. The technology is currently being tested “on a smaller scale,” and a decision on how to proceed will follow. The partners involved in the tests have to accept a set of conditions: among other things, voices may only be used if the people concerned have consented. In addition, the company has developed a digital watermark that makes the generated sequences traceable.
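OpenAI has not disclosed how its watermark works. Purely to make the concept of a traceable marker concrete, here is a toy sketch that hides a bit pattern in the least significant bits of raw 16-bit audio samples; real audio watermarks are built to survive compression and editing, which this one would not, and nothing here reflects OpenAI's actual scheme.

```python
# Toy audio watermark: hide a bit pattern in the least significant bits
# of 16-bit PCM samples. Illustrative only -- not OpenAI's scheme, and
# not robust against re-encoding, resampling or editing.
import numpy as np

def embed_watermark(samples: np.ndarray, bits: list[int]) -> np.ndarray:
    marked = samples.copy()
    pattern = np.array(bits, dtype=marked.dtype)
    # Clear the lowest bit of the first len(bits) samples, then set it
    # to the watermark bit. The change (+/-1 of 32768 levels) is inaudible.
    marked[: len(bits)] = (marked[: len(bits)] & ~1) | pattern
    return marked

def read_watermark(samples: np.ndarray, n_bits: int) -> list[int]:
    # Recover the hidden bits from the lowest bit of each sample.
    return [int(b) for b in samples[:n_bits] & 1]

# Demo on synthetic "audio": embed and recover a 4-bit marker.
pcm = (np.random.randn(1_000) * 1_000).astype(np.int16)
marked = embed_watermark(pcm, [1, 0, 1, 1])
assert read_watermark(marked, 4) == [1, 0, 1, 1]
```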

The voice of Navalny's mother

“An obvious concern with synthetically generated voices is their use for disinformation,” Sami Nenno, who researches the topic at the Alexander von Humboldt Institute for Internet and Society, tells taz. Such faked sequences are called audio deepfakes. One example: an alleged audio recording of the mother of the late Russian opposition politician Alexei Navalny, in which she supposedly makes serious accusations against his wife. According to Nenno, however, such purely audio deepfakes are still rare today.

His colleague Matthias Kettemann, a professor of innovation law, makes it clear: cloning voices without consent is illegal. But offenses from identity theft to hate speech already show that a ban does not automatically mean effective prosecution. Hence the now widespread demand for some form of watermark that clearly identifies AI-generated content, as is also planned for Voice Engine. Kettemann is skeptical: “Any label can be removed, and malicious actors don't comply anyway.” Education is therefore most important, starting at school. And caution: “It is sensible that OpenAI has decided not to roll out AI voices across the board for now; especially in a super election year, that would also be a challenge for democratic politics.”