ASU researchers develop special microphone to verify human speech


Visar Berisha, a professor of electrical engineering in the Ira A. Fulton Schools of Engineering at Arizona State University with a joint appointment in ASU’s College of Health Solutions, records speech with OriginStory technology. OriginStory, which won the U.S. Federal Trade Commission AI Voice Cloning Challenge, uses a special microphone with sensors that detect qualities of speech produced only by humans, ensuring voice recordings are not generated by artificial intelligence. Photo courtesy of Visar Berisha

Deepfakes have become a major societal concern with the advent of video and audio content generated by artificial intelligence, or AI.

A deepfake is a convincing imitation that blurs the line between fantasy and reality. Deepfakes can make it difficult to determine, for example, whether a politician actually made a troubling statement or whether the recording was fabricated by those seeking to interfere in an election.

“Until recently, the sound of a recorded voice was universally accepted as genuinely human,” says Visar Berisha, a professor of electrical engineering in the Ira A. Fulton Schools of Engineering at Arizona State University with a joint appointment in the university’s College of Health Solutions. “There was no reason to doubt its authenticity. With the advent of voice cloning technology, this trust is eroding and skepticism, rather than trust, will become the new norm.”

With deepfakes’ potential to ruin reputations and erode faith in institutions, the U.S. Federal Trade Commission, or FTC, held the Voice Cloning Challenge, inviting creative, multidisciplinary methods to combat AI-generated deepfake audio for a share of $35,000 in prize money.

One of the contest’s winners is OriginStory, a project built around a new kind of microphone: one that first verifies that a human speaker is producing the recorded speech, then watermarks the speech as authentically human. The watermark can be shown to listeners, establishing a chain of trust from recording to retrieval.

OriginStory is an ASU effort through and through; the project was developed with university resources and patented through Skysong Innovations, ASU’s exclusive intellectual property management company.

Berisha leads the development team, which includes fellow ASU faculty members Daniel Bliss, a Fulton Schools professor of electrical engineering in the School of Electrical, Computer and Energy Engineering; and Julie Liss, College of Health Solutions associate dean and professor of speech and hearing science.

Human biology to the rescue

Although human and AI-generated speech can sound similar to the untrained ear, the ways these signals are generated are markedly different. Deepfakes are algorithmically generated using neural networks, a type of machine learning technology.

On the other hand, the biological human speech production mechanism involves intermediate biosignals such as vocal cord vibrations and movements of articulators, which are the body parts used to form speech, such as the lips, tongue and nasal cavity.

OriginStory uses sensor technology already present in a variety of electronics to detect these biosignals while the microphone performs its normal function of recording speech. Because the biosignals and speech are recorded at the same time, OriginStory can confirm the authenticity of a recorded human voice.

The presence of the biosignals indicates that a distinctly human speech production mechanism generated the speech. OriginStory also protects the privacy of those recorded: the biosignals it checks distinguish human speech from AI-generated speech, but cannot distinguish one individual from another.
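The article doesn’t disclose how OriginStory decides that the biosignals and the audio match, but the core idea can be sketched. In the minimal Python illustration below, a hypothetical biosignal channel (say, a vocal cord vibration sensor) is captured alongside the audio, and a frame-level energy correlation with an assumed threshold stands in for whatever test the team actually uses:

```python
# Illustrative sketch only, not OriginStory's actual algorithm: check that
# a biosignal channel recorded alongside the audio rises and falls with the
# voiced portions of the recording.
import numpy as np

FRAME = 1024  # samples per analysis frame; both channels share a sample rate

def frame_energy(signal: np.ndarray) -> np.ndarray:
    """Short-time energy of each non-overlapping frame."""
    n_frames = len(signal) // FRAME
    frames = signal[: n_frames * FRAME].reshape(n_frames, FRAME)
    return (frames ** 2).mean(axis=1)

def is_live_human(audio: np.ndarray, biosignal: np.ndarray,
                  threshold: float = 0.6) -> bool:
    """Return True if the biosignal tracks the audio closely enough.

    `biosignal` stands in for a sensor stream (e.g., vocal cord vibration)
    captured at the same time as `audio`; `threshold` is an assumed cutoff.
    """
    a, b = frame_energy(audio), frame_energy(biosignal)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    if n == 0 or a.std() == 0 or b.std() == 0:  # silent channel: cannot verify
        return False
    corr = np.corrcoef(a, b)[0, 1]  # frame-level energy correlation
    return corr >= threshold
```

Under this toy model, a played-back AI clone would produce audio with no matching sensor activity, so the check would fail even if the sound were indistinguishable by ear.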

The resulting audio gets a watermark embedded in the file verifying its legitimacy. Anyone retrieving the media later can confirm that it is authentically human, helping preserve public trust.
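The watermarking scheme itself isn’t described in the article. As a rough sketch of the chain-of-trust idea, the snippet below uses a keyed hash to attest audio that has passed the liveness check; the key and function names are hypothetical, and a real deployment would more likely use public-key signatures so listeners could verify recordings without holding the device’s secret:

```python
# Hypothetical attestation sketch, not OriginStory's actual watermark: the
# microphone tags verified audio, and a later retrieval can confirm the
# bytes are unchanged since recording.
import hashlib
import hmac

DEVICE_KEY = b"secret-device-key"  # hypothetical per-microphone secret

def attest(audio_bytes: bytes) -> bytes:
    """Produce an attestation tag for audio that passed the liveness check."""
    digest = hashlib.sha256(audio_bytes).digest()
    return hmac.new(DEVICE_KEY, digest, hashlib.sha256).digest()

def verify(audio_bytes: bytes, tag: bytes) -> bool:
    """Confirm the audio is byte-identical to what the microphone attested."""
    digest = hashlib.sha256(audio_bytes).digest()
    expected = hmac.new(DEVICE_KEY, digest, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```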

Addressing threats in a new AI-powered era

Inspiration for the idea came from a news story Berisha saw in 2023 about a mother living in the Phoenix area who received a call from a scammer claiming to have kidnapped her daughter.

The teenage girl, however, was safe and sound; what was supposedly her voice on the phone was an AI clone.

“It was really scary to read, and it hit home in a personal way because I have kids about the same age,” Berisha says.

Liss, an expert in speech physiology and speech acoustics, joined the project because she shares Berisha’s concerns about the dangers of AI voice cloning technology. She says developing protection against AI-generated speech is crucial to security worldwide.

The project is the latest in more than 10 years of collaboration between the pair on work that crosses the boundary between engineering and health applications.

“To translate innovative ideas into practical solutions, interdisciplinary collaborations are crucial,” Liss says. “ASU expects its faculty to imagine and try bold and innovative approaches to solving the world’s challenges. It’s baked into the culture here.”

With the Voice Cloning Challenge award under its belt, the OriginStory team aims to continue refining the technology for eventual commercialization. The team members will work with Drena Kusari, vice president of product at Microsoft, leveraging her expertise in developing tech products and bringing them to market.

For Berisha, the FTC naming OriginStory one of its winners underscores the technology’s potential for widespread use in society.

“Our selection serves as further validation for our central thesis: We need new technology to establish a chain of trust that a voice is authentically human from the moment it is recorded to when it is listened to,” he says.
