About the project

Building a secure pipeline for sharing diverse voices and improving the representation of diverse speech patterns

The Speech Accessibility Project will collect speech samples from individuals representing a diversity of speech patterns. UIUC researchers will recruit paid volunteers to contribute recorded voice samples and will create a private, de-identified dataset that can be used to train machine learning models to better understand a variety of speech patterns. The project will focus first on American English.

Artificial intelligence and machine learning power speech recognition tools, such as voice assistants and translation tools, that let people operate technology with their voices. Without diverse, representative data, however, machine learning models cannot learn to understand a diversity of speech. This project aims to change that by creating the dataset needed to train these models more effectively.

Instead of pursuing separate, duplicative initiatives at different companies and research teams, the groups will collaborate on this project to gather a set of high-quality, representative speech samples that will help accelerate the technologies that support people with diverse speech patterns.

Frequently asked questions

Find answers to common questions about the Speech Accessibility Project.