About the project
Building a secure pipeline to share diverse voices and improve the representation of diverse speech patterns
The project is collecting speech samples from paid volunteers representing a diversity of speech patterns. UIUC researchers are using the recordings to create a private, de-identified dataset for training machine learning models to better understand a variety of speech patterns. The project's first focus is on American English.
Artificial intelligence and machine learning power speech recognition tools, such as voice assistants and translation tools, that let people operate technology with their voices. Without diverse, representative data, however, ML models cannot learn to understand a diversity of speech. This project aims to change that by creating the dataset needed to train these models more effectively.
Rather than running separate, duplicative initiatives, the participating companies and research teams are collaborating on this project to gather a set of high-quality, representative speech samples that will help accelerate technologies supporting people with diverse speech patterns.
Frequently asked questions
Find answers to common questions about the Speech Accessibility Project.