FAQ on using our data

Answers to questions we often get from researchers interested in using Speech Accessibility Project data.

What is the goal of the Speech Accessibility Project?

Its goal is to empower people with impaired speech by helping companies and universities to improve automatic speech recognition.

What is included in the data distribution?

Each data release includes speech recordings and metadata. The metadata include the prompt text to which the participant was responding and a text transcription of what the participant actually said (see the annotation guidelines for more details about the transcription standards). For some of the waveforms, the metadata also include differential diagnostic patterns of dysarthria: these are perceptual scales rating the intelligibility, breathiness, strain, naturalness, etc. of the recording, as judged by a speech pathologist. The differential diagnostic patterns will eventually be available for 15 waveforms per participant, though early data releases may include fewer rated files per participant.

What is not included?

Participant age, gender, race, ethnicity, geographic region, and self-assessed conversational disability survey results are available in aggregated form, but are not provided for individual participants.

What types of speech disability are represented?

Phase 1 of the project (April 2023 through November 2023) recruited only people with Parkinson’s, so all data releases prior to June 2024 will include only the speech of people with Parkinson’s. Phase 2 of the project (November 2023 through August 2024) is also recruiting people with Down syndrome, ALS, cerebral palsy, and stroke, so data releases beginning July 2024 will start to include data representing those etiologies.

What do the speakers say?

Speech samples are of three types. First, computer commands are sentences read aloud by the participant, designed to mimic utterances used to get information from a digital assistant. Second, novel sentences are extracted from Project Gutenberg novels, simplified in some cases to make them more readable, and read aloud by the participant. Third, spontaneous speech samples are the participant's responses to questions about culture or daily life.

How is data quality assured?

At least one human annotator has listened to every distributed waveform. Waveforms with no speech have been removed from the distribution. Waveforms distorted by recording equipment are not distributed if the annotator judges that the distortion significantly reduces the intelligibility of the speech.

How is participant privacy protected?

All researchers must sign a data use agreement. Among other terms, the data use agreement guarantees that researchers will not seek to identify participants, will protect the data from theft, and will delete the data contributed by any participant who withdraws consent, if directed to do so by the project staff.

How long is a typical recording?

A typical recording is one sentence, although some recordings contain a single word, several words, or several utterances. Recording length varies from speaker to speaker.

How often will you refresh the data?

We'll refresh the data monthly, uploading a complete new set each time. This ensures that recordings from anyone who has withdrawn their consent are not included in the available data.

How quickly might I expect to hear back once I submit my proposal and signed data use agreement?

Within a week, we'll confirm we received your proposal. Fully executing the signed data use agreement on our end may take a few more weeks.