Medical Speech, Transcription, and Intent (English)

8.5 hours of audio utterances paired with text for common medical symptoms.

About this data

This data contains thousands of audio utterances for common medical symptoms like “knee pain” or “headache,” totaling more than 8 hours. Each utterance was created by an individual human contributor based on a given symptom. These audio snippets can be used to train conversational agents in the medical field.

This Figure Eight dataset was created via a multi-job workflow. In the first job, contributors wrote text phrases describing a given symptom. For example, for “headache,” a contributor might write “I need help with my migraines.” Subsequent jobs captured audio utterances for the accepted text strings.

This dataset contains both the audio utterances and corresponding transcriptions.

View Instructions and Job Design

This job is part of an audio utterance collection workflow. By clicking “Duplicate Job” above, you’ll be given the template for the audio portion of this workflow. In other words, you will need a list of short text phrases in order to collect audio utterances. To recreate the entire job flow, please contact Figure Eight.


Raw Data

This input data consists of symptom prompts. Human contributors wrote text phrases based on these prompts, and those phrases were then used to collect audio utterances later in the workflow. The “Data” tab above contains further information and data on the audio recordings that were eventually made from these prompts.


Download Options | 2.3 GB | 160.2 MB | 137.7
recordings-overview.csv | 1.7 MB
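
As a rough starting point for exploring the overview file, the sketch below reads a CSV and reports its column names, its row count, and optionally how many rows share each value of one column. It is only an illustration: the column layout of recordings-overview.csv is not documented here, so the function discovers the header at runtime, and the helper name summarize_overview and the local file path are assumptions.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_overview(csv_path, group_column=None):
    """Return (column_names, row_count, counts) for a CSV file.

    group_column is optional and hypothetical: pass the name of whichever
    column you want tallied once you have seen the real header.
    """
    counts = Counter()
    row_count = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        for row in reader:
            row_count += 1
            if group_column is not None:
                # Rows lacking the column are tallied under an empty string.
                counts[row.get(group_column, "")] += 1
    return columns, row_count, counts


if __name__ == "__main__":
    # Assumed location: the file downloaded next to this script.
    path = Path("recordings-overview.csv")
    if path.exists():
        columns, row_count, _ = summarize_overview(path)
        print("columns:", columns)
        print("rows:", row_count)
```

Once the header is known, passing one of the printed column names as group_column gives a quick count of utterances per value, for example per symptom prompt if such a column exists.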