Figure Eight Datasets

Datasets curated on the Figure Eight platform, free to download for the entire data science and machine learning community.

The template used to annotate each Figure Eight Dataset can be duplicated, so you can expand the dataset on our platform. Inside each Dataset, you’ll find the raw data, job design, description, instructions, and more.

Open Images Dataset v4 (Bounding Boxes)

A set of 1.9 million images, annotated with bounding boxes for 600 classes of objects, served in collaboration with Google.

Medical Images for Nucleus Segmentation

21,000 nuclei from several different organ types annotated by medical experts.

Handwriting Recognition

Transcriptions of 400,000 handwritten names for Optical Character Recognition (OCR).

San Francisco Parking Sign Detection

Parking sign detection and parsing from images of San Francisco streets.

Medical Information Extraction

A dataset of relationships between medical terms in PubMed articles, for relation extraction and related natural language processing tasks.

Medical Speech, Transcription, and Intent (English)

8.5 hours of audio utterances paired with text for common medical symptoms.

Swahili Health Translation, Speech, Transcription, and Topics

10.5 hours of disaster and threat-related audio data, categorized and translated from English to Swahili.

Multilingual Disaster Response Messages

A set of messages related to disaster response, covering multiple languages, suitable for text categorization and related natural language processing tasks.
