The Human-in-the-Loop leader combines machine learning and human intelligence to create the high-quality, large-scale, structured data necessary to make AI work in the real world
SAN FRANCISCO, May 10, 2018 – Figure Eight, the essential Human-in-the-Loop artificial intelligence platform for data science and machine learning teams, today unveiled new offerings at its annual conference, Train AI, to accelerate the adoption of AI by more businesses: Figure Eight Datasets, Video Object Tracking, and Smart Bounding Box Annotation.
Figure Eight Datasets is a free, curated repository of versioned open-source training data that the industry needs for benchmarking and advancing critical machine learning deployments and research. Today, the repository launches with eight originally developed and researched training datasets constructed from millions of human-generated labels. High-quality, publicly available training datasets of this size and kind are uniquely valuable, and these datasets also offer unusual but critical transparency of methodology: each one displays the original settings and logic of the Figure Eight human-in-the-loop workflow that produced the curated, validated dataset. The initial eight Figure Eight Datasets are:
Open Images Dataset V4 (Bounding Boxes)
A set of 1.7 million images, annotated with bounding boxes for 600 classes of objects, served in collaboration with Google.
Medical Images for Nucleus Segmentation
21,000 nuclei from several different organ types, annotated by medical experts.
Transcriptions of 400,000 handwritten names for Optical Character Recognition (OCR).
San Francisco Parking Sign Detection
Parking sign detection and parsing from images of San Francisco streets.
Medical Speech, Transcription and Intent (English)
A collection of audio utterances for common medical symptoms.
Medical Information Extraction
A dataset of relationships between medical terms in PubMed articles, for relation extraction and related Natural Language Processing tasks.
Multilingual Disaster Response Messages
A set of messages related to disaster response, covering multiple languages, suitable for text categorization and related Natural Language Processing tasks.
Swahili Health Translation, Speech, Transcription and Topics
A collection of health-related audio recordings in Swahili created in collaboration with Translators Without Borders and the Red Cross.
“Today data scientists struggle to find high quality, relevant benchmark datasets for testing their Machine Learning algorithms, as most existing datasets are limited in the languages and cultures that they cover,” said Robert Munro, CTO of Figure Eight. “These eight training data sets were selected because they represent real-world problems that are important to solve, or they are tough Machine Learning problems. The Figure Eight Machine Learning team selected these datasets from a broad range of candidates. We think it will enrich the Machine Learning community to have more datasets to work on, and it will enrich the world for those datasets to help make AI in the real world available to more people.”
Figure Eight Video Object Tracking allows machine learning teams to annotate an object within a video frame and then have that annotation persist across frames within the video, while still ensuring that every frame is accurately reviewed by a human where high-quality annotation is required. This object tracking capability is essential for annotating video content at scale in applications such as autonomous vehicles, security surveillance, and media and entertainment. Without object tracking, the cost and time required to annotate individual video frames would be prohibitive, making AI applications that need to understand objects moving through time and space untenable. Video is a growing data format, with over 500,000 hours of video uploaded and 1 billion hours of video consumed on YouTube every day. Figure Eight Video Object Tracking is available now as a private beta and will become generally available to all customers in Q3. Customers who want to join the private beta can register here.
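The idea of an annotation persisting across frames while every frame stays open to human review can be illustrated with a minimal Python sketch. This is not Figure Eight's implementation, which the announcement does not describe. It assumes the simplest possible propagation, linear interpolation between frames that a human has annotated, and every name in it (`Box`, `FrameAnnotation`, `propagate`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class FrameAnnotation:
    frame: int
    box: Box
    source: str         # "human" or "propagated"
    needs_review: bool  # propagated boxes are queued for a human check


def _lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def propagate(keyframes: Dict[int, Box]) -> List[FrameAnnotation]:
    """Fill the frames between human-annotated keyframes.

    Assumption: the object moves linearly between keyframes. A real
    tracker would use the image content; this sketch does not.
    """
    if not keyframes:
        return []
    frames = sorted(keyframes)
    out: List[FrameAnnotation] = []
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        for f in range(start, end):
            t = (f - start) / (end - start)
            box = Box(_lerp(a.x, b.x, t), _lerp(a.y, b.y, t),
                      _lerp(a.w, b.w, t), _lerp(a.h, b.h, t))
            is_key = f == start
            out.append(FrameAnnotation(
                f, box, "human" if is_key else "propagated", not is_key))
    last = frames[-1]
    out.append(FrameAnnotation(last, keyframes[last], "human", False))
    return out


annotations = propagate({0: Box(0, 0, 10, 10), 4: Box(40, 20, 10, 10)})
review_queue = [a.frame for a in annotations if a.needs_review]
```

In this sketch a human draws two boxes and the three frames in between are filled in automatically, then placed in `review_queue` so that a person can still confirm or correct each one.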
Figure Eight Smart Bounding Box Annotation allows machine learning teams to leverage the power of Deep Learning to accurately identify objects in Computer Vision applications. The Figure Eight Smart Bounding Box Annotation capability comprises two new features: Predictive Bounding Boxes and Intelligent Bounding Box Aggregation.
The Predictive Bounding Boxes feature greatly reduces the human effort needed to identify, label, and draw bounding boxes around objects in images. The Figure Eight deep learning model, rather than a human annotator, creates high-confidence bounding boxes; human annotators can then confirm, adjust, or remove each Predictive Bounding Box to ensure that it correctly labels an object of interest.
The Intelligent Bounding Box Aggregation feature determines the ‘correct’ bounding box when multiple people have drawn slightly different bounding boxes around the same object. Optimized over millions of past bounding box jobs on the Figure Eight platform, it combines Deep Learning Computer Vision with expertise in quality control for human annotation. It uses the bounding boxes created by any number of people, the history of those people’s accuracy, and the image content itself to create a single, optimized bounding box for each object. The bounding box placement is accurate down to a single pixel, giving Figure Eight customers the most accurate possible human-driven object detection for their Computer Vision models.
The Figure Eight Smart Bounding Box Annotation capability is available now as a private beta and will become generally available to all customers in Q3. Customers who want to join the private beta can register here.
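To make the aggregation idea concrete, the following Python sketch combines several annotators' boxes into one by weighting each box by its annotator's historical accuracy. It is an illustration only and not the Figure Eight algorithm: it ignores the image content and the deep learning component described above, and the function name `aggregate_boxes` and its inputs are hypothetical.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def aggregate_boxes(boxes: List[Box],
                    accuracies: List[float]) -> Tuple[int, int, int, int]:
    """Accuracy-weighted mean of each box edge, rounded to whole pixels.

    Assumption: `accuracies` holds one non-negative historical accuracy
    score per annotator, in the same order as `boxes`.
    """
    if not boxes or len(boxes) != len(accuracies):
        raise ValueError("need one accuracy score per bounding box")
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("accuracy scores must sum to a positive value")

    def edge(i: int) -> int:
        weighted = sum(b[i] * w for b, w in zip(boxes, accuracies))
        return round(weighted / total)

    return (edge(0), edge(1), edge(2), edge(3))


# Two annotators disagree by four pixels horizontally; the more
# accurate annotator pulls the result toward their box.
merged = aggregate_boxes(
    boxes=[(10, 10, 50, 50), (14, 10, 54, 50)],
    accuracies=[0.75, 0.25],
)
print(merged)  # (11, 10, 51, 50)
```

A weighted mean is the simplest choice here; a production system could instead use the median to resist outliers, or a learned model that also looks at the pixels, as the announcement describes.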
About Figure Eight
Figure Eight is the essential Human-in-the-Loop AI platform for data science and machine learning teams. The Figure Eight software platform trains, tests, and tunes machine learning models to make AI work in the real world. Figure Eight’s technology and expertise support a wide range of data types – text, image, audio, video – and use cases including autonomous vehicles, intelligent chat bots, facial recognition, medical image labeling, aerial and satellite imagery, consumer product identification, content categorization, customer support ticket classification, social data insight, CRM data enrichment, product categorization, and search relevance. The Figure Eight platform operates at an unprecedented scale, having generated over 10 billion data labels to power AI applications.
Headquartered in San Francisco with a presence in Tel Aviv and backed by Canvas Ventures, Trinity Ventures, Industry Ventures, Microsoft Ventures, and Salesforce Ventures, Figure Eight serves Fortune 500 and fast-growing data-driven organizations across a wide variety of industries.