Data can be messy. To a machine, a picture is just a series of pixels until a person draws a box around an object and identifies what those pixels mean. The same is true for pretty much any kind of data: a machine might be able to define each word in a sentence, but it struggles to understand the purpose and intent of the sentence as a whole without some human input.
The process of annotating or labeling that data is at the heart of what’s called human-in-the-loop. And it’s absolutely essential to creating the training data that makes machine learning work in the real world.
Figure Eight’s platform is built to harness human intelligence at scale to create exactly this kind of data. And no matter what kind of data you need annotated, we have an approach that will work for you:
Business process outsourcers (or BPOs) are groups of dedicated contributors who can handle difficult or sensitive data. They create training data in secure locations and can operate under NDA for confidential data.
We also have a variety of channels with individual areas of expertise. Whether you’re looking for specific language fluencies, geographic locations, or skillsets, we can help unlock the right contributors for your project.
Our leveled contributors come from all over the globe. We keep track of every data row contributors annotate, so that the best contributors see the most complicated projects and your training data is as accurate as it can be.
Some of our customers prefer to use the Figure Eight platform with their own, internal contributor base. Figure Eight works exactly the same in this arrangement, except customers handle any payment to their own internal contributors.
The human-in-the-loop approach combines the best of human intelligence with the best of machine intelligence. Machines are great at making smart decisions from vast datasets, whereas people are much better at making decisions with less information.
For example, people are great at looking at a complex image and picking out discrete entities: “this is a lamppost” or “that’s a cat, but you can only see its tail.” This is exactly the sort of information a machine needs to understand what a lamppost or a cat looks like. In fact, a machine needs to see a lot of different lampposts and cats–from different angles, partially occluded, in different colors, etc.–to understand what one looks like. A robust dataset of these labeled images (i.e. human intelligence) teaches a machine to see those items (i.e. machine intelligence). And at some point, with enough data and enough tuning, those machine algorithms can see and understand images incredibly quickly and incredibly accurately, without the need for people to constantly tell them what exactly a cat (or a lamppost) looks like.
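As a minimal sketch of that idea, here is a toy nearest-centroid classifier (hypothetical data and function names, not Figure Eight’s platform): human annotators supply labeled examples, and the model averages them so it can label new examples on its own.

```python
# Hypothetical example: human annotators label feature vectors as "cat" or
# "lamppost"; a simple nearest-centroid classifier then learns from those labels.

def train(labeled_examples):
    """Average the feature vectors for each human-provided label."""
    sums, counts = {}, {}
    for features, label in labeled_examples:
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            totals[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in totals]
            for label, totals in sums.items()}

def predict(centroids, features):
    """Pick the label whose centroid is closest to the new example."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Human intelligence: annotators label training examples (toy 2-D features here).
training_data = [
    ([1.0, 1.2], "cat"), ([0.9, 1.0], "cat"),
    ([5.0, 4.8], "lamppost"), ([5.2, 5.1], "lamppost"),
]
model = train(training_data)

# Machine intelligence: the model now labels unseen examples itself.
print(predict(model, [1.1, 0.9]))  # close to the "cat" centroid → "cat"
```

Real systems use deep networks and millions of images rather than averaged toy vectors, but the shape is the same: human labels in, a self-sufficient classifier out.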
– For training: As we discussed above, humans provide the labeled data used for model training. This is probably the most common place you’ll see data scientists use a HitL approach.
– For tuning or testing: Humans can also help tune a model for higher accuracy. Say your model is unsure about a certain set of decisions, such as whether a given image is in fact a cat. Human annotators can score those decisions, effectively telling the model, “yes, this is a cat” or “nope, it’s a lamppost,” thus tuning it so it’s more accurate in the future.
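The tuning-and-testing step above can be sketched in a few lines (toy image IDs and judgments, all hypothetical): annotators review each prediction, and their yes/no answers yield both an accuracy score and a set of corrections to tune on.

```python
# Hypothetical example: annotators review model predictions ("yes, this is a
# cat" / "nope, it's a lamppost") and their answers become tuning data.

predictions = [
    ("img_001", "cat"),
    ("img_002", "cat"),
    ("img_003", "lamppost"),
]

# Human judgments: the correct label for each image (toy ground truth).
human_judgments = {"img_001": "cat", "img_002": "lamppost", "img_003": "lamppost"}

# Keep every prediction the humans rejected, paired with the correct label.
corrections = [(img, human_judgments[img])
               for img, predicted in predictions
               if human_judgments[img] != predicted]

accuracy = 1 - len(corrections) / len(predictions)
print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.67
print(corrections)                  # [('img_002', 'lamppost')]
```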
Active learning generally refers to the humans handling low confidence units and feeding those back into the model. Human-in-the-loop is broader, encompassing active learning approaches as well as the creation of data sets through human labeling. Additionally, HitL can sometimes (though rarely) refer to people simply validating (or invalidating) an output without feeding those judgments back to the model.
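A hedged sketch of that active-learning pattern (the threshold, stand-in model, and function names are all illustrative, not Figure Eight’s API): the model keeps its high-confidence predictions, routes low-confidence ones to a human, and feeds the human’s corrections back into the training set.

```python
# Illustrative active-learning loop: low-confidence predictions are routed to
# a human annotator, and the corrected labels are appended to the training set.

CONFIDENCE_THRESHOLD = 0.8  # assumption: tune per project

def model_predict(item):
    """Stand-in for a real model: returns (label, confidence)."""
    if "whiskers" in item:
        return "cat", 0.95   # confident about obvious cats
    return "cat", 0.55       # a guess the model is not sure about

def human_label(item):
    """Stand-in for the human in the loop (e.g., an annotation job)."""
    return "lamppost" if "pole" in item else "cat"

def active_learning_pass(items, training_data):
    """Accept high-confidence predictions; send the rest to humans."""
    accepted = {}
    for item in items:
        label, confidence = model_predict(item)
        if confidence >= CONFIDENCE_THRESHOLD:
            accepted[item] = label
        else:
            corrected = human_label(item)            # "nope, it's a lamppost"
            accepted[item] = corrected
            training_data.append((item, corrected))  # fed back for retraining
    return accepted

training_data = []
labels = active_learning_pass(["tail and whiskers", "tall metal pole"], training_data)
print(labels)         # {'tail and whiskers': 'cat', 'tall metal pole': 'lamppost'}
print(training_data)  # [('tall metal pole', 'lamppost')]
```

In the rarer validation-only variant of HitL mentioned above, the loop would simply stop before the `training_data.append` step: humans accept or reject outputs, but nothing flows back to the model.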
HitL can be, and is, used across a wide variety of AI projects, including NLP, computer vision, sentiment analysis, transcription, and many other use cases. Any deep learning project can benefit from some human intelligence inserted into the loop at some point.