What is Computer Vision?

The industries, tactics, and promise of computer vision, explained

Computer vision models are changing the world of business across myriad industries. Learn a little more about how, and why, these models work.

What is computer vision?

Computer vision is a catch-all term for the techniques and methods that enable computers to gain a high-level understanding of images or videos. Colloquially, computer vision aims to teach machines how to see, and react to, the world.



Why is labeled training data important for computer vision applications?

The history of computer vision may be the most powerful testimony that high-quality training data is the single most important element in building powerful AI systems.

Although the field was first pioneered in the late 1960s, its progress was hampered for decades through the periods now commonly referred to as the AI winters. Then, in 2009, researchers from Princeton University presented ImageNet, a new open-source labeled image dataset, at the prestigious CVPR computer vision conference. By 2012, after years of intensive crowdsourcing effort, ImageNet had become the largest open-source dataset available to computer vision specialists; it now contains roughly 14 million labeled images. From that point on, better computer vision algorithms have been developed at a frantic pace. To cite just one data point, classification error on the dataset dropped from 28% in 2010 to less than 3% five years later.

Put simply, the creation and open-sourcing of ImageNet paved the way for computer vision advances in myriad fields, specifically because it contained 14 million high-quality, well-labeled images.


What are some common computer vision tasks and methods?

Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, as well as the extraction of high-dimensional data from the real world in order to make informed decisions. We’ll discuss those in detail below.


What are some common ways to get training data for computer vision models?

The goal of training data in computer vision is to label or annotate an image or video so that a machine can understand other, unlabeled images or videos. Here are a few ways that’s commonly done:

  • Image classification or categorization: Here, images are labeled broadly. An annotator may see an image with a cat in it and simply label it as such. Much of ImageNet was labeled in this manner.
  • Bounding boxes: Annotators draw bounding boxes over a given object, multiple instances of objects, or several object classes in a single image. Effectively, this identifies the pixel groupings that are “pedestrians” or “nuclei” in an image so a machine can learn those configurations.
  • Polygons: Similarly, for instances in which objects cannot be neatly boxed, polygons provide a more exacting frame for individual object instances.
  • Lines: Frequently used for things like identifying driving lanes, a line tool is a simple annotation that can teach machines about boundaries or other relevant image attributes.
  • Dots: Dots are often used to mark extremities or specific areas of interest on an image. This can be the corners of the eyes and mouth for facial recognition, the ends of forks or other objects in robotics, and more.
  • Semantic segmentation: Semantic segmentation, a.k.a. pixel labeling, involves the painstaking labeling of objects at the pixel level. Though it often takes annotators the longest, it produces the most exacting labels for computer vision.
  • Video object tracking: Here, labelers draw boxes on objects in specific frames so a model can learn to track those objects in time and space. Our tool combines human and machine intelligence to speed up the process by up to 100 times, predicting and persisting labels from the first frame onwards.
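
Bounding-box annotations like those above are typically stored as simple coordinate records, and a standard quality check on them is intersection over union (IoU): the overlap between two boxes divided by their combined area. A minimal sketch (the corner-based box format and function name here are our own illustration, not a specific tool's API):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height clamp to zero when the boxes don't overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

An IoU of 1.0 means an annotator's box matches the reference exactly; values near 0 flag boxes that need review.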

What industries and use cases leverage computer vision?

While computer vision models and applications are appearing across more and more industries each year, a handful of domains have been traditionally associated with computer vision:

Security & surveillance:

Whether it’s home security, satellite imagery, or drone videos, computer vision holds the promise for a safer, smarter world. A few examples of computer vision for security and surveillance:

  • Annotating drone images to train models to detect faults in infrastructure
  • Using satellite images to estimate crop yield or predict environmental changes
  • Bounding boxes on static cameras for home or business security


Medical imaging:

Medical computer vision largely centers around microscopy imagery, X-rays, MRIs, and CAT scans. This domain is especially promising because many of these images exhibit a uniformity that means models generally require fewer images to train accurately. Some examples:

  • Pixel-level semantic segmentation on nuclei
  • Dot and shape annotations on X-rays
  • Video object tracking for live cell microscopy
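
Pixel-level labels like the nuclei segmentation above are usually stored as masks: an array the same shape as the image, with one class id per pixel. A toy sketch with NumPy (the 4x4 mask and class ids are illustrative, not real data):

```python
import numpy as np

# Toy 4x4 segmentation mask: 0 = background, 1 = nucleus.
mask = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])

# Fraction of the image labeled as nucleus (6 of 16 pixels here).
nucleus_fraction = (mask == 1).mean()
```

Because every pixel carries a label, simple array operations like this can answer questions (coverage, object area) that box-level annotations cannot.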

Autonomous vehicles:

Computer vision is an essential ingredient in making autonomous vehicles work in the real world. Here are a few examples of workflows and annotation jobs that we support:

  • Bounding boxes on crucial classes like pedestrians, trucks, bikes, street signs, and, of course, other cars
  • Instance-based pixel-level semantic segmentation for similar ontologies
  • Line tools used to classify lanes and lane types

Document transcription:

The most common use case here is optical character recognition models automatically reading or parsing PDFs for relevant information. The main challenge is locating which parts of the document are important and extracting text from those areas. Typically, this is done by fingerprinting documents with bounding boxes over the crucial information, then transcribing that information. When enough of this training data is created, a model can identify the areas of interest and transcribe that text.
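
The fingerprinting workflow described above can be pictured as a training record pairing each bounding box with its transcription, plus a step that crops those regions out for the OCR model. A minimal sketch (the file name, field labels, and `crop` helper are purely hypothetical illustrations):

```python
# A hypothetical training record for document transcription:
# each region of interest gets a bounding box plus its transcribed text.
record = {
    "document": "invoice_0001.pdf",
    "regions": [
        {"label": "invoice_number", "box": (40, 60, 220, 90), "text": "INV-2041"},
        {"label": "total", "box": (400, 700, 520, 730), "text": "$1,250.00"},
    ],
}

def crop(image, box):
    """Cut a region (x1, y1, x2, y2) out of an image stored as a list of
    pixel rows; stands in for a real image-library crop call."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]
```

With enough records like this, a model learns both where the areas of interest sit on the page and what text they contain.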

You can learn a little more about how we help Workday train, test, and tune their enterprise-grade OCR models here.


Retail & consumer packaged goods:

There are plenty of promising use cases in the retail and consumer packaged goods domains. Generally, they center around better understanding of product catalogs or logistics (namely, keeping products on shelves). For example:

  • Image classification for search relevance tuning (i.e. marking product facets like color or style)
  • Bounding boxes on products or empty shelves to train computer vision models to understand in-store catalogs and availability
  • Video object tracking for enhanced optimization of factory processes


We’ve helped some of the most innovative companies in the world train their computer vision models. Reach out and we’ll let you know how we can help your organization.