The industries, tactics, and promise of computer vision, explained
Computer vision is a catch-all term for the techniques and methods that enable computers to gain a high-level understanding of images or videos. Colloquially, computer vision aims to teach machines how to see, and react to, the world.
The history of computer vision is intriguing because its evolution may be the strongest evidence that high-quality training data is the single most important element in creating powerful AI systems.
Even though the field was pioneered in the late 1960s, its progress was hampered for several decades, a period now commonly referred to as the AI Winter. Then, in 2009, researchers from Princeton University presented a new open source labeled image dataset named ImageNet at the prestigious CVPR computer vision conference. By 2012, after years of intensive crowdsourcing efforts, ImageNet had become the largest open source dataset available to computer vision specialists; it currently contains about 14,000,000 labeled images. Since then, better computer vision algorithms have been developed at a frantic pace. As a single data point, classification error on the dataset dropped from 28% in 2010 to less than 3% in just 5 years.
Put simply, the creation and open-sourcing of ImageNet paved the way for computer vision advances in myriad fields, specifically because it contained 14 million high-quality, well-labeled images.
Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, as well as the extraction of high-dimensional data from the real world in order to make informed decisions. We’ll discuss those in detail below.
The goal of training data in computer vision is to label or annotate an image or video so that a machine can understand other, unlabeled images or videos. Here are a few ways that’s commonly done:
While computer vision models and applications are appearing across more and more industries each year, a handful of domains have been traditionally associated with computer vision:
Whether it’s home security, satellite imagery, or drone videos, computer vision holds the promise of a safer, smarter world. A few examples of computer vision for security and surveillance:
Medical computer vision largely centers around microscopy imagery, X-ray, MRI, and CAT scans. This domain is especially promising because so many of these images are highly uniform, which generally means fewer images are needed to train an accurate model. Some examples:
Computer vision is an important ingredient in making autonomous vehicles work in the real world. Here are a few examples of workflows and annotation jobs that we support:
The most common use case here is optical character recognition (OCR): models that automatically read or parse PDFs for relevant information. The main challenge is locating which parts of the document are important and extracting text from those areas. Typically, this is done by fingerprinting documents, that is, drawing bounding boxes over the crucial information and then transcribing it. Once enough of this training data has been created, a model can learn to identify the areas of interest and transcribe their text.
You can learn a little more about how we help Workday train, test, and tune their enterprise-grade OCR models here.
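To make the fingerprinting idea more concrete, here is a minimal sketch of how box-plus-transcription annotations might be represented and turned into training examples. It is an illustration only, not a description of any particular annotation tool: the names `BoundingBox`, `FieldAnnotation`, `crop`, and `to_training_examples` are hypothetical, and the page is modeled as a plain grid of pixel values rather than a real PDF or image file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical layout).

    (left, top) is inclusive and (right, bottom) is exclusive.
    """
    left: int
    top: int
    right: int
    bottom: int

    def __post_init__(self):
        # Reject empty or inverted boxes early.
        if self.right <= self.left or self.bottom <= self.top:
            raise ValueError("box must have positive width and height")


@dataclass(frozen=True)
class FieldAnnotation:
    """One human-labeled region of a document page."""
    label: str          # illustrative field name, e.g. "total"
    box: BoundingBox    # where the important text sits on the page
    transcription: str  # the text an annotator typed for that region


def crop(page, box):
    """Return the part of `page` (a list of pixel rows) covered by `box`."""
    if box.left < 0 or box.top < 0:
        raise ValueError("box starts outside the page")
    if box.bottom > len(page) or box.right > len(page[0]):
        raise ValueError("box extends past the page")
    return [row[box.left:box.right] for row in page[box.top:box.bottom]]


def to_training_examples(page, annotations):
    """Pair each cropped region with its label and transcription."""
    return [
        {"label": a.label, "pixels": crop(page, a.box), "text": a.transcription}
        for a in annotations
    ]
```

In this sketch, each annotation yields one example containing the cropped pixels, the field label, and the target text. A localization model could be trained on the boxes and a transcription model on the pixel and text pairs, although the source does not specify how the models are actually structured.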
There are plenty of promising use cases in the retail and consumer packaged goods domains. Generally, they center on a better understanding of product catalogs or logistics (namely, keeping products on shelves). For example: