There are plenty of ways to annotate images for computer vision projects. At a high level, you can simply bucket images into classes, draw tight bounding boxes around objects in images, dot the corners of important entities, or label every individual pixel in a given image. Different approaches work for different initiatives, of course, but we’ve seen an increase in pixel-level semantic segmentation over the past few years.
Now, generally, pixel labeling is done by class. For a self-driving car project, a class might mean “pedestrians” or “cars” or “billboards” or any other entity you need your algorithm to understand. The idea is that if you show your model enough pedestrians, cars, and billboards, the model will start to understand the characteristics of each. It learns what makes a pedestrian a pedestrian by ingesting copious examples, eventually creating its own framework of, for lack of a better word, pedestrian-ness or car-ness or billboard-ness.
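To make "labeling by class" concrete: under the hood, a class-based segmentation label is just a per-pixel map of class IDs. Here's a minimal toy sketch in Python (the image size and class IDs are made up for illustration, not taken from any real ontology):

```python
# A semantic segmentation label is, at its core, a per-pixel class map.
# Toy 4x6 "image" with three hypothetical class IDs:
# 0 = background, 1 = car, 2 = pedestrian.
semantic_mask = [
    [0, 0, 1, 1, 0, 2],
    [0, 1, 1, 1, 0, 2],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def pixels_per_class(mask):
    """Count how many pixels belong to each class ID."""
    counts = {}
    for row in mask:
        for class_id in row:
            counts[class_id] = counts.get(class_id, 0) + 1
    return counts

print(pixels_per_class(semantic_mask))  # {0: 14, 1: 8, 2: 2}
```

Every pixel gets exactly one class, which is what lets a model learn "car-ness" pixel by pixel.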
But depending on your use case, that can be a problem. To a car, a billboard is a billboard is a billboard. One billboard behaves the same as the next, in other words. A self-driving car really just needs to know that it’s a stationary object it can ignore (as opposed to a street sign, which it might need to understand). If billboards are overlapping, it doesn’t particularly matter. They shouldn’t factor into how the car drives. After all, they’re just ads. (Except ours, of course.)
Now, cars and people are a different story altogether. They move. Sometimes they move erratically. They’re part of the same class in most semantic segmentation ontologies, but, depending on the models you’re building, that can be problematic; a mother pushing a stroller is going to behave a lot differently than a jogger, for example. Moreover, since a lot of these objects will overlap, simpler ontologies can result in confusion for computer vision classifiers. Take a picture like this:
This is a well-labeled image from our annotation tool. The edges are crisp and the classes are accurate. But all those cars are simply labeled as “cars.” And since they overlap, some algorithms can struggle with that information. After all, this isn’t a block-long automobile caterpillar. It’s a series of individual, parked cars.
A lot of our clients reached out and asked us to help them solve this problem. So we did. Now, instead of the car caterpillar, our tool powers instance-based semantic segmentation. The output looks something like this:
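The difference between the two label formats is easy to see in miniature. Here's a hedged sketch (toy masks, invented IDs) of why class-only labels merge adjacent cars into one blob while instance labels keep them apart:

```python
# With class-only labels, two adjacent parked cars collapse into one blob:
# every "car" pixel carries the same ID. Instance-based labels give each
# car its own ID, so the objects stay distinct. (Toy masks; IDs made up.)
semantic = [
    [1, 1, 1, 1],   # two cars, bumper to bumper: one undifferentiated region
    [1, 1, 1, 1],
]
instance = [
    [1, 1, 2, 2],   # same pixels, but car #1 and car #2 are now separable
    [1, 1, 2, 2],
]

def count_objects(mask):
    """Number of distinct labeled objects (ignoring background ID 0)."""
    return len({pix for row in mask for pix in row if pix != 0})

print(count_objects(semantic))  # 1 -- the "car caterpillar"
print(count_objects(instance))  # 2 -- individual cars
```

Same pixels, same class; the instance channel is what restores the boundary between the two cars.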
Put simply, individually labeled cars reduce model confusion. We’ve seen marked success for teams behind, yes, self-driving cars, but also in areas like microscopy imagery, where cells in mitosis can be labeled as discrete entities. Instance-based labeling does indeed take a bit more time, but for a lot of enterprise-grade annotation projects, the extra care in labeling can pay big accuracy dividends and create more successful, observant algorithms.
If you’re interested in taking our instance-based pixel tool for a test drive (pun intended), please don’t hesitate to reach out. We’re quite proud of it and we’d love to see if it can help test, train, and tune your computer vision machine learning projects.