About a year ago, the teams at CrowdFlower and Microsoft Azure Machine Learning got together to discuss a shared vision: to take Machine Learning into the business mainstream so that tens of thousands of companies could deploy intelligent applications. As we talked, it became apparent that each team held different pieces of the puzzle, and that together we could make this vision a reality faster.
Today we’re proud to announce the general availability of the joint solution “CrowdFlower AI powered by Microsoft Azure Machine Learning”, which you can check out at ai.crowdflower.com. We co-developed this solution because we knew that to achieve mass adoption of Machine Learning within businesses, we first had to address the practical barriers preventing it today.
Barriers to adoption of Machine Learning
Media coverage of AI and Machine Learning has gone mainstream. We’ve seen articles in The Economist and Vanity Fair, emotional stories about Tesla Autopilot, warnings from luminaries such as Stephen Hawking about the threat AI poses to mankind, and even Dilbert jokes about Artificial Intelligence and Human Intelligence.
But this coverage does not accurately reflect how Machine Learning is being adopted outside the technology elite (the Amazons, Apples, Facebooks, Googles, Microsofts, Teslas and Ubers of the world, which can throw massive resources at billion-dollar problems such as self-driving cars, AI assistants and autonomous drones). For Machine Learning to become commercially viable within mainstream businesses, we needed to address two main barriers to adoption.
Barrier 1: Lack of High-Quality Customized Training Data
Machine Learning models need training data. Without it, the model cannot learn. It’s like buying a car in a world with no gas stations: you’ve just bought an expensive lump of metal that cannot go anywhere. Machines cannot create the training data themselves. You need human intelligence to create the initial training data from which the model can learn, find patterns and make predictions.
So the first barrier we had to address was the lack of training data. Generating customized, high-quantity, high-quality training data is our core competence: our human-in-the-loop platform has generated over 2 billion human judgments for text, image, video and audio training data sets for leading data science teams.
There are three important aspects to consider when creating training data for your Machine Learning model. First, the training data needs to be customized. You define how you want the data collected and structured: the way you classify your support tickets will be specific to you, and the business rules that define a Level 1, 2, 3, 4 or 5 ticket differ from those of other companies. Second, you need a high quantity of training data so that there are enough data points for the model to learn the different classifications and possible outputs. Third, you need high-quality training data. Humans make mistakes, so you need a methodology, and a platform that enforces it, to get high-quality output from humans. The CrowdFlower platform delivers on all three of these needs: customization, high quantity and high quality.
Having the training data and the Machine Learning model capabilities in a single platform means you can deploy a Machine Learning model faster. But that then leads to the second barrier to adoption.
Barrier 2: Machine Learning Models Failing Safely
Imagine a Data Scientist going to the VP of Customer Support and saying, “I have a machine learning model that is right 70% of the time. I think we should deploy it into production for classifying our support tickets and stop using humans.” The VP of Customer Support will laugh and say, “I cannot afford to be wrong 30% of the time, so I can’t use your model.”
So how can companies move beyond this impasse? The solution is an approach called Human-in-the-loop, in which the model handles the predictions it is confident about and hands off the rest for human review. If you deploy Machine Learning without a human in the loop, you are effectively saying you have 100% confidence in every prediction the model makes. If you do this, you will get bad outcomes that could have been avoided.
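The routing rule behind Human-in-the-loop can be sketched in a few lines of Python. This is a minimal illustration, not the CrowdFlower AI implementation: the `Prediction` type, the `route` function, the ticket labels and the 0.9 confidence threshold are all hypothetical, and the sketch assumes the model reports a calibrated confidence score between 0 and 1 for each prediction.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str          # hypothetical ticket class, e.g. "Level 2"
    confidence: float   # assumed: model's probability for `label`, in [0, 1]


def route(predictions, threshold=0.9):
    """Split predictions into an automated queue and a human-review queue.

    `threshold` is a hypothetical tuning knob: predictions at or above it
    are accepted automatically; everything else goes to a human.
    """
    automated, needs_review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            automated.append(p)
        else:
            needs_review.append(p)
    return automated, needs_review


if __name__ == "__main__":
    batch = [
        Prediction("t1", "Level 1", 0.97),
        Prediction("t2", "Level 3", 0.62),
        Prediction("t3", "Level 2", 0.91),
        Prediction("t4", "Level 5", 0.40),
    ]
    auto, review = route(batch)
    print("automated:", [p.item_id for p in auto])        # ['t1', 't3']
    print("human review:", [p.item_id for p in review])   # ['t2', 't4']
```

Raising the threshold sends more items to humans and lets fewer mistakes through automatically; lowering it automates more volume at more risk. The labels humans assign in the review queue are the human output that can then be fed back to the model as new training data.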
Earlier this year, Facebook faced criticism that its Trending feature was surfacing news stories biased against conservatives. In response, the company fired all the human editors for Trending, replacing them with an algorithm that promotes stories based entirely on what Facebook users are talking about. Within 72 hours, according to the Washington Post, the top story on Trending was a fake story about how Fox News icon Megyn Kelly was a pro-Clinton “traitor” who had been fired. A human-in-the-loop approach would have prevented this obviously flawed outcome.
If you apply Human-in-the-loop, you have started to automate a business process. Initially that process (say, classifying support tickets) is 100% human. Then the Machine Learning model takes over the high-confidence cases, perhaps 10-20% of the volume, while humans still handle the vast majority because the model is not yet confident enough on the rest. Over time the model keeps ingesting new training data (the human output), becoming more accurate and more confident, so the share of work done by the model increases. The total volume of work this semi-automated process can handle also increases dramatically.
This approach dispels one of the core myths about AI promoted in the mainstream media: that AI is about machines replacing humans. Machines and humans have complementary strengths, and AI is the art and science of combining the strengths of Machine Learning and human intelligence. This was the core principle behind the development of CrowdFlower AI.
AI = TD + ML + HITL
So we’ve launched the joint solution “CrowdFlower AI powered by Microsoft Azure Machine Learning” to address these barriers to adoption. Now for the first time in a single platform you have training data (TD), Machine Learning (ML) and human-in-the-loop (HITL) workflows.
So why should you care? Is this a case of 1+1+1 = 3? Or 1+1+1 = 30? We believe it’s the latter.
This is why you should care.
First, the time it takes to create a first Machine Learning model has been cut. In a matter of hours you can generate the customized training data and create a Machine Learning model. That used to take weeks or months; now it takes days.
Second, you can now deploy that Machine Learning model into production with the Human-in-the-loop safety net. This means you reach the business benefits of higher volume and lower costs sooner, without sacrificing quality.
Third, we’ve lowered the startup cost of applying AI to your business. With this joint solution, we’ve brought the starting price point down to below $100,000. This means it’s now commercially viable for tens of thousands of companies to apply AI to core business processes such as categorizing support tickets or understanding customer sentiment in social data. Previously, AI was available only to companies able to invest $10 million to get started.
Today marks the beginning of the journey to drive wide-scale adoption of Machine Learning. We’re thrilled to be starting this journey in partnership with Microsoft. To learn more about whether this solution applies to you, check out ai.crowdflower.com.