January 1, 2018
We started our AI for Everyone Challenge with a simple goal: to give machine learning experts the tools and resources to create AI projects that contribute to the greater good. We’re blown away by the quality of submissions each and every quarter. Thus far, David Van Valen’s medical live cell work, the KivaFaces equitable facial recognition project, Sarah Sternman’s literary creative style initiative, and Ani Nenkova’s medical literature database are up and running on CrowdFlower, building out and enriching the datasets that will make their projects a success. We’re thrilled with their progress and can’t wait to share the fruits of their labor when their research wraps up.
As we mentioned above, our Q4 submissions were fantastic, and choosing a pair of winners gets more difficult each go-around. But we found two projects we’re especially excited about, so we want to announce the winners and give you a sneak peek at what they’ll be up to on CrowdFlower for the next year.
We’ll start with a group of researchers who are tackling an increasingly pervasive problem: hate speech. Plenty of smart people have tried to combat this problem, but a lot of those efforts fall short. They treat hate speech as a binary, independent of context, and the resulting misclassifications make hate speech detection systems inadequate. A more precisely characterized dataset could have real benefits for society, allowing companies to flag and remove hate speech from their platforms, giving schools the ability to combat cyberbullying, and generally making the internet a safer place for everyone.
The team here is global and quite large. They all have extensive experience in natural language processing, and in hate speech in particular. As a cohesive team, they have a real chance to create the best, most actionable dataset of context-laden hate speech, one that can begin to truly combat the problem. They are:
- Grigorios Tsoumakas and Athena Vakali of Aristotle University of Thessaloniki
- Cristian Danescu-Niculescu-Mizil of Cornell University
- David Jurgens, Dan Jurafsky, Vinodkumar Prabhakaran, Rob Voigt, and Yulia Tsvetkov of U. Michigan/Stanford/CMU
- Henry Kautz of University of Rochester
- and Bert Huang of Virginia Tech
Our second AI for Everyone Challenge winner is Wei Xu’s LanguageNet team from Ohio State University. They’ll also be working in natural language processing, but instead of hate speech, Wei and her team will take on another difficult, but broader, linguistic task: large-scale paraphrases and sentence synonyms.
In other words, they’ll be building a database of expressions. Long, complex ideas can be stated in a variety of ways, and current NLP algorithms have a tough time recognizing when two statements mean the same thing. Building a corpus of synonymous phrases and ideas will be a big step forward for understanding (and summarizing or paraphrasing) language.
Wei’s team has made some great progress on their algorithms and methodology already. The problem? As Wei noted in her submission, “deep learning algorithms are data hungry.” LanguageNet will use CrowdFlower to annotate more sentence and phrase data (in ten different languages!) to power their AI project. We’re really excited to see what the dataset looks like and will be overjoyed to share it with the NLP community.
So to both of our winners: welcome to the AI for Everyone family! And if you’d like to apply for our next round, we’re taking submissions. Winners get up to a million CrowdFlower rows, a free platform subscription, $25K for contributor costs, and a whole lot more. We’d love to hear from you.