July 19, 2018
For the past four years, we’ve taken the pulse of the data science community with our annual Data Scientist Survey. 2018, of course, is no difference. And while we confirmed a few things we anticipated–namely, that data scientists love their jobs and that more and more of their work is being used for AI projects–one thing we didn’t totally anticipate was the supremacy of open source machine learning frameworks above all else:
The results, as you can see, are pretty staggering. A vast majority of the most popular frameworks are open source. Why exactly, you might ask? We’ve got a few ideas:
1. They’re free!
Saying that open source frameworks are free is clearly an obvious statement, but it really does constitute one of the top reason why both individuals and companies often turn to them.
Investing in expensing licenses early on, before clearly understanding your use case, simply isn’t a reasonable approach. R&D teams appreciate the flexibility that comes with the ability to compare many different options and experiment with them with no strings attached until they commit to a specific framework. Also, research institutions and early stage startups, which constitute two of the most prolific centers of technological innovation, also share a need to choose the cost-efficient option available to them and typically end up relying on open source solutions.
2. Open source frameworks usually get to the market quicker
Open source software is usually faster to the market than proprietary software because its development is mostly agile in nature, in particular when it is written by single individuals or small groups of talented AI practitioners. There is no roadmap planning involved, no prioritization. It just “happens.” So there are usually multiple implementations for each algorithm or each technical problem that a Machine Learning expert needs to solve.
3. “Holistic support” often beats customer support
People naturally don’t associate open source libraries with the idea of high quality support, because, well, there is no support. In fact, don’t many such frameworks warn us that the results come with no guarantee, and to use them at our own risk.
However, open source code has this incredible advantage of gathering crowds of passionate developers and users around a common problem. The key concept behind open source is what could be qualified as “crowd-coding”, and also “crowd-testing.” Most importantly, even though all these users are facing similar technical challenges, they are often considering these challenges from a different point of view, in the context of a different use case or business problem. They are real-life users naturally become contributors interested in improving a same tool to solve their own real-life problems.
Consequently, new features are regularly added, and chances are, if you need a new functionality from the tool, others might also need that same improvements and might have actually already started to work on it. And because of this, open source software development might turn out to be more agile compared to proprietary solutions for which you would have to submit a feature request and hope that it gets prioritized by the maker of the tool you are using. Open source tools typically undergo holistic improvements inspired by real-life problems.
Similarly, because of the volume and the variety of the users of the software, open source code is being tested in real-life conditions as opposed to an engineering team in an office. In other teams, it is fool-proofed by others and goes a much more extensive testing phase that proprietary software would.
4. Cutting edge technology often originates in open source
If you enjoy reading research papers, you would probably know that many research groups create open source libraries attached to their work which they submit to peer review. That means that the most advanced topics of research (in particular those related to machine learning) are usually available in the form of open source code long before it makes it to the form of proprietary software.
It’s also worth considering that in the context of research, using proprietary software is a barrier to innovation, because its usage limits opportunities to share with peers from other institutions and work collaboratively with other research groups. By opposition, the usage of open source solutions helps advancing research and pushing the boundaries of human knowledge because they are typically more lightweight and experimental compared to “industrial” solutions.
5. They offer the ability to “mess with the code”
An important feature of open source code is that it is, well, open. This means that they offer users the ability to access, read and interact with the code to understand how it works and what it does, but also to switch from customer to developer as needed. Users have the ability to make code modifications whenever they may find it necessary. Open source libraries are easier to adapt to new use cases or combine with other open source libraries.
6. It has better accessibility for private users
Let’s face it: while a very large proportion of machine learning innovations come from individual contributors (either students, ML professional working on side projects, or other people with a passion with ML and AI), not all proprietary software is priced for private users. Similarly, it is not infrequent to see proprietary software that clearly is targeting enterprise users, and therefore a lot of users are not aware of it. In contrast, open source is typically well publicized and widely used. Which is all to say that data scientists often get familiar and comfortable with these tools early in their careers and can provide more value, more quickly using frameworks they’re well versed in.
7. It comes with less commercial limitations
Brand name frameworks sometimes come with attendant licensing issues and limitations that hamstring practitioners from getting what they truly need from a solution. Instead of having to triple-check if the license allows for the use they need, practitioners using open source platforms can not only feel free to experiment, but, as we mentioned in a previous section, they can probably find other users who have already done some fraction of the experiment already and build off those successes.
8. A sense of giving back and pride
Last but not least, there is a social aspect to open source frameworks. In just the same way that blogging has been used by an entire generation as a way to gain recognition and visibility in the community, contributing to open source code is a way for coders to gain credibility within their professional networks (it’s very frequent nowadays to see a links to GitHub repositories on resumes), stay connected with like-minded professional, and stay on top of the newest technical developments.
Finally, open-sourcing the code for a machine learning platform or a machine learning library can constitute a real strategic advantage for companies interested in being recognized for its thought-leadership in the domain of artificial intelligent and in attracting the attention of both talent and potential customers.
Put it all together and you can start to understand exactly why these open source frameworks are so popular. They provide some serious advantages and let practitioners experiment, both by themselves and communally, with other like-minded machine learning experts. And, of course, the price doesn’t hurt either.
To get the data behind data science, download our free 2018 Data Scientist Report.