There are numerous ways to compare Hadoop, the up-and-coming open source data analysis platform, to Linux, the open source operating system that’s widely used in enterprises and data centers.
Although Linux was, at one time, a curiosity backed by a dedicated core of developers and early adopters, it’s now used pretty much everywhere but the desktop. The jury is still out on Hadoop. Designed to make it easier to work with large collections of data in a distributed computing environment, it is clearly in its early stages of adoption, and many people are still wondering if there’s anything useful to be made of all the fuss.
But many companies are deploying Hadoop in order to learn what they can do with it. It’s part of the larger narrative around “big data” we hear so often, which says that useful business information can be gleaned from digging through large troves of data and simply asking the right questions. Hadoop’s role is making it easier to ask those questions and extract the answers.
Hadoop itself, which is an open source project run by the Apache Software Foundation, sprang out of Yahoo. The core team from Yahoo that created it went on to start Hortonworks, a company that aims to build a commercial business by supporting companies deploying Hadoop. If the business model sounds vaguely familiar, then you might be thinking about Red Hat, the company that was created two decades ago around Linux and that is now on track to do about $1.5 billion in revenue this year.
There are signs Hadoop is getting its hooks firmly inside the enterprise. A recent IDC survey of 202 large companies already experimenting with Hadoop found that 32 percent had moved it to production environments, and another 31 percent expected to do so within 12 months. As progress goes, it’s notable, but we’re still a long way from Linux-like levels of adoption.
Today, Hortonworks and Red Hat announced what they called a “strategic alliance” that includes an integration of products and joint go-to-market strategy. But they also share something else. Benchmark was an early backer of Red Hat, and was also a founding investor in Hortonworks. So far Hortonworks has raised just shy of $100 million in three rounds of venture capital funding from Benchmark, Yahoo, Index Ventures and Tenaya Capital.
Benchmark partner Peter Fenton led the firm’s founding investment in Hortonworks with Yahoo; when he was at Accel Partners, it invested in JBoss, an open source software firm that went on to be acquired by Red Hat. Fenton sees a lot of parallels between Red Hat and Hortonworks. Put simply, if Hadoop grows the same way Linux did, then Hortonworks has a huge opportunity as companies take it from an experiment to a critical part of their business. Fenton and I talked about this recently. Here’s a lightly edited version of our conversation.
Re/code: Peter, I think a lot of people are still getting their heads around what Hadoop is and why people like you see such a big opportunity for it. Walk me through it.
Fenton: We were one of the early investors in Red Hat. And the common theme across all of our investments is that there’s a democratization of technology that’s occurring, where we’ve gone from what’s being sold top-down through enterprise sales teams, who are in many ways trying to hide the software. Open source has this premise that says, “Let’s get rid of all this economic waste that occurs in software adoption and put that into a product experience that literally sells itself.” The users of the software are now empowered to see if the software is good or bad rather than having to rely on a salesperson.
So then the economic question is similar to what people wondered about Red Hat and Linux: How do you make money on open source software, which is free?
The logic is, to our minds, quite simple. If a company has decided they are going to build a new data architecture, there is great economic opportunity to be a close partner with them as they do that. So the line is to drive and support. If it doesn’t work, who are they going to call? Support is in the baseline offering. But there are a lot of things that fit around and on top of Hadoop that create opportunities that are not yet served by the open source community. Things like identity management and core security. They’re not core to Hadoop and that allows for Hortonworks and its partners to create value around it.
So what phase do you think we’re in now? Is it still an early-adoption phase or are we getting to a point where we’ll see those killer applications?
No question we’re in the early-adoption phase. There are rewards to companies that are making early investments in figuring out ways to use Hadoop to create a competitive advantage. The genes of Hadoop — MapReduce — were the basis of the competitive advantage at Google. But Hadoop is being used in the most important areas of other companies because the stakes are so high. They are willing to make the investments into making sense of it and making it useful. The case studies right now number in the hundreds — they’re not yet in the thousands.
So there are several companies playing in different angles of the Hadoop ecosystem, and I think some people get confused about them. The one everyone knows is Cloudera, then Hortonworks, and there’s also MapR. Is there enough room in the ecosystem for all of them?
I think for Hadoop to work, it has to first and foremost succeed as the open source Apache Hadoop. I don’t think it can succeed as a product from a commercial vendor. Our investment in Hortonworks is founded on the premise that, like Linux, Hadoop can achieve a level of standardization that will create an ecosystem that everyone can work in. And, secondly, it will unlock repeatable application sales. With Linux — versus all the flavors of Unix — the industry got behind a common approach that was non-proprietary. The common foundation of Linux toppled Solaris and HP-UX and IBM AIX. They didn’t go away, but if there had been 10 flavors of Linux, it wouldn’t have happened that way.
One thing that I’ve never fully understood is the difference in approaches between Cloudera and MapR and Hortonworks. How do you explain it simply?
The difference is a commitment from the founders of [Hortonworks] to build everything that they do, and be public about it, in Apache Hadoop. They contribute 100 percent of their code base to Apache Hadoop with zero proprietary extensions. Cloudera and MapR build their own point products that are expansions and extensions, and their own Hadoop distributions. That is a meaningful difference. The spiritual reason that Hortonworks exists is that for the company to win, Hadoop has to win. It’s much like Linux had to succeed before Red Hat could succeed. And you have to get the industry to normalize around the platform. That hasn’t happened yet with Hadoop, but there are signs that it is starting to. Linux achieved success because it wasn’t linked to the success of any one company. If you look at the reasons that the companies exist, this is a black-and-white difference between them.
So you’ve made the case that it’s possible to make money with Hadoop. Is Hortonworks making money or at least on its way to making money?
We’re not ready to disclose it yet, but the revenue momentum is as significant as any investment I’ve made in my career. I mean that in the tens of millions of revenue. And since it’s open source, you might ask how that is possible, but there are partner relationships. Then there are customers who want support contracts because the data they have in Hadoop is so important to them. … Whether or not we can build or create a business, the burden is still on the company to do that over many years.
As I understand it, some other companies you’re involved with are using Hadoop.
EBay, which is an older investment, has one of the biggest implementations of Hadoop on Earth. Twitter has a massive implementation. They have been public about it on their blogs, and it’s the backbone for much of the main product, but also part of the operational systems. This is a fundamental shift in data architecture. I can’t remember a time when one of the startups I’ve invested in has told me they were negotiating a big license with Oracle. Ten years ago we heard that all the time. We don’t anymore. What we hear now is that they can’t hire Hadoop experts fast enough. To me that sounds like a tectonic shift that’s getting more visible by the day.