I Sure Hope More than Hadoop Crosses the Chasm

Many Hadoop vendors are thrilled with the uptake of Hadoop by mainstream IT buyers. While this is good news for the vendors, it won’t be good news for the businesses buying the technology unless more than just Hadoop crosses the chasm. For Hadoop to succeed, businesses need technology that lets their staff find important signals in Hadoop. Here’s what should cross the chasm along with Hadoop.

For Hadoop to succeed, businesses need technology that allows business staff to find and organize the data stored in Hadoop, prepare it for analysis, share the prepared data, track its lineage, find important signals, display them in attractive ways, and then distribute the results to those who need them.

If you don’t have these elements, Hadoop becomes a platform that brings data in from lots of new and exciting sources and then allows a group of data scientists and database experts to create data pipelines that must fit into various methods for display and use in applications. The biggest problem with this structure is that the experts become the bottleneck. Even if you have a lot of them, you always have more people who are itching to get at the data.

For Hadoop to really cross the chasm, it must arrive on the other side in a form that allows the entire company to embrace big data, combine it with existing data, and then find and exploit its value. The rest of this article explains what that looks like.

What Is on Both Sides of the Chasm?

To really understand how to make the most of Hadoop after it has crossed the chasm, we need to take a look at what this chasm is and what is on both sides of it. For those of us who are serious about making an impact with IT, either as a vendor or a buyer, Geoffrey Moore’s corpus of ideas has been transformational. Moore defined the idea of the chasm in high tech after looking at much older research about the adoption of agricultural technology. It turned out that when new technology arrives, a small but sophisticated group adopts it right away. This group of innovators and early adopters doesn’t need any help. They just want to know what a new widget does so that they can figure out if it will help.

The next and much larger group to adopt technology is the early majority. This group isn’t sophisticated about the plumbing of technology. They are not going to grab the widget and make it work. They want packaged solutions to urgent problems, which Moore calls bowling pins.

Moore’s message to vendors is that you must change the way you sell to cross the chasm. For the innovators and early adopters, sell the power of the widget. For the early majority, sell fully formed solutions that knock down bowling pins.

So, to really cross the chasm, Hadoop needs to be knocking down lots of bowling pins on the other side. Just setting up a cluster won’t do this. You need much more.

Hitting a Strike with User-Driven Innovation

Another less well-known but vitally important thinker about how to make the best use of technology is Eric von Hippel, an MIT professor who came up with the concept of user-driven innovation. Von Hippel’s insight, which can be found in two books that can be downloaded for free (The Sources of Innovation and Democratizing Innovation), is that when users have the tools to construct solutions, a huge amount of innovation results.

So, in terms of our discussion of the chasm, we want data scientists and database experts to create solutions that knock down bowling pins, but we also want to give users the ability to create solutions on their own. If we can do this, a massive amount of innovation, created by a much larger team, will result.

What’s Happening Across the Chasm?

What we need on the other side of the chasm is technology that increases the number of people who can be involved in exploiting big data. The question is: How can you get that? What does it look like?

When some of the Hadoop vendors talk about Hadoop crossing the chasm, they are rightly pointing out that the early majority is indeed starting to purchase and experiment with Hadoop. But quite often this experimentation leads to only a few proofs of concept that then find their way into production. Remember, even the most sophisticated users of Hadoop, such as Netflix (see “Why Does Netflix Need a Hadoop Genie?”), have had to create ways to make Hadoop simpler to use. Too often, Hadoop crosses the chasm and remains a science project, bottlenecked by the availability of experts who understand big data and the fiendishly complex Hadoop APIs.

But you can find happiness across the chasm and put big data to work widely. Companies that have succeeded in using Hadoop as a part of the way they do business are using technology like Platfora, which attacks several of the problems mentioned at the beginning of the article. Platfora is a full-stack big data discovery platform that sits on top of Hadoop and provides capabilities such as:

  • A data catalog to track what data resides in Hadoop.
  • A data prep environment to allow data to be cleaned up and readied for analysis.
  • A way to combine and reshape data into usable, shareable objects called Lenses that can be used to support analysis.
  • Ways to access data outside of Hadoop.
  • Support for execution of data pipelines so that new data can be ingested, processed, and made available in cleaned and distilled form.
  • Access to statistical and analytics capabilities to help find signals in data.
  • Access control mechanisms that allow data to be secured.
  • An environment for creating dashboards and visualizations that present data and allow for exploration and collaboration.

While some of these capabilities are complex, the system as a whole allows Hadoop to thrive on the other side of the chasm. The data nerds and super-users can explore data and knock down bowling pins. The rest of the company can get access to these solutions through visualizations and dashboards. Data scientists and database experts can still do their noble work of creating advanced applications and data pipelines, but the results can be delivered into Platfora for wider use.

Platfora’s vision is all about completing the journey Hadoop is taking across the chasm. Without the types of capabilities Platfora provides or something equivalent, Hadoop often crosses the chasm and then dies of exhaustion. Nobody wants that. Better to have Hadoop cross the chasm and expose the power of big data to everyone who is ready for it.