
Designing a Scalable and Agile Big Data Platform


CITOs now face an infrastructure build-out to support big data. How can they avoid the mistakes of the past and end up with an environment that is both enterprise quality and agile? If you examine how the software industry has evolved, businesses use two currents of technology.

There is “enterprise-class” technology, designed to be used by thousands of people to solve problems and intended to be the “system of record” for the business. For that reason, enterprise software has to be surrounded by mature capabilities for management, administration, configuration, and extension, so that people know the application will always be available and working properly.

On the other hand, there has always been a thread of technology we will call “agile,” which operates on a smaller scale and is usually used by individuals or small teams to solve a problem directly.

The spreadsheet is perhaps the ultimate agile technology, although with the consumerization of technology this group is expanding: it is now easy to build your own blogs, websites, and wikis, and to use consumer-friendly applications such as Salesforce.com.

Context and Background

The software industry has products in both currents. Under the “enterprise” heading are systems of record, such as Enterprise Resource Planning (ERP), and large systems for Business Intelligence (BI). There are also “agile” ERP and BI solutions.

Software companies are well aware of this distinction, and their ultimate goal is to offer enterprise-quality applications that are as “agile” as possible. Every software vendor has ways of adding and configuring extensions to its product.

When these methods are controlled by a priesthood, a bottleneck ensues. When end users can use these mechanisms to meet their own needs, a product becomes more agile.

This background is worth keeping in mind when observing the “big data” arena. Some vendors clearly offer enterprise-class solutions, such as EMC’s Greenplum, Teradata’s Aster Data, and Cloudera.

Others, such as Splunk, 1010data, and Pervasive’s DataRush, are “agile big data” solutions.

Agile Big Data solutions become even more powerful when combined with other agile technologies, such as QlikView, Tableau, or TIBCO Spotfire. The ability to sift through big data and then play with it in a highly interactive visual environment is a powerful combination.

All of this big data technology has been created with an awareness of the challenges of enterprise technology, and it aims to be both enterprise-class and agile.

The challenge for CITOs mirrors that for vendors: How can you create a big data infrastructure that is scalable and enterprise quality, yet agile enough to unlock the creativity of your workforce rather than trapping it in the bottlenecks that enterprise solutions typically create?

To understand what kind of big data infrastructure to build, it’s important to understand what kind of value will accrue from building it. As your organization first confronts big data, it makes sense to use Agile Big Data technology to rapidly understand what business value you can find in the steadily growing pool of big data.

Once big data becomes important to your organization, it may make sense to install an enterprise-quality solution, just as a small company often upgrades from QuickBooks to SAP Business One to a fully realized ERP system. But then the question is, how can you make sure such a solution still has agile capabilities?