There is a convergence taking place in cutting-edge applications: The massive firepower of big data processing systems to distill enormous amounts of data is being combined with real-time information. The next generation of applications won’t be real-time or batch; they will be both. The challenge is how to unify the results of these two modes of processing so that the business user gets something meaningful.
I believe that Opera Solutions has discovered the design pattern that will allow the unification of batch and real-time processing based on the concept of a signal. I've gotten to know this architecture while working on a research project for Opera. This pattern should be highly instructive to CIOs and CTOs seeking to create data-processing and analysis applications for real-time data.
The Basic Structure
Here is the basic structure:
- At the bottom of the stack you have hundreds or even thousands of data sources. Some are processed in long batch runs; others are processed in real time.
- The data volumes are such that both types of data must be analyzed using machine learning and other techniques; they can no longer be presented directly to users.
- Both processing paths emit important distillations of the information that are called Signals. The Signals are distilled and refined predictions that can vary widely in scope and complexity. A cadre of machine-learning experts is required to create this Signal layer.
- The Signal layer becomes the raw material for analysts, who use these Signals, evaluate their quality, and create models to gain insights.
- Analysts and app developers build applications on top of these models, which can accept input from any type of data quite naturally.
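The layered structure above can be sketched in code. This is a minimal, hypothetical illustration, not Opera Solutions' implementation: the class names (`Signal`, `SignalLayer`) and the example signals are invented for the sketch. The point it shows is that batch and real-time paths publish into the same Signal layer, and the application layer consumes Signals without caring which path produced them.

```python
from dataclasses import dataclass
from typing import Dict, List

# A Signal: a named, refined prediction distilled from raw data.
@dataclass
class Signal:
    name: str
    value: float   # e.g. a score or probability
    source: str    # "batch" or "realtime"

# The Signal layer: both processing paths feed the same store.
class SignalLayer:
    def __init__(self) -> None:
        self._signals: Dict[str, Signal] = {}

    def publish(self, signal: Signal) -> None:
        self._signals[signal.name] = signal

    def get(self, name: str) -> Signal:
        return self._signals[name]

# Batch path: a long-running job distills historical data into a Signal.
def batch_job(layer: SignalLayer, history: List[float]) -> None:
    avg = sum(history) / len(history)
    layer.publish(Signal("monthly_spend_avg", avg, "batch"))

# Real-time path: a stream handler distills a live event into a Signal.
def on_event(layer: SignalLayer, amount: float) -> None:
    layer.publish(Signal("latest_txn_amount", amount, "realtime"))

# Application layer: a model consumes Signals from either path alike.
def anomaly_score(layer: SignalLayer) -> float:
    avg = layer.get("monthly_spend_avg").value
    latest = layer.get("latest_txn_amount").value
    return latest / avg if avg else 0.0

layer = SignalLayer()
batch_job(layer, [100.0, 120.0, 80.0])  # batch distillation
on_event(layer, 500.0)                  # real-time distillation
print(anomaly_score(layer))             # 5.0
```

Because the model reads named Signals rather than raw feeds, either path can be re-implemented or rescaled without touching the application on top.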
Based on its experience building hundreds of such systems as a consulting firm, Opera Solutions has created a commercial software platform that consists of reusable components that can build Signal Hubs™ and Signal Products. The platform allows many products to be built off of a collection of Signals—the valuable information derived from patterns and anomalies extracted from raw data.
What is a Signal?
Enterprises collect data in a data warehouse or a distributed file system such as Hadoop. Most organizations then build statistical models from the raw data each time they have a specific problem to solve. But the sheer volume of data can blunt the effectiveness of that model, creating a very high “noise-to-Signal” ratio. What most organizations really want to do is transform data into information they can actually use. The Signal is that transformative item.
Signals are generated when a pattern in data is recognized and presented to someone who can use it. When a meaningful threshold is passed, an inference can be made. In some discussions, the word “events” is used in much the same way as “Signals.”
- For example
- “This is the fourth time this person who lives in the United States has run the same small transaction in Bulgaria within an hour. Perhaps this credit card has been created from a phished account.”
- The key to unlocking such inferences lies in correlating the Signals with targets that are meaningful to the business, and then building a statistical model that can find the patterns again, applying machine learning to future patterns so that the model constantly improves. From these models, applications can be fed with processed, focused Signals that trigger appropriate actions by business users. In the above example, that would probably be “Block future transactions, quarantine all suspicious activity, and notify user of fraud detection.”
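The threshold-to-inference step in the fraud example can be made concrete with a small sliding-window detector. This is an illustrative sketch, not a production fraud model: the class name, the four-in-one-hour threshold, and the country codes are hypothetical values taken from the example above.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical detector: fire a Signal when a foreign transaction
# repeats a threshold number of times within a time window.
class RepeatTransactionSignal:
    def __init__(self, threshold: int = 4,
                 window: timedelta = timedelta(hours=1)) -> None:
        self.threshold = threshold
        self.window = window
        self.events: deque = deque()

    def observe(self, timestamp: datetime, country: str,
                home: str = "US") -> bool:
        if country == home:
            return False  # domestic transactions are not part of this pattern
        self.events.append(timestamp)
        # Drop events that have fallen out of the window.
        while self.events and timestamp - self.events[0] > self.window:
            self.events.popleft()
        # The Signal fires once the threshold is passed within the window.
        return len(self.events) >= self.threshold

detector = RepeatTransactionSignal()
t0 = datetime(2014, 1, 1, 12, 0)
fired = False
for i in range(4):  # four matching transactions, ten minutes apart
    fired = detector.observe(t0 + timedelta(minutes=10 * i), "BG")
print(fired)  # True: fourth matching transaction inside one hour
```

In a full system, the boolean would instead be a scored Signal fed to a statistical model, which learns which thresholds and correlations actually predict the business target (here, confirmed fraud).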
“At the end of the day, you’re usually going to say, ‘Look, I have a particular metric that I’m going to use to determine the ultimate business impact of what I’m doing,’” says Joseph Milana, Global Head of Analytics at Opera Solutions. “You start experimenting, asking, ‘How well did this particular transformation impact this metric?’”
It turns out this is easier said than done, not only because of the immensity of data now being generated in real time, but also because most of the products built to mine data were constructed for the use of a technologist, rather than the business user, says Arnab Gupta, CEO of Opera Solutions.
Machine Learning Has Captured the Attention of Business Leaders
“Adopting core big data technologies is no longer the priority of the CIO; it’s the priority of the business leaders, creating an enormous challenge for many of the infrastructure companies who built [prior solutions],” Gupta says. “The historical method was to give you the software, and data was a do-it-yourself task within the corporation. But with big data, extracting value, or what we call ‘Signals,’ has become so complex and difficult that industries have emerged to organize and create standardization around the algorithms that extract value from data.”
In other words, with so much data, there must be some machine-learning or advanced technology to make sense of it and make it usable by the business.
The business intelligence model of previous generations no longer applies because too much expertise is required to make data optimally useful, Gupta argues. Part of the problem is that many such products are made in a “product-up” frame of reference, meant to dazzle people with the power of the technology to collect a tremendous amount of data in a “reservoir” format.
But Gupta says Opera Solutions’ Signal Hub is built from a “customer-down” perspective, solving a customer problem in an intuitive way that also creates a repeatable solution that can be given to others in the organization or sold to other organizations. Also important, the Signal Hub collects data from multiple sources at once.
- Here’s an example
- One of the most vexing problems for many businesses is predicting attrition and devising a solution to stop or slow it down. Suppose a company that runs online video games in one line of business determines that a 20 percent reduction in monthly gaming activity increases the likelihood of a gamer quitting the platform by 40 percent. Perhaps the Signal Product that predicts attrition might be applicable to predicting attrition in the company’s (or someone else’s) video-rental business. The data inputs and structure of the product won’t be identical, but the basic skeleton has already been built and can easily be altered without returning to the raw data, or to the “square one” of an algorithmic design process. The model is repeatable, adaptable, and—possibly—saleable.
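The repeatability claimed in the example above can be sketched as a parameterized Signal skeleton. The function below is a hypothetical illustration, not Opera Solutions' model: the function name, the 10 percent baseline risk, and the activity figures are invented; the 20 percent drop and 40 percent lift come from the example.

```python
# Illustrative sketch of a reusable attrition Signal: the skeleton
# (activity drop -> churn-risk lift) stays fixed, while data inputs
# and thresholds are swapped per business line. The baseline_risk
# default of 0.10 is a hypothetical value for the sketch.
def attrition_risk(prev_activity: float, curr_activity: float,
                   baseline_risk: float = 0.10,
                   drop_threshold: float = 0.20,
                   risk_lift: float = 0.40) -> float:
    """Return an estimated churn probability from month-over-month activity."""
    if prev_activity <= 0:
        return baseline_risk  # no history to compare against
    drop = (prev_activity - curr_activity) / prev_activity
    if drop >= drop_threshold:
        return baseline_risk * (1 + risk_lift)  # 40% lift past the threshold
    return baseline_risk

# Gaming line: monthly play time fell from 40 to 30 hours (a 25% drop).
print(round(attrition_risk(40.0, 30.0), 2))  # 0.14

# Video-rental line: same skeleton, different inputs and thresholds.
print(attrition_risk(12.0, 11.0, baseline_risk=0.05, drop_threshold=0.15))
```

The second call shows the adaptation step: only parameters change, not the algorithmic skeleton, so the product can be redeployed without returning to raw data.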
Wealth Managers Harness Signal Layer
Opera Solutions has already created numerous Signal Products out of tens of thousands of Signals for some of the toughest algorithmic customers around, such as Morgan Stanley. Morgan Stanley uses Opera Solutions to provide recommendations for customers based on their preferences, the contents of their portfolios, the movement of the markets, and the financial advisor’s preferred information sources and investment types. The wealth-management Signal layer consists of customer and financial advisor data, as well as market data, so that actionable Signal Products now inform the applications wealth managers use.
“In other words, on Wall Street, we converted big data into ‘small data,’” Gupta says. “Because a similar Signal feeds many apps, you increase your ability to get scale in terms of app building.”
Opera Solutions automated and industrialized the Signal discovery and deployment process through its Vektor™ platform, which streamlines the construction of Signal Hubs: reusable collections of Signals that can then be used in a wide range of Signal apps.
Whether or not you end up working with a company like Opera Solutions, the idea of a Signal layer and a Signal platform is a powerful concept for organizing the next generation of applications.