Advances in information technology over the past five decades have been nothing short of breathtaking. While this offers tremendous opportunities, it also creates difficulties, because computing on a vast scale generates data faster than it can be managed, understood, or analyzed. That is why, even though the cost of storage falls every year, many large companies are seeing their total storage costs rise. One large financial-services company, in fact, saw its data stores grow from four to 40 petabytes in just the last two years.
Welcome to the “Big Data” era. In many ways, big data is a new frontier connecting consumers and companies, from which communications and activity can be mined to deliver personalized, relevant offers and messages, all executed with unprecedented speed, automation, and intelligence. The opportunities are vast.
Experienced CIOs see this opportunity in context. They know that using big data to deliver real business results requires a focused strategy that leverages and protects their existing data assets, develops new capabilities that are production-ready and reusable, and manages the deluge of new data that will be created in the process.
For many companies, the recent explosion in data is not a result of increased business transactions or better use of information and analytics. Rather, it’s the result of unmanaged replication. Large email attachments that are broadly distributed, hundreds of extracts from production systems sent nightly to departmental managers, and unclear archive-and-purge processes all drive data growth without necessarily creating any new information. The value of big data comes almost exclusively from new information and insights, not from copies of existing data, and there are three main ways to get started down the right path.
First, separate the signal from the noise. Begin reducing the noise by locking down and simplifying the data environment with information lifecycle management, data governance, and master data management.
Second, it is critical to identify, even broadly, what new information and insights big data can provide and how they will affect your business. Case studies illustrate this in action. Some of the actions you can take include:
Third, define the smallest possible scope for success. Be rigorous in defining the new information that is needed, and then decide whether big data is the only source. If it is, assess the smallest set of data required to generate that information. Ask questions such as: How much history is needed for trend analysis? How granular does the data need to be? For discovery and analysis projects, for example, statistically relevant data samples can produce the same insights as full-volume historical data sets. Most large companies try to understand patterns in customer behavior and product performance so they can optimize their business processes and performance. An analysis of 500,000 random phone subscribers will yield just about the same insights as an analysis of 50,000,000. Unless your business can take advantage of micro-segmentation, a rigorous sampling and analysis process will yield sufficient actionable insights.
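To make the sampling point concrete, here is a minimal sketch of the usual margin-of-error calculation for a proportion estimated from a simple random sample. The function name, the 95% confidence level (z = 1.96), the worst-case proportion of 0.5, and the 0.1% micro-segment share are all illustrative assumptions, not figures from any particular study.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes a normal approximation; z=1.96 corresponds to roughly 95% confidence.
    proportion=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Sample sizes taken from the text above.
small = margin_of_error(500_000)
large = margin_of_error(50_000_000)

# Hypothetical micro-segment: 0.1% of the 500,000-subscriber sample.
segment_size = round(500_000 * 0.001)
segment = margin_of_error(segment_size)

print(f"500,000 subscribers:    +/- {small:.4%}")
print(f"50,000,000 subscribers: +/- {large:.4%}")
print(f"Segment of {segment_size} subscribers: +/- {segment:.4%}")
```

Under these assumptions, the sample of 500,000 already pins a population-wide percentage down to within about 0.14 percentage points, so the hundredfold larger data set adds little for broad patterns. The picture changes for a small segment: with only a few hundred sampled subscribers the margin widens to several percentage points, which is why micro-segmentation can justify working with full volumes.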
Networked, dynamic business processes built at a very granular level can produce billions or even trillions of bytes of data each month. As a result, the demands of big data have traditionally outstripped improvements in technology cost/performance. Fortunately, new architectures and approaches that simplify managing these enormous data volumes have evolved over the last decade, and they are finally being incorporated into the enterprise architectures of many large companies. These include:
Big data must be considered in the context of the enterprise data and analytics environment: capturing and creating data, cleansing and organizing it, mining business insights from it, and using those insights to drive intelligent alerts and actions in the business. By feeding data that measure the outcomes of these actions back into the system, a closed loop is created that allows companies to use their data to test, learn, and improve potential scenarios.
The diagram below depicts three broad domains of the ecosystem: data, insight, and action.
Big data presents opportunities and challenges in each domain. Data management leverages big-data technology to eliminate redundancy and provide scalable infrastructure for managing big-data assets. Insight uses appliances and accelerators, NoSQL technology, and automated analytics to expose new value hidden in big data.
Big data presents fascinating opportunities for insight and innovation—as well as the challenge of separating the signal from the noise. Increasingly, companies are overlaying their internal, proprietary data with insights from external structured and unstructured data to better understand their customers, performance, and marketplace. New technologies are making big data useful and manageable, but careful, business-driven planning and governance are essential to success. Starting from clear business objectives, enterprises are evolving to manage the dramatic growth in data, harvest new insights, and continuously optimize their actions.