In recent years, the widespread perception of Big Data, as exhibited in buzzwords, the popular press and other areas, has been: Big Data is not just big… It is HUGE! Big Data is fast! Big Data is variable! This perception has sidestepped, to a certain extent, more important questions about Big Data:
- How will the information in Big Data environments be used in a business context?
- What is the strategy for using Big Data with existing systems?
- What is a reasonable sizing scope for Big Data?
These new business-focused considerations of Big Data challenge the current way in which we look at Big Data initiatives. To be honest, changing the perception of Big Data is not about creating a hard-and-fast definition of the size, shape and color of the Big Data “bread box.” It is more about how we look at the challenges in our businesses and industries and apply these new capabilities to meet those challenges.
Big Data is More about Business Requirements than about Technology
Business requirements are increasingly defined by how the business wants to use Big Data solutions rather than by what technical platforms allow. Choosing a technology platform is now more a matter of deciding how a particular use case in a Big Data initiative should be addressed than of starting with a platform and working out what can be done within its limitations.
Gone are the days of technology environments dictating how business uses Big Data. Adoption will be driven by how well Big Data initiatives serve the top and bottom line. Businesses will focus on the mission of improving customer relationships or lowering costs through business process optimization. These missions will be accomplished by looking not just across data domains, but across business segments. How can operational application data relating to product and service provisioning help with manufacturing quality? How will sensor information from healthcare devices influence not just patient care, but long-term pharmacological development and medical treatment enhancement? How can voice data from customer care impact how sales approaches customers for cross-sell/up-sell opportunities?
The core of these questions is not technology based – it is about how businesses can apply Big Data solutions to business problems. When Big Data solutions are applied to these issues, new breakthroughs will be possible due to the removal of common barriers. Much as the sound barrier was eliminated as a limit to aircraft design, Big Data promises to remove barriers from business models that were once thought to be static in many industries – true optimization of transportation logistics, enhanced quality control in manufacturing, results-based healthcare treatment – as well as continuing the development of new industries.
Big Data isn’t just Hadoop
The popular press would have you believe that the concept of Big Data is encapsulated in a single technology option – Apache Hadoop. According to recent research from Enterprise Management Associates (entitled Big Data Comes of Age), a host of technical solutions is being used in Big Data initiatives. These solutions range from operational platforms generating event management and billing information, to enterprise data warehouses and data marts managing historical structured data and analytics, to NoSQL data stores (including Apache Hadoop) that manage multistructured data and access in ways that prove difficult for traditional data management solutions.
The study showed that an overwhelming majority of end users are implementing Big Data across multiple platforms. They are using Apache Hadoop in conjunction with the enterprise data warehouse. They are using NoSQL with analytical appliances. They are using operational systems with data discovery tools. In each of these instances, end users are leveraging the right tool for the right task. Programmatic processing is moving toward NoSQL environments. Numerical analysis is probably staying with traditional SQL-based analytical solutions. Data management of multistructured data and structured information is moving toward Hadoop's HDFS. In none of these instances are end users simply ripping out their existing technologies and replacing them with Apache Hadoop installations or commercial distributions of Hadoop. Rather, solutions for Big Data are being crafted using a best-of-breed strategy that leverages the performance attributes of existing platforms and the features of new technologies.
Big Data isn’t just Big
Big Data has always been portrayed as the upper end of the data management spectrum – petabytes moving toward exabytes, zettabytes and yottabytes. For industries well connected to social media or machine-generated data, these data sizes are not out of the question. These industries are the implementation innovators, or “poster children,” of the Big Data story. For most businesses, however, this amount of storage currently borders on the unimaginable. Then again, it is just as unimaginable as breaking the sound barrier once was for pilots and engineers.
According to recent Big Data research, the data sources being used by a significant number of Big Data initiatives are not petabyte-sized monsters. The most common range for Big Data initiatives is projected to be 13 to 40 terabytes in 2013. While this is not a petabyte implementation, it is not a trivial amount of data to manage. An installation in this range still requires that information technology professionals use datacenter-“grade” technology tools. Twenty terabytes of data under storage will challenge most organizations and definitely cannot be handled with a rack “hidden in the closet” or a server under a desk (at least not yet…).
Changing Face of Big Data in 2013
The industry is moving away from technical definitions to business characterizations. How do we look across our business to achieve that mythical 360-degree view of customers or the holistic view of our supply and cost chains?
A single platform will not dominate the landscape. Instead, solutions that handle Big Data initiatives will spread across multiple platforms. Three-, four- and five-“node” environments will be the norm as opposed to a single solution. Each node will make its own processing or storage contribution to the business goal. In addition, a majority of these solutions will fall well short of extreme data volumes. Instead, the data stored will be more manageable but still large enough to provide technology professionals with an administrative challenge. All of these factors come together to alter our notion of Big Data in 2013.