Big Broad Data: Beyond the "bigness" and the technology to extracting meaning
The amount of data in our world has been exploding, and the concept of “Big Data” (collecting and analyzing large data sets) needs no introduction. It’s the buzzword of 2012 where IT is concerned.
There's been a focus on the business side of Big Data, which of course is a critical component of the discussion. Big Data is most certainly the next frontier for innovation, competition and productivity (McKinsey Global Institute, 2011).
However, a quick Google search, a track of #bigdata in your Twitter feed, or five minutes of conversation with folks about "Big Data" will show you that the weight of the discussion falls primarily on two things: first on technology and then on "bigness".
While both are important, I think the focus on technology and size is misplaced. It causes us to miss the point: the depth of analysis and the insights we get from the data are the most important and valuable things.
“What’s your tool set?”
Hardly a conversation happens in the big data space that doesn’t start with the pros and cons of Hadoop, NoSQL, Cassandra, HBase . . . the list goes on. Technology is of course extremely important because without it we couldn't pick the signal out of the noise or handle large data sets. But the technology is almost a commodity. (And of course, trying to get two of us technologists to agree on a technology is a whole different discussion.)
"How big is your data anyway?"
Right behind the technology argument is the “Bigness”: the petabytes vs. terabytes argument. There are certainly technical complexities to dealing with petabytes of data. But terabytes, and even kilobytes, can be big in their own right, and more importantly they too hold valuable information.
Remember that a lot of the size will come from noise whether you’re dealing with kilobytes, terabytes, or petabytes. Big, noisy data is not valuable - the value will come from the signal that you can extract.
To successfully glean value from big data, we've got to pivot the discussion to focus on the breadth of the data, signal extraction, and deep insights. This should make us think about the areas above or below the technology and not about the technology itself. Bottom line: the data itself is the real gold, the new currency.
The disruptive technologies of social, mobile, and cloud that are transforming how we do business serve up the breadth of data. Data about a business' customers is available and interpretable in all kinds of new contexts. A customer who checked in at the gym on Foursquare before visiting a retailer is likely to be interested in sports stuff. You can imagine hundreds of similar examples.
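To make the gym example concrete, here is a minimal sketch of stitching two sources together. Everything in it is an assumption made for illustration: the record fields (customer, category, time), the three-hour window, and the function name stitch_context are hypothetical, and real check-in data would of course arrive through an API rather than a hand-written list.

```python
from datetime import datetime, timedelta


def stitch_context(checkins, store_visits, window=timedelta(hours=3)):
    """Attach to each store visit the venue categories the same customer
    checked into during the `window` before the visit (hypothetical schema)."""
    stitched = []
    for visit in store_visits:
        recent = [
            c["category"]
            for c in checkins
            if c["customer"] == visit["customer"]
            and timedelta(0) <= visit["time"] - c["time"] <= window
        ]
        stitched.append({**visit, "recent_categories": recent})
    return stitched


# Made-up sample data: one source of check-ins, one source of store visits.
checkins = [
    {"customer": "c1", "category": "gym", "time": datetime(2012, 6, 1, 9, 0)},
    {"customer": "c2", "category": "cafe", "time": datetime(2012, 6, 1, 8, 0)},
]
store_visits = [
    {"customer": "c1", "time": datetime(2012, 6, 1, 10, 30)},
    {"customer": "c2", "time": datetime(2012, 6, 1, 14, 0)},
]

stitched = stitch_context(checkins, store_visits)
```

In this sample, the first visit picks up the gym check-in from 90 minutes earlier, while the second picks up nothing because the cafe check-in falls outside the window. The interesting part is not the code but the join: neither source says much on its own.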
What's a good example of value from extracting signal over noise? A Klout Score uses data from social networks to measure reach and influence. It is a signal extracted from a superabundance of tweets and other social interactions.
Deep insight is about how people can take the output of the machines and convert it into business value. We might come to know that shopping cart abandonment is higher from apps on Android devices than on iPhone devices, which may indicate that the Android app is less persuasive.
There’s also a fundamental change for businesses: the apps and API economy compels them to focus on breadth of data and on stitching data from disparate sources.
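The arithmetic behind a finding like that is simple, which is rather the point. A minimal sketch, assuming each session that created a cart is reduced to a (platform, purchased) pair; the sample numbers are invented and the function name abandonment_rates is hypothetical:

```python
from collections import Counter


def abandonment_rates(sessions):
    """Return, per platform, the share of cart sessions that ended without a purchase.

    `sessions` is an iterable of (platform, purchased) pairs, one per session
    in which a cart was created (an assumed, simplified input format).
    """
    carts = Counter()
    abandoned = Counter()
    for platform, purchased in sessions:
        carts[platform] += 1
        if not purchased:
            abandoned[platform] += 1
    return {platform: abandoned[platform] / carts[platform] for platform in carts}


# Invented sample: four cart sessions per platform.
sample_sessions = [
    ("android", False), ("android", False), ("android", True), ("android", False),
    ("iphone", True), ("iphone", False), ("iphone", True), ("iphone", True),
]

rates = abandonment_rates(sample_sessions)
```

With this sample, three of four Android carts are abandoned against one of four on iPhone. The number alone doesn't say why; turning the gap into a decision about the Android app is the human part of the job.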
I’ll talk more in upcoming posts about "broad" data and "data stitching" as well as how Data APIs will lead the way in the exploding apps and API economy. We also discussed these topics in a Webcast last week. (video and slides here)