
Big Broad Data: Increasing the signal to noise ratio

May 08, 2012

In my previous post about Making the shift from Big to Broad Data, I made the case for thinking about Big Data not so much as “Big” but as “Broad.” We looked at the explosion of new data sources in today’s economy, each of which is typically smaller and more diverse than the enterprise systems of record of the past. Data comes from a variety of sources like Twitter, Facebook, partners, tens or hundreds of apps (some built around your APIs), and more.

To stay responsive and make sound business decisions, an enterprise simply has to pay attention to data across many more sources than in the past.

Signal extraction and stitching data from diverse sources

Whenever you collect a lot of data, you collect a lot of both signal and noise. In fact, a lot of the Big Data approaches to date are focused on extracting the signal from the noise in the data collected from traditional enterprise sources.

As the data an enterprise collects has shifted from 5 to 7 primary "enterprise-centric" data sources (point-of-sale data sources, supply records, customer records, warehousing records, and so on) to hundreds of diverse and typically smaller sources (hundreds of apps, hundreds of social networks, business networks, and so on), the size of the data doesn’t matter nearly as much as the number and diversity of data sources you need to stitch together to extract meaningful signal.

Now, with the footprints of customer and partner behavior spread across hundreds and maybe thousands of data sources (see Making the shift from Big to Broad Data), there's undoubtedly a lower signal-to-noise ratio than before. Data needs to be stitched together to ensure that the signal rises above the noise.
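To make the idea of stitching concrete, here is a minimal sketch in Python. It assumes every source can be reduced to records that share a customer key; the source names, the `customer_id` and `mentions` fields, and the "seen in at least two sources" rule are all hypothetical choices made for illustration, not a prescribed method.

```python
from collections import defaultdict


def stitch(sources):
    """Merge per-source records into one profile per customer.

    `sources` maps a (hypothetical) source name to a list of records,
    each carrying a shared `customer_id` key.
    """
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            profiles[record["customer_id"]][source_name] = record["mentions"]
    return dict(profiles)


def strong_signals(profiles, min_sources=2):
    """Return customers whose activity appears in at least `min_sources` sources."""
    return sorted(cid for cid, seen in profiles.items() if len(seen) >= min_sources)


# Illustrative data only: three small, diverse sources.
sources = {
    "twitter": [
        {"customer_id": "c1", "mentions": 3},
        {"customer_id": "c2", "mentions": 1},
    ],
    "partner_app": [{"customer_id": "c1", "mentions": 2}],
    "support_forum": [
        {"customer_id": "c1", "mentions": 1},
        {"customer_id": "c3", "mentions": 4},
    ],
}

profiles = stitch(sources)
print(strong_signals(profiles))
```

Taken alone, each source says little about any one customer. Only after the records are joined on a common key does the customer who shows up everywhere stand out from the one-off mentions.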

Successful enterprises will be those that understand and consolidate the data that they own as well as that which they can acquire. Today, the challenges are striking deals and forming partnerships to get access to hundreds, not handfuls, of data sources. It's no longer about old styles of purchasing data or getting it from departments in your own organization.

Enterprises need to think about access and control, about whether to push analytics to the data or push data to the analytics, and so on. In the past, bringing data to a central repository in the enterprise, whether a warehouse or some other store, where it could be analysed was a less significant concern.
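The difference between the two directions can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `DataSource` class, its `export_rows` and `run` methods, and the `spend` field are hypothetical names, and a real source would sit behind a network boundary with its own access controls.

```python
class DataSource:
    """A stand-in for one external or internal data source (hypothetical)."""

    def __init__(self, rows):
        self._rows = rows

    def export_rows(self):
        # Push data to the analytics: every row leaves the source.
        return list(self._rows)

    def run(self, analytic):
        # Push analytics to the data: the function runs at the source
        # and only its result leaves.
        return analytic(self._rows)


def total_spend(rows):
    return sum(row["spend"] for row in rows)


sources = [
    DataSource([{"spend": 10}, {"spend": 5}]),
    DataSource([{"spend": 7}]),
]

# Data to analytics: copy everything into one place, then analyse.
central = [row for source in sources for row in source.export_rows()]
centralized_total = total_spend(central)

# Analytics to data: each source computes its own partial result.
federated_total = sum(source.run(total_spend) for source in sources)

print(centralized_total, federated_total, len(central))
```

Both routes give the same answer here, but the first moves every row while the second moves one number per source. With hundreds of sources you do not own, that difference is where access and control come into play.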

Data APIs will lead the way for easy data consumption, flow, and interaction

The fundamental questions are: What mechanisms does an enterprise need to stitch together data from its own systems as well as from syndicated and external data sources?

How will people interact with other people’s data? (We’ve got to understand the form in which this data will be exposed and how it will be consumed.)

In Web 1.0, techniques centered on Web crawling; in Web 2.0, they centered on Web pages, AJAX, and other "rich" interface technologies. I think that in Web 3.0, broad, diverse data will be accessed, consolidated, and correlated through the power of APIs.

Today, transactional and data APIs are the yin and yang of APIs, and the API conversation is largely dominated by transactional APIs (which achieve tasks like sending messages, making trades, getting credit information, and so on). But I think we’re looking at a revolution in the world of Data APIs, because the future is about the easy consumption, flow, and interaction of data. Just as APIs are central to the evolution and handling of big, broad, diverse data, so is data central to the evolution of APIs. The structure of Data APIs becomes increasingly important.

In my next post, we'll take a look at some of the schools of thought around the structure of Data APIs. We'll explore the different techniques companies are using to collect and stitch data to create individual domains and to expose data out of those domains. See The role of Data APIs »
