11436 SSO

Big Broad Data: The role of Data APIs

May 08, 2012

In previous posts on this topic of Big Broad Data, we looked at some of the reasons for and implications of enterprises shifting their focus from the “bigness” and technology hype of “Big Data” to breadth and diversity, signal extraction, analytics and deep insights.

The future is around the easy consumption, the flow and interaction of data, which drives a revolution in the world of Data APIs. The structure of the Data APIs becomes increasingly important.

Building an Information Halo around APIs

Let’s consider enterprises as one of two types from a data perspective: those for whom data is the core business and those who give data away to attract increased transactions to the core of their business. 

In the latter case, the data (information) itself is not necessarily monetizable, but it attracts people to the business.

For a great discussion about the notion of information halos around your core business, see Sam Ramji’s talk: Amundsen’s Dogs, Information Halos and APIs: The epic story of your API Strategy.

In both scenarios, data is a fundamental and critical part of an API strategy

Enterprises who are monetizing around data are beginning to plant flags in different domains. Weather, finance, real estate, Internet traffic and dozens or hundreds of other domains are forming.

The people building the data in any domain are doing so by collecting from disparate data sources. To build out any one domain, they’ve probably stitched together data from a large number of data sources, cleansed and standardized it before finally exposing it as an API.

What are companies using to collect and stitch the data?

A natural and familiar stitching technique is the linked data model (linkeddata.org). While linked data techniques are excellent at accessing individual data elements, I argue that this is not the model that these data providers need. Instead they need to crawl, bulk load, and access data in large quantities, before cleansing, standardizing, and delivering it.

If linked data is not the most effective method to stitch data together to create the domains (at the bottom of the stack), can linked data become the de-facto standard to express data out of the information halo (at the top of the stack and as the Data API for domains)?

The answer is probably yes – eventually. I think it is an unlikely scenario just yet. Today, the challenge is how to cleanse, standardize, unify and use the data in individual domains. Linked data techniques have the right characteristics to bring together data that have already been cleansed, standardized, and stitched but is not a great model to do the initial stitching. It will most likely become useful and common in the future when the inter-linking of domains becomes more important than it is today.

If linked data is not the approach to expose data as APIs, what is?

The school of thought to which I subscribe is one of schema-based access to data APIs patterned after relational models. Here are a few examples of data APIs, which highlight three common kinds of data access patterns.

* Primary key lookup - to get to a specific data element.
* Imposed hierarchy-based lookup - in which you have classes with hierarchy and in effect traverse the hierarchy to get to the data elements.
* Rectangular lookups - defined by typical relational lookups of rows and columns.

All of these techniques are being built around single data sources as opposed to massively linked data sources.

The structure and “RESTification” of  Data APIs

There are several approaches to Data APIs. In addition to the perspective that is Pragmatic REST for SQL Developers, there’s Microsoft’s OData approach. As I asserted in a recent talk at an OData Meetup event at Microsoft, OData is a step in the right direction but there are certain things OData needs to do to become the de-facto standard.

The “RESTification” of the Data APIs is a fundamental imperative and both the Pragmatic REST for SQL and the OData approaches are good starting points.

Whatever the solution is, it cannot be vendor specific. Data is too important, and the data revolution too fundamental for it to be associated with any one vendor.

OData technologies need to be available in all ecosystems, not just in the Windows Foundation Classes (WFC) library and the .NET Framework. Similarly pragmatic REST and other techniques cannot be available in Apigee or any other single vendor offering only.

Let the call to action be to come together as a community; get the best of the linked data and OData ideas and techniques together and transform the world with Data APIs.

The conversation has already started and we’d love to hear more of your perspectives and arguments over on api-craft.

Scaling Microservices