API-Centric Data Architectures
We at Apigee obviously believe in the power of APIs. But APIs are not just for our customers and users to build out. APIs have to be central to our product—in other words, everything we put out as a product must have an API. Perhaps not so obviously, APIs also must be the way we run our company.
We decided to put this to the test. Could an API-centric internal architecture enable us to run our company? The first thing to understand is that we run Apigee on metrics—metrics on our engineering, product usage, customer satisfaction, defects, developers—on practically anything that makes Apigee run.
Our support and sales processes run on Salesforce.com, and we run our engineering on JIRA. There is a linkage between support tickets and JIRA. Our product’s cloud deployment is independent of Salesforce.com (it runs on Amazon).
The first challenge we had to solve was building the key map between these two systems. Once we built an API layer that did that, we had clean data that was uniform and common across two systems that were not naturally linked. Any API-centric data architecture needs some form of key mapping. Lucky for us, we have built exactly such a system: Apigee Edge. The key map is just state that Apigee Edge maintains.
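To make the idea of a key map concrete, here is a minimal sketch in Python. It is not how Apigee Edge stores this state; the `KeyMap` class, its method names, and the sample identifiers are all hypothetical, and it assumes a simple one-to-one link between a support case and an engineering issue.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KeyMap:
    """Hypothetical two-way map between keys in two unrelated systems.

    Assumes each support case links to at most one engineering issue,
    and vice versa.
    """
    _case_to_issue: Dict[str, str] = field(default_factory=dict)
    _issue_to_case: Dict[str, str] = field(default_factory=dict)

    def link(self, case_id: str, issue_key: str) -> None:
        # Drop any earlier link on either side so both directions stay in sync.
        old_issue = self._case_to_issue.pop(case_id, None)
        if old_issue is not None:
            self._issue_to_case.pop(old_issue, None)
        old_case = self._issue_to_case.pop(issue_key, None)
        if old_case is not None:
            self._case_to_issue.pop(old_case, None)
        self._case_to_issue[case_id] = issue_key
        self._issue_to_case[issue_key] = case_id

    def issue_for_case(self, case_id: str) -> Optional[str]:
        # Returns None when the case has no linked issue.
        return self._case_to_issue.get(case_id)

    def case_for_issue(self, issue_key: str) -> Optional[str]:
        # Returns None when the issue has no linked case.
        return self._issue_to_case.get(issue_key)
```

With a map like this behind an API, a caller holding a key from one system can look up the matching record in the other without either source system knowing about its counterpart.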
The second challenge we faced was deciding when to do a bulk load and when to call into the source system. For complex analysis, one needs the data in, guess what, a database system. So we wrote our own crawler, a Python program, that pulled all the relevant information from the source systems using our data APIs. More generally, any such architecture requires some mechanism to convert APIs that return atomic pieces of information into bulk loads.
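A crawler of this kind could look roughly like the sketch below. This is not our actual crawler: the `fetch_page` callable, the `records` table, and the `id` and `source` fields are assumptions made for illustration, and SQLite stands in for whatever database system you load into.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List


def crawl(fetch_page: Callable[[int], List[Dict]]) -> Iterator[Dict]:
    """Yield records from a paged data API until it returns an empty page.

    fetch_page is a hypothetical wrapper around one API call per page.
    """
    page = 0
    while True:
        records = fetch_page(page)
        if not records:
            return
        yield from records
        page += 1


def bulk_load(conn: sqlite3.Connection, records: Iterable[Dict],
              batch_size: int = 500) -> int:
    """Write API records to a table in batches; return the number written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(id TEXT PRIMARY KEY, source TEXT, payload TEXT)"
    )
    insert = "INSERT OR REPLACE INTO records VALUES (?, ?, ?)"
    batch, loaded = [], 0
    for rec in records:
        # Keep the full record as JSON so new fields survive the load.
        batch.append((rec["id"], rec["source"], json.dumps(rec)))
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            loaded += len(batch)
            batch.clear()
    if batch:
        conn.executemany(insert, batch)
        loaded += len(batch)
    conn.commit()
    return loaded
```

Calling `bulk_load(conn, crawl(fetch_page))` drains the API page by page and leaves the data in a table where it can be queried. Storing the raw payload alongside a couple of indexed columns is one way to tolerate new data elements without changing the schema.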
Of course, we could not fully depend on bulk loads, because new data elements and even new APIs get added every day. Coping with that kind of change is a particular strength of APIs, which are often versioned, over more traditional extract, transform, and load (ETL) processes (more on this in an upcoming blog post). ETL will stay, but it will also need to transform to become more API-centric.
Net net, given that new systems (definitely in the SaaS model, but many enterprise systems, too) are being exposed through data APIs, API-centric data architectures are going to emerge and, over time, become the dominant data architecture. When the data you are correlating comes from APIs, it’s only natural to have the mappings exposed via APIs as well, which lets you pull the data and the mapping from the same type of interface.
—Jeff West and Ben Tallman on Apigee's product team contributed to this post