From ETL to API – A Changed Landscape for Enterprise Data Integration
A trend is emerging in which businesses are deprecating ETL (Extract, Transform, and Load)-based integrations and replacing them with APIs.
What are the drivers? Why are leading enterprises making this shift for their data?
There’s a fundamental shift in the qualitative nature of today’s data and an explosion of new sources.
Traditionally, data was controlled within the enterprise – all of the data that an enterprise gathered was collected when partners and customers interacted with a small number of internal systems. However, in the new apps-based economy, in addition to the systems of record, there are new, dynamic, and disparate sources of data at the edge of the enterprise. (See Making the Shift from Big to Broad Data.)
When internal enterprise systems consisted of a handful of core applications, the data they produced needed to be concisely specified and optimized for efficiency of storage. Expensive ETL systems were necessary to de-dupe, cleanse, and normalize it. However, because the nature of the data changed rarely (business systems were well understood, well modeled, and stood the test of time for decades), such investment was typically justified.
The mobile and apps economy means that interaction with customers happens in a broader context than ever before. Customers and partners interact with enterprises via a myriad of apps and services, and unlike the traditional systems, these new apps, their interaction patterns, and the data they generate change very rapidly and, in many cases, are not even under the control of the enterprise that needs that data. Traditional ETL does not and will not cut it in this space.
So something different is needed. And APIs are the difference. Systems and teams that produce data expose APIs for accessing it. Systems and teams that need the data use those APIs to consume it and apply whatever processing they require.
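The division of labor can be sketched in a few lines. This is a minimal illustration, not a real service: the function names, fields, and the in-memory "store" are all hypothetical stand-ins for a producer's API and a consumer's app.

```python
# Minimal sketch of the API model: the producing team exposes an access
# function; the consuming team calls it and applies its own transforms.
# All names and fields here are illustrative assumptions, not a real API.

def customer_api(customer_id):
    """Producer side: returns data already shaped for consumption."""
    store = {42: {"id": 42, "email": "Ada@example.com", "plan": "pro"}}
    return store[customer_id]

def load_customer(customer_id):
    """Consumer side: fetch via the API, then normalize locally as needed."""
    record = customer_api(customer_id)
    record["email"] = record["email"].lower()
    return record

print(load_customer(42)["email"])  # ada@example.com
```

The point is where the transform lives: it belongs to the consumer that needs it, not to a centralized middle layer.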
One might be tempted to say, “Aren’t we ducking the hard problems that ETL solved – all the ‘Transform’ – standardize, dedupe, cleanse, normalize?” No – because these problems are fundamentally much easier in the new world. Yes, all are still needed. But they are solved differently, and often much more simply. Why? Because the data now exposed by APIs has been optimized for consumption, as opposed to optimized for storage, which was the primary driver for the legacy apps whose data is consumed through ETL infrastructure.
What’s an example of this data optimized for consumption? People today provide information to be consumed by other people. They are more explicit about their identities, meaning that you have much more complete personas from different sources than you had in the past. For example, you’ll likely have full email addresses for your customers in different contexts, not just the eight characters that might have defined a customer in the old mainframe days.
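This is why deduplication, for instance, gets simpler: a full email address is a natural join key across sources. As a hedged sketch, with made-up field names rather than any real schema, in-consumer dedupe can be this small:

```python
# Sketch: deduplicating consumption-optimized records in the consumer.
# The field names ("email", "name") are illustrative assumptions.

def dedupe_by_email(records):
    """Keep the first record seen for each normalized email address."""
    seen = {}
    for record in records:
        key = record["email"].strip().lower()
        if key not in seen:
            seen[key] = record
    return list(seen.values())

customers = [
    {"email": "Ada@example.com", "name": "Ada"},
    {"email": "ada@example.com ", "name": "Ada L."},
    {"email": "grace@example.com", "name": "Grace"},
]

print(dedupe_by_email(customers))  # two records: Ada and Grace
```

With an eight-character mainframe identifier, the same match would have required fuzzy, rule-heavy ETL logic; with a consumption-optimized key it is a dictionary lookup.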
One might be tempted to say, “Federation has failed, ETL has won. Therefore saying that all apps will do whatever they need to do through APIs is saying that the world of federation will make a comeback. Isn’t that stupid?” No, because federation failed on legacy data, where a significant amount of “T” needed to be done and could not be done in a purely federated model. We are in a new world, and in this new world, an API-based approach will work.
OK, but aren’t there frequently occurring patterns that require the pure API approach to be replaced by some in-the-cloud ETL?
It is doubtful, though of course there will be cloud-delivered services that can take some inputs and deliver new standardized values. ELT (extract, load, and then transform) may also become common where pure API-based approaches do not work, especially because the cost dynamics of ELT systems are likely to be very different from those of the ETL systems of the past.
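The ELT idea – load first, transform later inside the store – can be sketched briefly. This is a minimal illustration under stated assumptions: SQLite stands in for a cloud warehouse, and the table and values are invented.

```python
# A minimal ELT sketch: load raw values as received, transform later
# inside the store. SQLite is a stand-in for a cloud warehouse; the
# schema and data are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_customers (email TEXT)")

# "Load": persist the values exactly as received, with no upfront cleanup.
raw = [("Ada@Example.com ",), ("ada@example.com",), ("grace@example.com",)]
conn.executemany("INSERT INTO raw_customers VALUES (?)", raw)

# "Transform": derive a cleaned, deduplicated view on demand, in the store.
rows = conn.execute(
    "SELECT DISTINCT lower(trim(email)) FROM raw_customers ORDER BY 1"
).fetchall()
print([r[0] for r in rows])  # ['ada@example.com', 'grace@example.com']
```

Because the raw data is kept, the transform can be rewritten cheaply as needs change – which is exactly why the cost dynamics differ from classic ETL, where the transform runs once, up front, and discards the original shape.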
Finally, one might ask: “Centralized ETL departments understand all the needs, and provide common services. Isn’t that necessary in a large organization to prevent the balkanization of data efforts?”
Yes, but it has to be traded off against speed. The centralized service that defines an ETL system is costly and resource-intensive, and it introduces friction into development cycles. An API model eliminates the middle layer, improves development speed, and has economic benefits.
In an API-based model, producers have direct visibility into how their data is used. APIs are exposed directly to the consumers of those APIs who write whatever transformations are necessary. Capabilities once provided by ETL technology can be done in the apps or don’t need to be done at all.
ETL capabilities were designed in the days of the mainframe, when we needed to minimize information because of high storage costs. Today’s data is designed for efficiency of consumption rather than efficiency of storage and is better served by API-based integration.
New shapes and forms of data require new capabilities. Traditional ETL integrations are expensive and also too rigid for rapidly changing data and a dynamic ecosystem driven by fast-paced mobile and social apps. An API-based model reduces the friction between the producers and consumers of data.
Successful enterprises are responding to the data revolution by replacing costly and rigid ETL systems with flexible API-based integrations. They’ll save money by simplifying their infrastructure and improve the speed at which they can consume and analyze the data that drives their business.