
A Blueprint for Designing Complex Architectures

Feb 04, 2015

Having worked on a complex system architecture at Vodafone, helped large customers run Apigee solutions on premises, and helped the Apigee engineering team with our own SaaS deployment, I’ve identified several recurring architecture principles for dealing with the complexity of such systems.

These principles are not rules, laws, or perfect truths; they’re statements on the order of “an apple a day keeps the doctor away.” They give a name to a concept, and make it easier to talk and reason about it. They provide a place to hang the feelings we have about good and bad design.

They attempt to translate those feelings into concrete advice. In that sense, the principles are a kind of anodyne. Given some code or design that you feel bad about, you may be able to find a principle that explains that bad feeling and advises you about how to feel better.

More importantly, these principles are heuristics: common-sense solutions to common problems. But like any heuristic, they are empirical in nature. They have been observed to work in many cases; but there is no proof that they always work, nor is there any proof that they should always be followed. As part of our continuing effort to share what we've learned building products and running operations, we’ll examine two of these four principles here.

Principle #1: Design for security

Security should always be treated as a first-class citizen in every design and deployment:

  1. The owner of the data and the data lifecycle must always be identifiable and traceable

  2. The requestor of the data must be identifiable and its actions must be traceable and auditable

  3. The multi-tenancy level (infrastructure, OS, app, service) must be clearly defined and deployed for each data element manipulated by the system

  4. Self and delegated administration must be part of the original design
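As a rough illustration of points 1 to 3, here is a minimal Python sketch of an audit trail that records who requested which data element, who owns it, and which tenant it belongs to. Every name in it (`AuditEvent`, `AuditLog`, `read_record`, the dictionary-based `store`) is hypothetical and chosen for this example; it is not an Apigee API, and a real system would persist the log rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """One traceable action: who did what to whose data, in which tenant."""
    requestor: str   # identity of the caller
    owner: str       # identity of the data owner
    tenant: str      # tenant the data element belongs to
    action: str      # "read" or "read-denied" in this sketch
    resource: str    # identifier of the data element
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class AuditLog:
    """Append-only, in-memory log (an assumption made to keep the sketch short)."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def by_requestor(self, requestor):
        return [e for e in self._events if e.requestor == requestor]


def read_record(log, store, requestor, tenant, resource):
    """Return a record only within the caller's tenant, auditing every attempt."""
    record = store[resource]
    allowed = record["tenant"] == tenant
    log.record(AuditEvent(requestor, record["owner"], record["tenant"],
                          "read" if allowed else "read-denied", resource))
    if not allowed:
        raise PermissionError(f"{requestor} may not read {resource}")
    return record["value"]
```

The denied attempt is logged before the exception is raised, so failed access remains as auditable as successful access.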

Security must be understood as an activity on each system component. Defense needs to be practiced in depth and built to contain intrusions. Don’t take a medieval moat approach, in which a single perimeter protects everything inside it; think more about modern individual house protection.

Enforcing security at the API level with well-known mechanisms and protocols (OAuth and OpenID Connect, for example) on a multi-tenant platform is generally a very good way to start.

Principle #2: Don't ignore latency and fault handling

Especially when dealing with cloud resources, it’s important to manage failure and not assume that latency will be zero:

  1. Latency (network or software component) must be managed rather than eliminated; multi-level predictive caching, placed close to the point of consumption, must be in place to improve response time

  2. Every component that implements an API must have a health dashboard

  3. Move from "best effort" to "managed" by knowing that a system element is down before trying to use it

  4. Avoid unnecessary intermediaries and aggregators by employing explicit addressing

  5. Latency and fault handling are the responsibility of each system component
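Points 1 and 3 above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: `HealthRegistry` and `ManagedClient` are hypothetical names, the 30-second cooldown is arbitrary, and the cache is a simple last-known-value cache rather than the predictive, multi-level cache the list calls for. The idea is that a component records that a dependency is down and stops calling it for a while, serving a cached value instead.

```python
import time


class HealthRegistry:
    """Tracks which components are known to be down, and until when."""

    def __init__(self, cooldown_seconds=30.0, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._down_until = {}

    def mark_down(self, name):
        self._down_until[name] = self._clock() + self._cooldown

    def mark_up(self, name):
        self._down_until.pop(name, None)

    def is_up(self, name):
        # Unknown components are assumed up; known-down ones recover after the cooldown.
        return name not in self._down_until or self._clock() >= self._down_until[name]


class ManagedClient:
    """Calls `fetch` only when the dependency is believed to be up."""

    def __init__(self, name, fetch, registry):
        self._name = name
        self._fetch = fetch
        self._registry = registry
        self._cache = {}

    def get(self, key):
        if self._registry.is_up(self._name):
            try:
                value = self._fetch(key)
            except ConnectionError:
                self._registry.mark_down(self._name)
            else:
                self._registry.mark_up(self._name)
                self._cache[key] = value
                return value
        # Dependency is down (or just failed): fall back to the last known value.
        if key in self._cache:
            return self._cache[key]
        raise ConnectionError(f"{self._name} is down and {key!r} is not cached")
```

While the dependency is marked down, `get` skips the remote call entirely, so callers do not pay a timeout on every request. The same registry could also feed the health dashboard mentioned in point 2.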

Current system architectures differentiate READ operations from CREATE, UPDATE, and DELETE operations and make READ operations as direct as possible, thus minimizing both latency and the number of places where a fault can occur.
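One possible reading of this separation, sketched in Python: writes pass through methods that update both the source of truth and a precomputed read view, while reads are a single lookup against that view. The `OrderService` class and its fields are hypothetical and in-memory; the pattern loosely resembles command/query separation and is not presented as any specific product's design.

```python
class OrderService:
    """Separates the write path from a direct, precomputed read path."""

    def __init__(self):
        self._orders = {}   # write model: source of truth, keyed by order id
        self._totals = {}   # read model: per-customer totals, kept up to date on write

    # Write path: CREATE and DELETE do the extra work so that READ does not have to.
    def create(self, order_id, customer, amount):
        if order_id in self._orders:
            raise ValueError(f"order {order_id!r} already exists")
        self._orders[order_id] = (customer, amount)
        self._totals[customer] = self._totals.get(customer, 0) + amount

    def delete(self, order_id):
        customer, amount = self._orders.pop(order_id)
        self._totals[customer] -= amount

    # Read path: one dictionary lookup, with no scan over orders and no intermediary.
    def total_for(self, customer):
        return self._totals.get(customer, 0)
```

The trade-off is that each write becomes slightly more expensive, which is usually acceptable when reads greatly outnumber writes.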

Coming up in part two, we'll address two more architecture principles: the importance of designing for the cloud, and why you must instrument and log everything.

Photo: Jack Dorsey/Flickr
