
API Design: Stability Versus Readability—Must One Choose?

Good API design separates APIs that merely expose assets from those that help developers get things done. As I’ve written before, and as we’ll explore in this article, good design includes the style in which web API URLs are constructed.

Below are two API URLs that exemplify two divergent schools of thought on URL style. The first example is an anonymized and simplified version of a real URL and the second is a theoretical URL:



Some major differences are obvious:

  • The first URL is opaque, providing enough information for a human to infer that the URL references a bank account at ebank.com, but nothing else. For those without a photographic memory, the URL details will be difficult to remember and difficult to distinguish from other, similar-looking URLs.
  • The second URL is much easier to interpret, memorize, and compare with other URLs. It tells a clear story: Los Angeles Plays Itself is a film in the documentary genre, and it is listed, along with other films in other genres, among the offerings on the fictional Cinema Canon website.

Because the second example is much friendlier to humans and because APIs are products used by human developers, it may seem that the hierarchical style is preferable. This is not always the case, however.

Continue reading this article on Medium.

API Design Best Practices & Common Pitfalls

Webcast replay and Q&A

The job of an API is to make the application developer as successful as possible. When crafting APIs, the primary design principle should be to maximize application developer productivity and promote adoption.

So what are the design principles that help optimize developer productivity? I recently presented some ideas about this in a webcast (you can watch the replay here).

I was pleasantly surprised by the discussion that my talk sparked; many interesting questions were asked, so I thought I’d share some of them (and my attempts to answer) here. 

(Editor's note: Some questions were edited for clarity)

How does HATEOAS fit into pure “HTTP” APIs?

I have seen different interpretations of what HATEOAS means in the context of APIs. One interpretation leads to the practice of trying to describe all the actions that can be performed on a resource in the representation of the resource. For example, if I did a GET on http://etailer.com/orders/1234567, then, in addition to describing the order itself, the JSON that I got back would try to describe all the actions I could perform on that order (cancel the order, re-order the goods, track the shipment, ...).

This is not commonly done, and I myself do not design APIs that work this way. The JSON I design only describes the order itself, including any relationships it has to other entities expressed as URLs. I make the assumption that it is the job of the client code to know what actions it wants to perform and how to perform them [using standard HTTP methods, of course]. This is how most clients are written in practice. Even the modern browser works this way, since operations are now usually coded in JavaScript executed in the client rather than using old-school HTML forms prepared on the server.

Does this mean that I am violating the HATEOAS constraint of REST? I'm not sure, but I don't see a reason to worry about it. I don't make any claims for my APIs regarding REST compliance; I simply try to use HTTP as simply and directly as I can and avoid all invention where HTTP already specifies a solution (which in my experience it usually does).
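To make the distinction concrete, here is a hedged sketch of what an order representation in this style might look like. The field names and URLs are illustrative assumptions, not the actual etailer.com API: the representation describes only the order, with relationships expressed as URLs, and the client decides which standard HTTP method to apply to which URL.

```javascript
// A hypothetical representation of /orders/1234567: the JSON describes the
// order itself; relationships to other entities are just URL-valued fields.
const order = {
  self: 'https://etailer.com/orders/1234567',
  status: 'shipped',
  customer: 'https://etailer.com/customers/98765',   // relationship, not an "action"
  shipment: 'https://etailer.com/shipments/55555',
  items: [
    { product: 'https://etailer.com/products/111', quantity: 2 }
  ]
};

// The client already knows what it wants to do and uses standard HTTP, e.g.
//   DELETE order.self      -> cancel the order
//   GET    order.shipment  -> track the shipment
function linkTargets(resource) {
  // Collect every URL-valued field; the client picks among them.
  return Object.entries(resource)
    .filter(([, v]) => typeof v === 'string' && v.startsWith('http'))
    .map(([name, url]) => ({ name, url }));
}
```

Nothing in the representation enumerates permitted actions; the links carry the relationships and the HTTP methods carry the semantics.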

Can you speak about API layering (experience APIs vs others)? How do you avoid "spaghetti APIs" over time?

I'm not fond of layering in general, although I recognize that you sometimes need to do it. It is common for companies to have some sort of "generic" API for their problem domain, and end up layering other APIs on top of it. For example, assume I'm in retail and I have APIs for catalog and orders. The mobile app team looks at my API and decides they don't like it for mobile development, so they put another server in front of the generic one that implements its own API and delegates to the generic one.

So now which of the two APIs should others use? Once a few teams have done the same thing, it's not clear anymore which is the real API, if any. Some people make a virtue out of this (hence the concept of experience APIs), but I like to minimize layering. If the mobile team needs functionality that the generic API does not have, they can extend the generic API (possibly in their own server), but ideally they should not create a new layer on top.

How can one indicate specific error conditions when no HTTP response code is a good fit, or is not fine-grained enough?

I pick the HTTP response that is the closest fit, and also return a body with more information. I don't know of an accepted standard for the body format, but standards have been proposed, e.g. https://tools.ietf.org/html/rfc7807
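As an illustration, a minimal sketch of an error body in the RFC 7807 "problem details" format, paired with the closest-fitting status code. The example title and detail text are invented:

```javascript
// Build a response carrying an RFC 7807 problem-details body alongside the
// closest-fitting HTTP status code.
function problem(status, title, detail, type = 'about:blank') {
  return {
    status,
    headers: { 'Content-Type': 'application/problem+json' },
    body: JSON.stringify({ type, title, status, detail })
  };
}

// e.g. a 403 whose body explains the fine-grained condition:
const resp = problem(403, 'Insufficient funds',
  'Your balance is 30, but the transfer requires 50.');
```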

How do we manage resource mapping, and to what depth is it good to go—like /users/{Id}/orders/{oid}/articles?

What you are doing here is inventing a query language. This query says something like "SELECT * FROM articles, orders, users WHERE user.id = $1 AND order.userID = user.id AND article.orderID = $2". This query is not optimal if {oid} values are unique across all orders, because then it can be reduced to just /orders/{oid}/articles. Designing your own query language is hard, which is one of the reasons that GraphQL has attracted attention.

Personally, I don't like the idea of encoding queries in the path portion of a URL rather than in the query string, because it encourages people to confuse query with identity lookups. But many people do what you are doing, so I won't claim it is wrong. I also used to do it before I had a change of heart.

For POST, can you speak to the pluses / minuses of query string and JSON body?

Putting queries in a query string and using GET to evaluate the query is attractive and a good fit for HTTP. An example is GET /pets?{your query here}. Unfortunately there is a practical limit on the size of a URL: if you go above about 4k characters you run the risk that some proxy in the chain between a client and the server will mess up the request or reject it. Because of this, I always offer both GET and POST options for the same query.
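One way to offer both options is to normalize the query into a single internal form regardless of how it arrives. This is a sketch under stated assumptions: the /pets resource, the parameter names, and the request shape are all illustrative, not a prescribed design.

```javascript
// Accept the same query via GET (query string) or POST (JSON body),
// normalizing both into one object before evaluation.
function extractQuery(req) {
  if (req.method === 'GET') {
    // e.g. GET /pets?species=dog&minAge=2  (fine while the URL stays small)
    const parsed = new URL(req.url, 'http://placeholder.example'); // base only for parsing
    return Object.fromEntries(parsed.searchParams);
  }
  if (req.method === 'POST') {
    // e.g. POST /pets with body {"species":"dog","minAge":"2"}
    // avoids the ~4k practical limit on URL length
    return JSON.parse(req.body);
  }
  throw new Error('unsupported method');
}
```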

Would it be a generally decent design if we just expose a single API point (e.g. me.com/api) for all communication and use JSON as the payload?

Endpoint is not an HTTP concept, but it is fine if all your well-known URLs (and even all the dynamically-allocated ones) begin with the prefix me.com/api. The important thing is that every URL should identify some resource. If you have any URL for which you could not easily answer the question "what resource does this URL identify, and therefore what would it look like if I performed a GET on it", then you are probably not working within the HTTP model.

I understand the limit on linkability rationale for eliminating a version number in a URL, but what would your strategy be then for handling version differences?

See this blog post on versioning.

How important is it to strive for the consistency of the API design in terms of resource/entity planning, error message standards, header extensions, etc?

There is a nice quote from Fred Brooks on this:

“Blaauw and I believe that consistency underlies all principles. A good architecture is consistent in the sense that, given a partial knowledge of the system, one can predict the remainder.” - Frederick P. Brooks, Jr., The Design of Design: Essays from a Computer Scientist, 2010.

In other words, consistency is paramount. The easiest way to get consistency is to just use HTTP without adornment or invention. Where HTTP does not provide answers (this is less common than many people think), try to pick one solution and stick with it.

Is there any concept of statuses in RESTful APIs (viz. draft, dev, test, released, obsolete)? How do you implement this lifecycle of statuses? Is there any documentation on this?

HTTP does not address this—HTTP would view this as part of the modelling of your problem domain, and therefore out of scope of HTTP itself. One piece of guidance would be "don't put the state (or status) of a document into its URL, because that will change".

See this article for more.

What are your thoughts on using an API gateway as an internal enterprise integration hub/gateway?

We have many customers using Apigee Edge as both internal and external hubs/gateways. This is also an investment area for us. If this is an important topic for you, you should ask for a presentation/briefing focused on the topic.

You said that you implemented something similar to GraphQL? Can you share what made you implement that?

The API had a set of well-known URLs of the form /widgets, /thingummies, /doohickeys. We wanted to offer URLs of the form /widgets?{query}, /thingummies?{query} and /doohickeys?{query} and we also needed to offer /query?{query} for queries that "join" across resource types. We had two needs: define a syntax for queries and provide an implementation. We looked at GraphQL, but we were nervous of a design that runs a complex query engine in application space and relies on primitive APIs for raw data access. We have no objective evidence that GraphQL would have been problematic, but the idea made us nervous.

We designed our system so that it stores a copy of all the data in a set of read-only tables in a standard database system with replication and scale-out. This allows us to push the queries down to our database rather than implementing them in a system like GraphQL. This option will not be open to everyone, because you can't always get all the data into a database. If you can, it enables the queries to execute on a standard database query engine and, more importantly perhaps, execute very close to where the data is stored.

We designed a fairly simple query language that happens to be conceptually similar to GraphQL's (although its design predates our exposure to GraphQL) and a simple processor that translates these queries into the query language of our database. Our language is not as rich as GraphQL but has proven very effective. The whole thing is very simple, and has worked well with good performance. Creating the right indexes on the database table(s) can take a little thought, but our experience is that a few well-chosen indexes enable decent performance on a wide range of queries.

Perhaps the biggest lesson we learned is that having a good query capability (in addition to standard HTTP CRUD, of course) is very powerful—people have done all sorts of interesting things on top of our API without ever having to talk to us or request new features. This has also helped avoid the need for "experience APIs" (see response above) layered on top of our API.

If versions are not part of links, how do we make links to new resources existing in v2 but not v1? Or how do we link between multiple APIs—is this making an assumption that the versioning is done at the header level?

Links are always written using URLs of resources that do not contain version numbers. If clients want to request a specific version of a resource, there are two choices. The first is to allow clients to provide an Accept-Version header in their requests. The second is to provide clients with a rule for transforming the URL of a resource into the URL of one of its versions.

Since we can't get people to agree which one they want, we just implement both in the APIs I work on. In my experience, the energy required to implement both is much less than the energy required to argue about it. They both make sense in the HTTP model. I personally prefer the first approach (header), because it doesn't require clients to learn a "rule" that is specific to our application for transforming URLs. The argument for a header would be stronger if someone would standardize Accept-Version so it could be referenced by all applications.
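The two choices can be sketched in a few lines. The header name and the URL-transformation rule below (appending `;v=N`) are illustrative assumptions; since `Accept-Version` is not standardized, any real API would have to document its own convention.

```javascript
// Resolve which version of a resource a client wants, supporting both
// options: an Accept-Version header, or an application-specific URL rule.
function resolveVersion(url, headers) {
  // Option 1: the permanent URL plus a version header.
  if (headers['accept-version']) {
    return { url, version: headers['accept-version'] };
  }
  // Option 2: a rule transforming the resource URL into a version URL,
  // e.g. /orders/123;v=2 -> version v2 of /orders/123 (invented convention).
  const m = url.match(/^(.*);v=(\d+)$/);
  if (m) return { url: m[1], version: 'v' + m[2] };
  return { url, version: 'latest' };
}
```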

You mentioned the risks of coupling an API entity model to its domain or data model. Can you talk more about those risks and suggest some ways to mitigate them?

The model you expose through an API is the conceptual model of the problem domain as seen by a client. The actual storage model is often more complex than the conceptual model, but it doesn't always have to be. Decoupling them is useful because it allows you to keep the conceptual model simple even if performance or other concerns force compromises in the storage model.

Do you have advice on the use of patterns like idempotency and upsert vs. discrete CRUD?

If you are writing a microservices application rather than a monolith, you will probably face the problem where a single conceptual create, update, or delete requires changes to state stored in more than one microservice. I have not found a very simple way of doing this reliably. My current approach relies heavily on idempotency, but also requires a sort of application-level two-phase commit strategy. It works, but it's not simple.

HTTP has standard support for upsert—it is called PUT. I used to rely on POST for create, and either ignore PUT completely in favor of PATCH, or (shame on me) implement only half the function of PUT (update but not create). Recently I have been working on some APIs where there is a requirement that clients be able to synchronize the state of the API from a set of external files. This has given me a new understanding of the value of PUT.
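A minimal sketch of full PUT semantics over an in-memory store—create-or-replace at a client-chosen URL, with 201 on create and 200 on update. A real API would add concurrency controls such as ETags and If-Match; the URLs below are invented:

```javascript
// An in-memory store implementing PUT as a true upsert.
function makeStore() {
  const docs = new Map();
  return {
    put(url, representation) {
      const created = !docs.has(url);
      docs.set(url, representation);          // replace entirely, never merge
      return { status: created ? 201 : 200 }; // 201 on create, 200 on update
    },
    get(url) {
      return docs.has(url) ? { status: 200, body: docs.get(url) } : { status: 404 };
    }
  };
}
```

This is exactly the behavior that makes synchronizing from external files easy: the client replays PUTs and the server converges on the same state whether or not each resource already existed.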

What factors should be used on breaking the resources into APIs? Does 100 resources mean 1, 2, 5, or 100 APIs?

As I said above, "API" is a slippery word. In HTTP there are only resources. A simple strategy is to have one URL for each type (e.g. /dogs, /people) plus one URL for each instance (the format of these URLs does not have to be specified unless you allow create via PUT). How many APIs is that? 1? 2? 102? Somewhere in between? I'll let you decide.

For more on API Design, read the eBook "Web API Design: The Missing Link."


Best Practices for Building Secure APIs

API designers and developers generally understand the importance of adhering to design principles while implementing an interface. No one wants to design or implement a bad API!

Even so, it’s sometimes tempting to look for shortcuts to reach those aggressive sprint timelines, get to the finish line, and deploy an API. These shortcuts may pose a serious risk — unsecured APIs.

Developers should remember to wear the hat of an API hacker before deploying. If a developer neglects to identify the vulnerabilities in an API, the API could become an open gateway for malicious activity.

Identifying and solving API vulnerabilities

An API can work for or against its provider depending on how well the provider has understood and implemented its API users’ requirements. If a company builds an incredibly secure API, it might end up very hard to use. A fine balance needs to be struck between the purpose of an API and ease of consumption. In this post, we’ll explore some of the API vulnerabilities we’ve come across through our work as part of Google’s Apigee team, including how these vulnerabilities might have been prevented.


APIs are the gateways for enterprises to digitally connect with the world. Unfortunately, there are malicious users who aim to gain access to enterprises’ backend systems by injecting unintended commands or expressions to drop, delete, update, and even create arbitrary data available to APIs.

In October 2014, for example, Drupal announced a SQL injection vulnerability that granted attackers access to databases, code, and file directories. The attack was so severe that attackers may have copied all data out of clients’ sites. There are many types of injection threats, but the most common are SQL injection, RegEx injection, and XML injection. It's not uncommon: we have seen APIs go live without threat protection more than once.

APIs without authentication

An API built without authentication to protect it from malicious threats represents a design failure that can endanger an organization’s databases. Ignoring proper authentication—even if Transport Layer Security (TLS) is used—can cause problems. With a valid mobile number in an API request, for instance, any person could get personal email addresses and device identification data. Industry-standard strong authentication and authorization mechanisms like OAuth/OpenID Connect, in conjunction with TLS, are therefore critical.

Sensitive data in the open

Normally, operations teams and other internal teams have access to trace tools for debugging issues, which may provide a clear view of API payload information. Ideally, PCI cardholder data (CHD) and protected health information (PHI) are encrypted from the point where the data is captured all the way to where it is consumed, though this is not always the case.

With growing concerns about API security, encryption of sensitive and confidential data needs to be a top priority. For example, in June 2016, an HTTP proxy vulnerability was disclosed that provided multiple ways for attackers to proxy the outgoing request to a server of choice, capture sensitive information from the request, and gain intelligence about internal data. Beyond using TLS, it’s important for API traffic to be protected by encrypting sensitive data, implementing data masking for trace/logging, and using tokenization for card information.
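As a small illustration of data masking for trace/logging, here is a sketch that strips card data from a payload before it reaches any trace tool. The field names (`cardNumber`, `cvv`) are assumptions; real masking rules come from your PCI scope, not from this example:

```javascript
// Mask sensitive card fields before a payload is written to trace/logging.
function maskForLogging(payload) {
  const masked = { ...payload };
  if (typeof masked.cardNumber === 'string') {
    // Keep only the last four digits; trace tools should never see the PAN.
    masked.cardNumber = masked.cardNumber.replace(/\d(?=\d{4})/g, '*');
  }
  if ('cvv' in masked) masked.cvv = '***';  // never log the CVV at all
  return masked;
}
```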

Replay attacks

A major potential concern for enterprise architects is the so-called “transaction replay.” APIs that are open to the public face the challenge of figuring out whether to trust incoming requests. In many cases, even if an untrusted request is made and denied, the API may politely allow the — potentially malicious — user to try and try again.

Attackers leverage this misplaced trust by attempting to play back or replay a legitimate user request (in some cases using brute force techniques) until they are successful. In 2016, hackers got into GitHub accounts via a playback attack, reusing email addresses and passwords that had been compromised on other online services and trying them on GitHub.

Countermeasures include rate-limiting policies to throttle requests, the use of sophisticated tools like Apigee Sense to analyze API request traffic, and identification of patterns that represent unwanted bot requests. Additional security measures to stymie replay attacks include:

  • HMAC, which incorporates timestamps to limit the validity of the transaction to a defined time period
  • two-factor authentication
  • enabling a short-lived access token by using OAuth

Unexpected surges in API usage

It’s always tricky to estimate the usage of an API. A good example is the app that briefly brought down the National Weather Service API. This particular API didn’t have any kind of traffic surge prevention or throttling mechanism, so the unexpected surge in traffic directly hit the backend.

A good practice is to enforce spike arrest or a per-app usage quota, so that the backend won’t be impacted. This can be easily rolled out with the help of a sophisticated API management platform with policies like quota and spike arrest.
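Under the hood, spike arrest is essentially rate limiting; a token bucket is one common way to implement it. The sketch below is an illustration of the mechanism, not how any particular management platform implements its policy, and the rate and burst parameters are invented:

```javascript
// A per-app token bucket: refills at ratePerSec, rejects once empty.
function makeSpikeArrest(ratePerSec, burst) {
  let tokens = burst;
  let last = 0;
  return function allow(nowMs) {
    // Refill proportionally to elapsed time, capped at the burst size.
    tokens = Math.min(burst, tokens + ((nowMs - last) / 1000) * ratePerSec);
    last = nowMs;
    if (tokens >= 1) { tokens -= 1; return true; }
    return false;  // caller should respond 429 Too Many Requests
  };
}
```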

Keys in URI

For some use cases, implementing API keys for authentication and authorization is good enough. However, sending the key as part of the Uniform Resource Identifier (URI) can lead to the key being compromised. As explained in IETF RFC 6819, because URI details can appear in browser or system logs, another user might be able to view the URIs from the browser history, which makes API keys, passwords, and sensitive data in API URIs easily accessible.

It’s safer to send API keys in the message authorization header, which is not logged by network elements. As a rule of thumb, the use of the HTTP POST method with the payload carrying sensitive information is recommended.
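Putting that together, a request might be assembled as follows. The `Bearer` scheme, endpoint URL, and payload shape here are illustrative assumptions:

```javascript
// Build a request with the key in the Authorization header and the sensitive
// payload in the body — neither appears in the URI, so neither is logged by
// network elements or kept in browser history.
function buildRequest(url, apiKey, sensitivePayload) {
  return {
    method: 'POST',                 // sensitive data in the body, not the URL
    url,                            // no key, no sensitive data in the URI
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(sensitivePayload)
  };
}
```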

Stack trace

Many API developers become comfortable using 200 for all successful requests, 404 for all failures, 500 for some internal server errors, and, in some extreme cases, 200 with a failure message in the body, on top of a detailed stack trace. A stack trace can potentially become an information leak to a malicious user when it reveals underlying design or architecture implementations in the form of package names, class names, framework names, versions, server names, and SQL queries.

Attackers can exploit this information by submitting crafted URL requests, as explained in this Cisco example. It’s a good practice to return a “balanced” error object, with the right HTTP status code, with minimum required error message(s) and “no stack trace” during error conditions. This will improve error handling and protect API implementation details from an attacker. The API gateway can be used to transform backend error messages into standardized messages so that all error messages look similar; this also eliminates exposing the backend code structure.
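A gateway-side transform of the kind described might look like the sketch below. The backend error shape is an assumption; the point is what survives the transform (status code, minimal message) and what does not (stack trace, class names, SQL):

```javascript
// Replace a raw backend error with a "balanced" error object: the right
// status code, a minimal message, and deliberately no stack trace.
function sanitizeError(backendError) {
  const status = backendError.status || 500;
  return {
    status,
    body: {
      code: status,
      // 4xx messages can be meaningful to the caller; 5xx details stay inside.
      message: status < 500 ? backendError.message : 'Internal error'
      // no stack, package/class names, framework versions, or SQL here
    }
  };
}
```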

Keep APIs safe

As we have reviewed in this article, many potential threats can be avoided by putting some thought into API design and establishing governance policies that can be applied across the enterprise. It is important to guard APIs against malicious message content by accessing and masking sensitive encrypted data at runtime and protecting backend services against direct access. An API security mistake can have significant consequences — but with the right forethought and management, businesses can make themselves much safer.

This post originally appeared in Medium.

API Design: Choosing Between Names and Identifiers in URLs

If you're involved in the design of web APIs, you know there's disagreement over the style of URL to use in your APIs, and that the style you choose has profound implications for an API’s usability and longevity. The Apigee team here at Google Cloud has given a lot of thought to API design, working both internally and with customers, and I want to share with you the URL design patterns we're using in our most recent designs, and why.

When you look at prominent web APIs, you'll see a number of different URL patterns.

Here are two API URLs that exemplify two divergent schools of thought on URL style:


The first is an anonymized and simplified version of a real URL from a U.S. bank where I have a checking account. The second is adapted from a pedagogic example in the Google Cloud Platform API Design Guide.

The first URL is rather opaque. You can probably guess that it’s the URL of a bank account, but not much more. Unless you're unusually skilled at memorizing hexadecimal strings, you can’t easily type this URL—most people will rely on copy and paste or clicking on links to use this URL. If your hexadecimal skills are as limited as mine, you can’t tell at a glance whether two URLs like these are the same or different, or easily locate multiple occurrences of the same URL in a log file.

The second URL is much more transparent. It’s easy to memorize, type and compare with other URLs. It tells a little story: there's a book that has a name that's located on a shelf that also has a name. This URL can be easily translated into a natural-language sentence.

Which should you use? At first glance, it may seem obvious that URL #2 is preferable, but the truth is more nuanced.

Read the whole story on the Google Cloud Platform blog.

Solving SEO Problems with API Design, Pt. 2

In a previous post, we discussed SEO problems faced by single-page applications, or SPAs, and outlined a basic approach to make SPAs indexable by search engines. Here, we’ll discuss how API design fits into the picture.

Consider the implementation of the SPA. Imagine the SPA has already displayed the user interface for the fictitious API resource I’ll call Mickey Mouse, and the user has clicked on the link for the resource I’ll call Minnie Mouse. The SPA must perform the following steps:

  • Fetch the data for Minnie; let's say it's at https://api.acme.com/entity/Minnie
  • Add a URL for Minnie (say, https://ui.acme.com/entity/Minnie) to the browser history
  • Construct the user interface document object model (DOM) for Minnie in the browser (or fill new data into the existing one)

Having two different URLs for Minnie—https://ui.acme.com/entity/Minnie and https://api.acme.com/entity/Minnie—and selecting which to use depending on whether you want HTML (for browsers) or JSON (for programmatic clients) is very common, but has downsides.

One is that you must have a rule for converting from one to the other, which has to be learned and separately documented. Also, if I want to email a link to Minnie to you, which one should I send? Or if I want to link to Minnie from another resource, which URL should I include?

The answer is: "it depends on what the recipient wants to do with the link.” You’ll probably also have to implement two servers to support these URLs.

Converging on a single URL

An alternative to this approach is to define a single URL for Minnie and use content negotiation to decide whether to return HTML or JSON. This means that each time a client (regardless of whether it’s a browser or a programmatic client) uses the unique URL for Minnie, it includes a header in the HTTP request to say whether, on this occasion, it wants HTML or JSON.

The browser itself will ask for HTML, while the JavaScript of a SPA running in the same browser will ask for JSON, both using the same URL. The Ruby on Rails framework popularized the use of content negotiation, but it is a core feature of the HTTP protocol that can be used in any programming language.
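A minimal sketch of this negotiation, assuming an invented entity and a trivially simple HTML rendering (a real server would also parse quality values in the Accept header):

```javascript
// Return HTML or JSON for the same resource URL, based on the Accept header.
function represent(entity, acceptHeader = '') {
  if (acceptHeader.includes('text/html')) {
    // Browsers send Accept: text/html,... — render a simple HTML view.
    return {
      contentType: 'text/html',
      body: `<p>name: <span property="name" datatype="string">${entity.name}</span></p>`
    };
  }
  // Programmatic clients (including the SPA's own JavaScript) get JSON.
  return { contentType: 'application/json', body: JSON.stringify(entity) };
}
```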

I like to use this approach because it defines a single URL for each entity that can be used for either human web browsing or programmatic API access. I appreciate the fact that it makes my API immediately browsable using a standard web browser without any special plugins, tools, or API metadata descriptions—my SPA is also my API browser.

I can even browse the API with JavaScript turned off to see a more basic rendering of the API data in HTML, and I’ll see a direct rendering of the same HTML that search bots will see.

Note: You can also use this approach to support a degraded UI for browser users that cannot or will not turn on JavaScript in their browsers. Those users see a rendering of the same HTML a search bot sees.

Creating HTML from programming objects

This approach is easy to implement, because the HTML can be created algorithmically from the same programming objects that are normally used to produce JSON, without requiring coding or knowledge specific to the API. This is especially easy in the case described above because the HTML is not expected to produce a user interface, just a meaningful description of the entity for non-browser clients.

Producing this sort of HTML is not quite as simple as serializing objects in JSON—for which all the popular programming languages have built-in libraries—but it isn’t hard. At the end of this post, there's an appendix with a few lines of JavaScript code that you could include in a Node.js server for this purpose. Note that you don't ever have to accept HTML upon input in your API—it's sufficient to produce it on output.

Creating HTML from programming objects is implemented much more simply if the data in your objects includes links, even for the JSON case. Including links in JSON is something you should do in your APIs anyway, but that topic is the subject of other blog posts and covered in the eBook, “Web API Design: The Missing Link.”

Dealing with SPA load speed

In the discussion above, we considered the case where the SPA was already loaded and the user was navigating from Mickey to Minnie. When the SPA was first loaded, the sequence of events went something like this:

  • The browser loaded the HTML document at https://acme.com/entity/Mickey
  • The JavaScript file at https://acme.com/mySPA.js was loaded and executed
  • The DOM created from the HTML body was discarded or simply ignored; the JavaScript code saw that the URL that was loaded was the URL for Mickey, went back to the server to ask for the data for Mickey in JSON format, and then constructed a new browser DOM that is displayed to the user

This illustrates one of the downsides of SPAs: they typically load more slowly than "old-fashioned" HTML, in part because they load and execute lots of JavaScript, and in part because they have to go back to the server again for data.

Fortunately, they also load much less often. In practice, the first reason is usually more significant than the second, but optimizing the load times of your JavaScript is outside of the scope of this post.

The second reason for SPA slowness can be entirely eliminated if the SPA is willing to read the resource data from the HTML DOM it already has, rather than going back to the server for a JSON version. For this to work, we have to enhance the HTML body slightly as follows:

<!DOCTYPE html>
<html>
  <head>
    <meta name="description" content="Personal information about Mickey Mouse" />
    <meta name="keywords" content="Mickey Mouse Donald Duck Minnie Goofy Disney" />
    <script src="/mySPA.js"></script>
  </head>
  <body>
    <div style="display: none;" resource="https://acme.com/entity/Mickey">
      <p>name: <span property="name" datatype="string">Mickey Mouse</span></p>
      <p>girlfriend: <a property="girlfriend" href="https://acme.com/entity/Minnie">Minnie Mouse</a></p>
      <p>friends:
        <ol property="friends">
          <li><a href="https://acme.com/entity/Donald">Donald Duck</a></li>
          <li><a href="https://acme.com/entity/Goofy">Goofy</a></li>
        </ol>
      </p>
    </div>
  </body>
</html>
All I did was add the property names and types using the standard "property" and "datatype" attributes. For completeness, I also used the "resource" attribute to include the URL of the entity whose HTML this is—this is usually necessary for nested objects, but is a good idea even at the outer level.

On first load, my SPA can now read the data for Mickey directly from the DOM that has already been created by the browser from the HTML body, instead of going back to the server. This saves a server round-trip.

The JavaScript code to read this DOM is simple and general. I have not included source code for this, but it is similar in size and complexity to the code shown below for generating the same HTML. It essentially performs the inverse of the code to generate HTML from programming objects (happily, the HTML has already been parsed into a DOM by the browser).
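For concreteness, here is one hedged sketch of what that inverse might look like—a DOM walk that rebuilds the data object from the "property", "datatype", and "resource" attributes emitted by toHTML. It uses only standard DOM accessors (tagName, getAttribute, children, textContent); the details are an assumption, since the original post does not include this code:

```javascript
// Rebuild a data object from the DOM that toHTML's output parses into.
function fromDOM(node) {
  // Links were generated as <a href="...">: return the target URL.
  if (node.tagName === 'A') return node.getAttribute('href');
  const datatype = node.getAttribute('datatype');
  if (datatype === 'number') return Number(node.textContent);
  if (datatype === 'boolean') return node.textContent === 'true';
  if (datatype === 'string') return node.textContent;
  if (node.tagName === 'OL') {
    // Each <li> wraps exactly one generated value element.
    return Array.from(node.children).map(li => fromDOM(li.children[0]));
  }
  // Otherwise treat the element as an object: gather "property" descendants.
  const obj = {};
  const resource = node.getAttribute('resource');
  if (resource) obj._self = resource;
  for (const child of Array.from(node.children)) {
    const prop = child.getAttribute('property');
    if (prop) obj[prop] = fromDOM(child);
    else Object.assign(obj, fromDOM(child)); // descend through wrapper <p> etc.
  }
  return obj;
}
```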

Semantic search

Parsing our own HTML in the SPA to save a server round-trip is not the only motivation for adding these attributes. Google search also understands them. In addition to reducing the load times for our SPA, we are providing richer data to the Google search engine. See RDFa, schema.org, or microdata for more information on this.

The attribute names I used in the example are from RDFa, although my use of them is not ideal, because my property names do not define useful URLs. This isn't hard to correct, but the detail of doing so would have been distracting. You can use schema.org or microdata attributes instead if you prefer, but you should stick to something that Google understands rather than inventing your own.

It's quite straightforward to write a SPA that is compatible with search engine optimization. All you have to do is to use “regular” URLs instead of URI references with fragment identifiers (for example, those that include the “#” character) and make sure each of these URLs returns a simple HTML format that will be useful for search engines but is never seen in the user interface.

You can improve the quality of the data that you provide to Google search and reduce the load times of your SPA at the same time if you include microdata-style attributes in the HTML you produce on the server for search bots, using one of several standards that Google supports.

And finally, you can simplify your overall implementation and improve your API if you unify your API with your UI by using the same URLs for both.

Appendix: Generating HTML from JavaScript objects

The following code assumes that the "body" parameter is a JavaScript object that you would normally serialize to JSON to produce the API representation of a resource. In other words, instead of writing JSON.stringify(anObject) to create the API response, you would write toHTML(anObject) to produce an HTML response.

function toHTML(body) {
  const increment = 25;

  function valueToHTML(value, indent, name) {
    if (typeof value == 'string') {
      if (value.startsWith('http') || value.startsWith('./') || value.startsWith('/')) {
        return `<a href="${value}"${name === undefined ? '' : ` property="${name}"`}>${value}</a>`;
      } else {
        return `<span${name === undefined ? '' : ` property="${name}"`} datatype="string">${value}</span>`;
      }
    } else if (typeof value == 'number') {
      return `<span${name === undefined ? '' : ` property="${name}"`} datatype="number">${value.toString()}</span>`;
    } else if (typeof value == 'boolean') {
      return `<span${name === undefined ? '' : ` property="${name}"`} datatype="boolean">${value.toString()}</span>`;
    } else if (Array.isArray(value)) {
      const rslt = value.map(x => `<li>${valueToHTML(x, indent)}</li>`);
      return `<ol${name === undefined ? '' : ` property="${name}"`}>${rslt.join('')}</ol>`;
    } else if (typeof value == 'object') {
      const rslt = Object.keys(value).map(name => propToHTML(name, value[name], indent + increment));
      return `<div${value._self === undefined ? '' : ` resource="${value._self}"`} style="padding-left:${indent + increment}px">${rslt.join('')}</div>`;
    }
  }

  function propToHTML(name, value, indent) {
    return `<p>${name}: ${valueToHTML(value, indent, name)}</p>`;
  }

  return `<!DOCTYPE html><html><head></head><body>${valueToHTML(body, -increment)}</body></html>`;
}

For more on API design, check out the free eBook, "Web API Design: The Missing Link."
Image: The Noun Project/Kevin Augustine LO

Solving SEO Problems with API Design

Single-page applications are popular and easy to work with, but often make information hard to find

Last year I visited the software development arm of a household-name retailer that was attempting to rebuild the user interface for their main website using a single-page application (SPA) implementation design. The project had failed.

It did so for two main reasons: page load times were unacceptably long, and search engines were unable to effectively index the new site.

At the time of my visit, the retailer had abandoned the SPA design approach in favor of "old-fashioned" HTML construction on the server. This is a pity, because a properly-designed and -implemented SPA could have provided a superior experience for the company’s users, and superior productivity for its developers.

There’s a lot of advice on the web on how to optimize load times for single-page applications, but less on how to deal with the problem of search-engine indexing. This post explains how the search engine indexing problem can be solved through thoughtful design of APIs—perhaps not the place many people might look for a solution to a search engine problem.

SPAs enable end users to navigate between different entities without performing an HTML document load for each entity. 

Note: HTML document loading is a concept fundamental to all web browsers—it’s precisely defined here and in other specifications. "Single-document application" would have been a better name than single-page application, if you value consistency with the terminology in the specifications.

Essentially, a JavaScript program is loaded into the browser as a result of an initial HTML document load, and that JavaScript program then uses a series of API calls to get the data and present a user interface for a succession of entities without having to load an HTML document (and corresponding JavaScript) for each one. 

The SPA experience

There are many reasons for the popularity of SPAs: they are easy and fun to write; they can provide a superior user experience; and they help provide a clean separation between user interface code and business logic.

Another important advantage of SPAs is that their overall design is similar to that of mobile apps that display the same data and access the same function—in fact many people use the same HTML5 SPA implementation for both web and mobile.

Early SPAs often failed to integrate well with the browser. One of the most visible mistakes was a failure to update the browser history appropriately. As the user navigated between resources, the SPA failed to update the address bar and the browser history, leaving the user with nothing to bookmark and breaking the browser's back, forward, and history controls.

Better understanding of how a SPA should be written along with adoption of frameworks like Angular that help developers write good SPAs have resulted in more and more SPAs that integrate well with the browser. Yet there are still few SPAs that also work well with search engine optimization.

HTML5 improvements aren’t enough

If you look in the web browser address bar of a typical SPA, you will see addresses that look like this:



Note: technically these addresses are called URI references, not URLs—the URL terminates at the # character.

The only HTML document that exists in a design like this is the document at the URL https://acme.com/. This is the document that will load a JavaScript application that will then make API calls to retrieve JSON representations of entities with identifiers of the form /entity/xxxx. The JavaScript then creates a browser DOM whose rendering is what the user sees.

There is only one HTML document for search engines to retrieve—https://acme.com/—and it usually contains only code or references to code, which is not useful to a search bot. This completely defeats search engine indexing.

With the ubiquity of the HTML5 history API in browsers, SPAs can now be written to use URLs like this one instead: 



This is an important improvement, because there is now a separate URL that is retrievable from the server for each entity that is exposed by the SPA. Although this is a step in the right direction, this does not by itself solve our SEO problem, because the resources at these URLs—like the single resource we had previously at https://acme.com/—typically only contain code or references to code, as illustrated in the following example. This is still useless to a search bot.

<!DOCTYPE html>
<html>
  <head>
    <script src="/mySPA.js"></script>
  </head>
  <body></body>
</html>
Assuming the JavaScript is written appropriately, this HTML will display the correct user interface for each one of the entities of the app, whose URLs might look like https://acme.com/entity/xxxx. The JavaScript code will construct a browser document object model (DOM) for the user interface even though there is no content in the body of the HTML document to do this. The JavaScript code must also look at the URL of the document that was loaded to determine the initial entity to present to the user.

Add some meta

Consider the example of an application that displays information about Disney cartoon characters. In order to make the HTML useful for a search bot, we can simply add additional information, like this:

<!DOCTYPE html>
<html>
  <head>
    <meta name="description" content="Personal information about Mickey Mouse" />
    <meta name="keywords" content="Mickey Mouse Donald Duck Minnie Goofy Disney" />
    <script src="/mySPA.js"></script>
  </head>
  <body>
    <div style="display: none;">
      <p>name: <span>Mickey Mouse</span></p>
      <p>girlfriend: <a href="https://acme.com/entity/Minnie">Minnie Mouse</a></p>
      <ol>
        <li><a href="https://acme.com/entity/Donald">Donald Duck</a></li>
        <li><a href="https://acme.com/entity/Goofy">Goofy</a></li>
      </ol>
    </div>
  </body>
</html>

All we did was add some <meta> elements to the head of the document (search engines take note of meta elements) and a simple <body> that will never be displayed to the user.

The essential idea—which might not be immediately obvious—is that the <body> element we included here has no influence on the user interface that a user will see—producing that user interface is the job of the JavaScript of the SPA. 

The goal of the <body> element is only to provide useful information for search bots, and other non-browser clients that understand HTML. Users will see only the user interface created by the JavaScript of the SPA, and search engines will see only the body shown above.

Obviously, this is not a sophisticated example of what you want to include in your HTML for SEO—this is not an SEO tutorial, which I'm anyway not qualified to write—but it shows how you can include HTML that is visible to search engines for all the entities that are displayed in your SPAs.

In summary, there are two steps required to solve the SEO problem for SPAs. The first is defining and using separate URLs—ones without fragment identifiers introduced by the "#" character—for each entity shown in the user interface.

The second step is providing meaningful HTML bodies for each entity, even though that HTML will not be seen by human users of the SPA.
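A hypothetical server-side helper illustrates that second step: for each entity URL, render a minimal HTML body that search bots can index alongside the script that drives the SPA. The entity shape and field names here are made up for the example.

```javascript
// Sketch: produce the search-bot-visible HTML for one entity.
// The "friends" field and the entity structure are illustrative only.
function renderEntityPage(entity) {
  const friendLinks = (entity.friends || [])
    .map(f => `<li><a href="${f.url}">${f.name}</a></li>`)
    .join('');
  return `<!DOCTYPE html>
<html>
  <head>
    <meta name="description" content="${entity.description}" />
    <script src="/mySPA.js"></script>
  </head>
  <body>
    <div style="display: none;">
      <p>name: <span>${entity.name}</span></p>
      <ol>${friendLinks}</ol>
    </div>
  </body>
</html>`;
}
```

The same helper would be called for every entity URL the server exposes, so each one returns indexable content even though users only ever see the JavaScript-built UI.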

This post has outlined a basic approach to make SPAs indexable by search engines, but we have not yet linked the story to API design. We'll do that in an upcoming post.

For more on API design, check out the free eBook, "Web API Design: The Missing Link."

Image: Flickr Creative Commons / ECP

The False Dichotomy of URL Design in Web APIs

There are two schools of thought on how to design URLs in web APIs, and they often turn into warring factions. Here at Apigee, in the past, we have seen very heated discussions on this topic, with the ultimate resolution often being made arbitrarily by whoever controlled the particular API under discussion.

Recently, we’ve arrived at a common view on the topic, which has restored team harmony and—we believe—resulted in better APIs. We thought we'd share our secret.

URLs for stability

One of the historic schools of thought (typically the minority one) was populated by people who were steeped in the web. In that tradition, URLs are stable identifiers of web resources. The format of the URL is controlled by the server and should be opaque to clients. In other words, the URL format of a resource is not part of the API. 

Because there’s a strong requirement for stability over time, the guidance is to try not to encode any information in the URL. For a library inventory system, URLs in this style might look like this: 


There is a very famous document entitled "Cool URIs don't change" by Tim Berners-Lee, the inventor of the web, explaining the rationale for URLs like this. It contains this quote:

"After the creation date, putting any information in the name is asking for trouble one way or another."

Even putting the word “book” in the URL above is questionable—if we switch to digital media, is it still a book or is it now a CD?

URLs for humans

URLs designed for stability like this one are sometimes called permalinks. They might appeal to your inner engineer: they are like well-designed primary key values at the scale of the world wide web.

However, they aren’t friendly to humans, and the majority of API designers have decided that being friendly to humans is more important than web theory. So they’ve invented URLs for books that look like this:


URLs like this work well because they align with two very powerful techniques that humans use to think and talk about things: we give them names and we organize them in hierarchies. These techniques are older than recorded history.

URLs like this are everywhere in API design, and they are effective. However, systems based on them typically face the problem that renaming entities and reorganizing hierarchies is difficult or impossible. 

Apigee—and more broadly Google (and most other prominent web companies)—offers products today that display these limitations. Unfortunately, it often turns out that the ability to rename entities and reorganize hierarchies is more important and more frequently needed than the product designers initially envisioned.

A unified approach to URL design

After a fairly extended period of debate and polarization within Apigee, we came to understand that picking between these two URL design approaches is a false choice. It is not only reasonable to do both, but for a high-quality API it is usually necessary. We also discovered that there is a very simple idea that neatly unifies the two.

The insight that helped us unify these approaches is that the URL https://libraries.gov/library/sunnyvale/shelf/american-classics/book/moby-dick isn’t actually the URL of a book—it's the URL of a query or, more precisely, a query result.

The meaning of the query is "that book whose title is [currently] 'Moby Dick' that is [currently] on the shelf named 'American Classics' at the Sunnyvale City Library." The same query could have been expressed using a URL that included a query string. The difference is one of URL style, not meaning.

Queries are very useful for locating things, but they have the characteristic that they are not guaranteed to return the same thing twice. The URL above may today return the book whose URI is:


But tomorrow it might not return anything, or it may return a different thing altogether.

Recognizing that these two URLs are actually the URLs of two different things—a book and a query result—rather than two different URLs for the same thing made it obvious that the right solution was to implement both. 

Most of Apigee's more recent APIs do exactly this. This enables us to simultaneously have cool, stable permalink URLs for identifying entities, and human-friendly query URLs for finding them based on information that humans typically know. This in turn enables us to rename entities and reorganize hierarchies without breakage.

API clients have to be thoughtful about which URLs to use. For example, if a client wants to store a persistent reference to an entity, then that client should use the permalink URL of the entity, not the query URL that today happens to return the same result.

This is very important for internal services, like a permissions service that stores access control rules for entities, as well as for external clients. Following this guidance throughout a system is necessary to ensure that entities can be renamed and hierarchies reorganized without breakage.

Connecting queries and entities is also important. Whenever a query URL is used in an HTTP request, our newer APIs always return the permalink URL in the Content-Location header of the response (and in the response body too) to ensure that clients always have access to the stable permalink URL when they need it.
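A sketch of that pattern: a request to a human-friendly query URL resolves to the entity's permalink, which is returned in the Content-Location header. The lookup table and the permalink path here are illustrative, not taken from a real API.

```javascript
// Sketch: query URLs resolve to stable permalinks; the permalink is
// always surfaced in the Content-Location header of the response.
const queryIndex = {
  '/library/sunnyvale/shelf/american-classics/book/moby-dick': '/book/745' // hypothetical permalink
};

function handleGet(path) {
  const permalink = queryIndex[path] || path; // permalinks pass straight through
  return {
    status: 200,
    headers: { 'Content-Location': permalink },
    body: { self: permalink }
  };
}
```

A client that wants a persistent reference stores the Content-Location value, never the query URL it happened to use.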

Happy teams and better APIs 

That’s the story of how we restored peace and harmony to API design teams in Apigee. Part of the reason this worked well is that neither of the original schools of thought had to admit that they had been wrong—they just had to acknowledge that their view had been incomplete, and that they had something to learn from the other school.

We think that the result is not only happier teams, but better APIs.

For more on URL design and a host of other best practices, download the eBook, “Web API Design: The Missing Link.”

Image: Noun Project / Roselin Christina.S

Common Misconceptions about API Versioning

When versioning makes sense—and when it doesn’t

API versioning is often misunderstood, in part because the term is used to describe more than one basic concept. One of the misconceptions about versioning is that it’s something you need to bake into your APIs from the start. Consider the following examples.

Suppose you’ve written an API that enables people to create and access data for an application. Following advice that you read on the internet, you prefixed all the HTTP paths of your API with /v1. All your URLs look like http://acme.com/v1/thingumy/12345.

A couple of years later, you do a major rewrite of your API. The URLs of the new API all look like http://acme.com/v2/thingumy/12345 (note the /v2 in place of /v1). This second-generation API has much more advanced features. To accommodate those features, the API data model and the underlying database are richer and more complex.

As a result, data created with V1 is not visible in V2 and vice versa. You thought about trying to implement V2 in a way that was backwards-compatible with V1, but it proved impossible to provide the new function and concepts in a way that still allowed V1 clients to update the data.

That's probably okay—many clients will be fine with keeping old data in the old system and new data in the new system, and you could also offer a data migration API to upgrade from V1 to V2.

Are those APIs really versions?

The story above describes a classic API-versioning scenario, and proves the value of the conventional guidance on versioning in APIs, right? Not really.

Your V1 and V2 APIs are actually two independent APIs with no relationship to one another. To you, they’re related, because they address the same problem domain, and you wrote one of them after you wrote the other. But those relationships are only in your head—these APIs are not tied together in any concrete way. 

You could as easily have made the URLs of the second API look like http://v2.acme.com/thingumy/12345 or http://not-acme.com/thingumy/12345. These are two independent APIs that share some family lineage or came from a common foundry. 

The concept of “lineage of independent APIs” is one of the ideas that people associate with the word “versioning.”

The power of “V”

Let's look at another example. Suppose you had a URL like this in your API: http://acme.com/v1/editor-options. A GET on this resource returns this: 

[{"name": "tabWidth",
  "value": "4"},
 {"name": "defaultFontSize",
  "value": "10"},
 {"name": "defaultFont",
  "value": "Arial"},
 {"name": "backgroundColor",
  "value": "White"}]

You get complaints from your users that this is inconvenient—they have to iterate through the whole array to find the editor option they are looking for. Versioning to the rescue! We can simply introduce a V2, like this:

http://acme.com/v2/editor-options (note again the /v2 in place of /v1):

{"tabWidth": "4",
 "defaultFontSize": "10",
 "defaultFont": "Arial",
 "backgroundColor": "White"}


We can also enable the information to be updated via either format through the appropriate URL.

This is a completely different meaning for the word “versioning.” In this case, http://acme.com/v1/editor-options and http://acme.com/v2/editor-options are not independent resources in independent APIs—they each read and write the same data.
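Because the two formats carry exactly the same information, each can be derived mechanically from the other. A sketch of the conversions:

```javascript
// v1 is an array of {name, value} pairs; v2 is a plain object.
// Converting between the two representations is trivial.
function v1ToV2(options) {
  return Object.fromEntries(options.map(o => [o.name, o.value]));
}

function v2ToV1(options) {
  return Object.entries(options).map(([name, value]) => ({ name, value }));
}
```

A server can therefore keep a single internal representation and serialize to whichever format the client asked for.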

To add version identifiers, or not to?

Does this illustrate the wisdom of the conventional advice to put a version identifier in URLs? Not really. Suppose I have another resource whose URL is http://acme.com/v1/preferences.

It looks like this:

{"networkOptions": "http://acme.com/v1/network-options",

"dataOptions": "http://acme.com/v1/date-options",

"editorOptions": "http://acme.com/??/editor-options"}


How is the server to decide whether to put /v1 or /v2 on the last line?

One option: if the user asks for /v1 of preferences, she gets links to /v1 of all the other resources. This is not necessarily what the user wants. And what happens if the user asks for V2 of preferences, but there is no corresponding V2 of the networkOptions?

Another option is to return simply http://acme.com/editor-options (no version identifier) and let the client construct a suitable URL by parsing the URL and inserting /v1 or /v2. A variant of this idea is to return a URI template instead of a URL, like this: http://acme.com/{version}/editor-options.
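The URI-template variant leaves the version choice to the client, which expands the template before making the request. A minimal sketch of that expansion (this mirrors RFC 6570's simple string expansion, without the spec's full feature set):

```javascript
// Expand {name} placeholders in a URI template from a map of values.
function expand(template, vars) {
  return template.replace(/\{(\w+)\}/g, (_, key) => vars[key] ?? '');
}
```

A client that wants the V2 format would call expand('http://acme.com/{version}/editor-options', { version: 'v2' }) and request the resulting URL.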

Cleaning up with content negotiation

This is looking a bit complex both in practice and conceptually—try to write a convincing paragraph or two to describe the two different resources identified by the URLs http://acme.com/v1/editor-options and http://acme.com/v2/editor-options and their relationship. 

By introducing a second URL, we are introducing a second entity into our model, and the rest of our problems are a consequence of juggling two entities with overlapping states. Our original intent was not to introduce a new API entity—that was a side-effect of our solution. The intent was simply to give users a choice in how the data of the original entity should be formatted.

HTTP offers a cleaner solution to the problem of offering users multiple formats for the same resource. It’s called content negotiation. You’re probably already familiar with the idea: when making an HTTP request, the client includes a number of headers that describe what format (media type in the jargon) they want the information back in, or what format they are providing data in.

The two most commonly used headers for this are Accept to specify the desired format in the response and Content-Type for the provided format in the request. Accept-Language is also commonly used by browsers requesting HTML, and less commonly by API clients.

There are two obvious ways we can use content negotiation to solve our problem without introducing new URLs and their consequent headaches. In the example above, the v2 format uses simple JSON name-value pairs. It would be fair to consider application/json as being the correct media-type for V2.

The original V1 format isn't simply JSON; it defines its own peculiar grammar on top of JSON. V1 is JSON in the same sense that JSON itself is text (it’s JSON, but it's not just JSON): it has a special format of its own. 

If we had been purists, we might have invented our own media type for V1—something like application/convoluted+json—but it’s unlikely that we did this. At this point, our choices are to invent a new media type for the new format to use in the standard Accept and Content-Type headers (for example, application/just+json) or use a different header entirely.

A popular though not standardized header for this purpose is Accept-Version. Requests that include the header Accept-Version: V2 will get the V2 format. Requests that omit the header or use it to ask for V1 will get the V1 format.
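Server-side, the negotiation amounts to a small dispatch on the header value. A sketch (the function name is hypothetical; note that HTTP header names are case-insensitive, so they must be normalized before comparing):

```javascript
// Choose a response format from the (non-standard) Accept-Version header.
// Absent or V1 headers get the original format; only an explicit V2 asks
// for the new one.
function selectFormat(headers) {
  const entry = Object.entries(headers)
    .find(([name]) => name.toLowerCase() === 'accept-version');
  return entry && entry[1].toUpperCase() === 'V2' ? 'V2' : 'V1';
}
```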

If we use content-negotiation to communicate the format information, our preferences example becomes simple: 

{"networkOptions": "http://acme.com/network-options",

"dataOptions": "http://acme.com/date-options",

"editorOptions": "http://acme.com/editor-options"}


Removing version identifiers from the URLs solves the problem.

Why doesn't everyone agree that content-negotiation is a better solution?

Many people decide to use version identifiers in URLs instead of headers because of the convenience of using URLs without headers, especially in the browser. (If you are using curl to access an API that uses content-negotiation headers, you will have to add -H "Accept-Version: V2" to the command, which isn’t too onerous). 

In the browser, you’ll have to use a plugin like Postman to set the header, which is a bit more of a burden. Despite this, I think it is short-sighted to use version identifiers in URLs—in the end the price you pay for creating a more complex conceptual model will be higher.

Ignorance can be bliss

You’ll often see advice on the internet saying that not only should you put version identifiers in URLs, but you should do it right from the beginning, to allow for future evolution. Regardless of the strategy you use for versioning, we think it works perfectly well to ignore versioning initially and only add it if and when you need it.

This offers a significant advantage: if it turns out that you don't really need versioning—which has been the case for most of our own APIs—then you didn't add unnecessary complexity to the initial API.

There’s an old story of a farmer who goes to great lengths to avoid mowing over the fairy rings that grow spontaneously in the fields of his farm. When asked why he does this, he replies, "because I'd be a damned fool if I didn't."

If he were a developer, the farmer would probably also put version identifiers in the URLs of all his APIs, right from the start.

For more on versioning and a host of other best practices, download the eBook, “Web API Design: The Missing Link.”


Why Your Web APIs Should Be Entity-Oriented

The dominant model for APIs in distributed computing for decades has been Remote Procedure Call (RPC). This isn't surprising—ever since Fortran II introduced functions in 1958, the function (or procedure) has been the primary organizing construct that programmers use to write code.

Most distributed APIs are defined by programmers, and the simplest way for them to think about a distributed API is that it allows some of the procedures of their program to be invoked from outside the program. Historically, systems like DCE and CORBA provided system software and tools to help implement RPC.

When people started implementing APIs on the world-wide web, they naturally carried over the familiar RPC concepts from previous environments. This led initially to standards like WSDL and the so-called WS-* standards, which were heavyweight and complex. Most web APIs now use HTTP in a much more lightweight way—often called RESTful—that retains concepts from the RPC model while blending in some of the native concepts of HTTP/REST.

HTTP itself is purely entity-oriented, not procedural—it defines a small number of standard “methods” for manipulating entities, but otherwise does not model procedures. A minority of web API designers have abandoned the traditional RPC model completely and design web APIs that are based entirely on HTTP's entity-oriented model; they are using HTTP simply and directly, without layering RPC concepts on top of it. In a moment, I'll explain why they are doing this.

At this point, you might be confused, because much of the available information on the web would lead you to think that the current crop of popular web APIs follows the entity-oriented model called REST. For example, the introduction to the OpenAPI Specification (formerly known as Swagger) says "The goal of the OpenAPI Specification is to define a standard, language-agnostic interface to REST APIs."

In fact, OpenAPI is a fairly traditional RPC Interface Definition Language (IDL) and describing an entity-oriented API with OpenAPI is awkward and imprecise. The fact that OpenAPI can fairly easily and accurately describe the majority of the APIs currently on the web is a reliable indication of their nature. There have been some attempts to define IDLs for entity-oriented APIs—Rapier is an example. One way to understand the challenges of representing an entity-oriented API in OpenAPI is to look at the output of Rapier's OpenAPI generator.

Why entity-oriented APIs matter

So why would you care about this? Why are some people interested in entity-oriented rather than procedure-oriented APIs? Imagine an API for the classic students and classes problem. In a procedural API, I might need the following procedures:

  • add a student record
  • retrieve a student record
  • add a class record
  • add a student to a class
  • list classes a student is enrolled in
  • list students enrolled in a class
  • assign an instructor to a class
  • transfer a student between classes

In practice there would be dozens of these procedures, even for a simple problem domain like this one. If I looked at the API for another problem domain, I would start over again—nothing I learned about students and classes will help me learn the next API, whose procedures will all be different and specialized to its own problem domain.

What’s wrong with that?

Most programmers are not surprised or dismayed by the proliferation of APIs with little commonality between them—learning all this detail and diversity is simply part of the life of a programmer. However, even programmers have a different expectation when they program to a database. Database management systems (DBMSs) offer a standard API for interacting with data regardless of the details of the data itself.

If you are programming to MySQL, PostgreSQL, or any other DBMS, you have to learn the schema of the data you are accessing. But once you have done that, all the mechanisms for accessing that data—the API—are standardized by the DBMS itself. This means that when you have to program to a different database, the learning burden is much lower, because you already know the API of the DBMS; you only have to learn the schema of the new data. If each database, rather than each DBMS, had its own API, even the most tolerant programmers would balk.

Implementing a purely entity-oriented API on the web enables HTTP to function as the standardized API for all web APIs, in the same way that the API provided by a DBMS functions as the standardized API for all the databases it hosts. HTTP becomes the universal API for all web APIs, and only the schema of the data of a specific API needs to be specified and learned.
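The "HTTP as universal API" idea can be sketched in a few lines: one generic handler covers every entity type, and only the data (and its schema) differs from API to API. The in-memory Map stands in for a real storage layer; everything here is illustrative.

```javascript
// Sketch: a single generic dispatcher implements the standard HTTP methods
// for any entity, regardless of entity type or schema.
const store = new Map();

function handle(method, path, body) {
  switch (method) {
    case 'PUT':                      // create or replace the entity at this URL
      store.set(path, body);
      return { status: 200, body };
    case 'GET':                      // retrieve the entity, or 404 if absent
      return store.has(path)
        ? { status: 200, body: store.get(path) }
        : { status: 404 };
    case 'DELETE':                   // remove the entity
      store.delete(path);
      return { status: 204 };
    default:
      return { status: 405 };       // method not allowed
  }
}
```

Students, classes, or any other entity type flow through the same four cases—the client learns the schema, not a new API.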

Separation of interface from implementation

Separating an entity-oriented API model from the procedural implementation model has another major advantage—it makes it easier for each of them to evolve independently. This can be done in the procedural model too, by having one set of procedures for the external model and a separate set for the implementation. However, maintaining this separation when they are both expressed as programming-language procedures requires a lot of discipline and design oversight, and is rarely done well.

Why the extra effort is worthwhile

If entity-oriented web APIs are better, why is only a minority of web APIs designed this way? There are multiple reasons. One is that many programmers don't yet know how to do this, or they don't know why it's better. A second reason is that entity-oriented APIs require a bit more work to produce, because implementing an entity-oriented API requires programmers, whose code is in the form of procedures, to implement a mapping from the exposed entity model to the procedural implementation.

The mapping isn't inherently difficult—it consists mostly of implementing create, retrieve, update, and delete (CRUD) procedures for each entity, plus a set of procedures that implement queries on the entities. Many API developers start from this point but stray by exposing procedures that do not correspond clearly to a standard operation on a well-defined entity.
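A sketch of that mapping layer, with entirely hypothetical procedure names: each standard operation on an entity type dispatches to an ordinary internal procedure, and the discipline is simply to expose nothing that doesn't fit this table.

```javascript
// Sketch: mapping the exposed entity model onto a procedural implementation.
const students = new Map();
let nextId = 1;

// Ordinary internal procedures (hypothetical names and logic).
const procedures = {
  createStudent: data => {
    const record = { id: nextId++, ...data };
    students.set(record.id, record);
    return record;
  },
  retrieveStudent: id => students.get(id)
};

// The entity-oriented surface: (HTTP method, entity type) -> procedure.
function dispatch(method, type, arg) {
  const routes = {
    'POST students': procedures.createStudent,
    'GET students': procedures.retrieveStudent
  };
  return routes[`${method} ${type}`](arg);
}
```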

A third reason is that most of the popular API programming education, tools, frameworks, and examples illustrate the procedural style or a hybrid style—not a purely entity-oriented style. Staying true to the entity-oriented model requires a little more effort and mental acuity, but most programmers are neither lazy nor stupid; what is usually lacking is an understanding of why a little extra effort is worthwhile. In short, although it is not terribly hard, you have to have some vision as motivation to implement entity-oriented APIs.

The popularity of entity-oriented web APIs is increasing slowly. Some widely used APIs, like the Google Drive API and the GitHub API, are almost completely entity-oriented. Others have understood that entity-oriented interfaces can be constructed for almost any problem domain. I believe the industry will continue to move in this direction.

For more on API design best practices, read the eBook, “Web API Design: The Missing Link.”

Image: Flickr Creative Commons/webtreats

Web API Design: The Missing Link

New eBook: Best practices in web API design

Attention all API developers—API design matters! It's important to think about design choices from the application developer's point of view.

Why? Because the app developer is the consumer of your web API and the linchpin of your API strategy. Your job as an API developer is to ensure that app developers can get started quickly and easily with your APIs, and maximize productivity and success along the way.

Getting the design right is important because design communicates how something is used. You should always be thinking to yourself: “What is the design with optimal benefit for the app developer?”

Our newest eBook on API design, “Web API Design: The Missing Link,” is a comprehensive collection of the web API design best practices used by leading API teams.

API developers will learn everything from the importance of taking a data-oriented design approach to API development to advice on designing representations and URLs.  

Ready to start crafting interfaces that developers will love? Download the eBook now.