Solving SEO Problems with API Design
Last year I visited the software development arm of a household-name retailer that was attempting to rebuild the user interface for their main website using a single-page application (SPA) implementation design. The project had failed.
It did so for two main reasons: page load times were unacceptably long, and search engines were unable to effectively index the new site.
At the time of my visit, the retailer had abandoned the SPA design approach in favor of "old-fashioned" HTML construction on the server. This is a pity, because a properly-designed and -implemented SPA could have provided a superior experience for the company’s users, and superior productivity for its developers.
There’s a lot of advice on the web on how to optimize load times for single-page applications, but less on how to deal with the problem of search-engine indexing. This post explains how the search engine indexing problem can be solved through thoughtful design of APIs—perhaps not the place many people might look for a solution to a search engine problem.
SPAs enable end users to navigate between different entities without performing an HTML document load for each entity.
Note: HTML document loading is a concept fundamental to all web browsers—it’s precisely defined here and in other specifications. "Single-document application" would have been a better name than single-page application, if you value consistency with the terminology in the specifications.
The SPA experience
There are many reasons for the popularity of SPAs: they are easy and fun to write; they can provide a superior user experience; and they help provide a clean separation between user interface code and business logic.
Another important advantage of SPAs is that their overall design is similar to that of mobile apps that display the same data and access the same function—in fact many people use the same HTML5 SPA implementation for both web and mobile.
Early SPAs often failed to integrate well with the browser. One of the most visible mistakes was a failure to update the browser history appropriately. As the user navigated between resources, the SPA updated neither the address bar nor the browser history, so the user had nothing to bookmark, and the browser's back, forward, and history controls did not work.
A better understanding of how a SPA should be written, along with the adoption of frameworks like Angular that help developers write good SPAs, has resulted in more and more SPAs that integrate well with the browser. Yet there are still few SPAs that also work well with search engines.
HTML5 improvements aren’t enough
If you look in the web browser address bar of a typical SPA, you will see addresses that look like this:
Note: technically these addresses are called URI references, not URLs—the URL terminates at the # character.
There is only one HTML document for search engines to retrieve—https://acme.com/—and it usually contains only code or references to code, which is not useful to a search bot. This completely defeats search engine indexing.
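To make this concrete, here is a minimal TypeScript sketch. The fragment-style addresses in it are hypothetical examples, not taken from a real site. The sketch uses the standard URL class to separate the part of each address that a client requests from the server from the fragment, which never leaves the browser.

```typescript
// Split a URI reference into the part that is requested from the server
// and the fragment, which the browser keeps to itself.
function splitReference(reference: string): { requested: string; fragment: string } {
  const url = new URL(reference);
  const fragment = url.hash; // includes the leading "#", or "" if there is none
  url.hash = "";             // drop the fragment to get what the server sees
  return { requested: url.href, fragment };
}

// Hypothetical fragment-style SPA addresses (assumed for illustration).
const mickey = splitReference("https://acme.com/#/entity/Mickey");
const minnie = splitReference("https://acme.com/#/entity/Minnie");

console.log(mickey.requested, mickey.fragment);
console.log(minnie.requested, minnie.fragment);
```

Both addresses reduce to the same requested URL, so a search bot that fetches them receives the same document for every entity.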
With the ubiquity of the HTML5 history API in browsers, SPAs can now be written to use URLs like this one instead:
This is an important improvement, because there is now a separate URL that is retrievable from the server for each entity that is exposed by the SPA. Although this is a step in the right direction, this does not by itself solve our SEO problem, because the resources at these URLs—like the single resource we had previously at https://acme.com/—typically only contain code or references to code, as illustrated in the following example. This is still useless to a search bot.
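On the client side, the switch to fragment-free URLs might look something like the following TypeScript sketch. The /entity/ path layout is assumed from the links used later in this post, and the function names and the HistoryLike interface are hypothetical. In a browser, window.history would be passed in; a small stand-in is used here so the sketch also runs outside one.

```typescript
// Minimal shape of the browser's History object that this sketch needs.
// In a browser, window.history satisfies it.
interface HistoryLike {
  pushState(data: unknown, unused: string, url: string): void;
}

// Build the path for an entity, e.g. "/entity/Mickey" (path layout assumed).
function entityPath(name: string): string {
  return `/entity/${encodeURIComponent(name)}`;
}

// Recover the entity name from a path, or null if the path is not an entity.
function entityFromPath(path: string): string | null {
  const encoded = /^\/entity\/([^/]+)$/.exec(path)?.[1];
  return encoded === undefined ? null : decodeURIComponent(encoded);
}

// Record a navigation so the address bar, bookmarks, and back button work.
function navigateToEntity(history: HistoryLike, name: string): string {
  const path = entityPath(name);
  history.pushState({ entity: name }, "", path);
  return path;
}

// Stand-in for window.history that only records the URLs it is given.
const visited: string[] = [];
const fakeHistory: HistoryLike = {
  pushState: (_data, _unused, url) => {
    visited.push(url);
  },
};

navigateToEntity(fakeHistory, "Mickey");
console.log(visited, entityFromPath("/entity/Minnie"));
```

Because each entity now has its own path, the server can be asked for that path directly, which is what makes the next step possible.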
Add some meta
Consider the example of an application that displays information about Disney cartoon characters. In order to make the HTML useful for a search bot, we can simply add additional information, like this:
<meta name="description" content="Personal information about Mickey Mouse" />
<meta name="keywords" content="Mickey Mouse Donald Duck Minnie Goofy Disney" />
<div style="display: none;">
<p>name: <span>Mickey Mouse</span></p>
<p>girlfriend: <a href="https://acme.com/entity/Minnie">Minnie Mouse</a></p>
<ul>
<li><a href="https://acme.com/entity/Donald">Donald Duck</a></li>
</ul>
</div>
All we did was add some <meta> elements to the head of the document (search engines take note of meta elements) and a simple <body> that will never be displayed to the user.
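One way a server might generate this kind of document for each entity is sketched below in TypeScript. The Entity shape, the function names, and the exact document layout are assumptions made for illustration; a real page would also carry the references to the SPA's own code.

```typescript
// Hypothetical description of an entity shown by the SPA.
interface EntityLink {
  label: string;
  href: string;
}

interface Entity {
  name: string;
  description: string;
  keywords: string[];
  links: EntityLink[];
}

// Escape text so it is safe inside HTML content and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a document whose head and hidden body describe the entity to a bot.
function renderEntityHtml(entity: Entity): string {
  const items = entity.links.map(
    (link) => `<li><a href="${escapeHtml(link.href)}">${escapeHtml(link.label)}</a></li>`
  );
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    `<meta name="description" content="${escapeHtml(entity.description)}" />`,
    `<meta name="keywords" content="${escapeHtml(entity.keywords.join(" "))}" />`,
    "</head>",
    "<body>",
    '<div style="display: none;">',
    `<p>name: <span>${escapeHtml(entity.name)}</span></p>`,
    "<ul>",
    ...items,
    "</ul>",
    "</div>",
    "</body>",
    "</html>",
  ].join("\n");
}

const mickeyHtml = renderEntityHtml({
  name: "Mickey Mouse",
  description: "Personal information about Mickey Mouse",
  keywords: ["Mickey Mouse", "Donald Duck", "Minnie", "Goofy", "Disney"],
  links: [
    { label: "Minnie Mouse", href: "https://acme.com/entity/Minnie" },
    { label: "Donald Duck", href: "https://acme.com/entity/Donald" },
  ],
});

console.log(mickeyHtml);
```

The links in the hidden body use the same per-entity URLs the SPA puts in the address bar, so a bot can follow them from one entity to the next.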
Obviously, this is not a sophisticated example of what you want to include in your HTML for SEO. This is not an SEO tutorial, which I am not qualified to write in any case. It does show, however, how you can include HTML that is visible to search engines for all the entities that are displayed in your SPAs.
In summary, there are two steps required to solve the SEO problem for SPAs. The first is defining and using separate URLs—ones without fragment identifiers introduced by the "#" character—for each entity shown in the user interface.
The second step is providing meaningful HTML bodies for each entity, even though that HTML will not be seen by human users of the SPA.
This post has outlined a basic approach to make SPAs indexable by search engines, but we have not yet linked the story to API design. We'll do that in an upcoming post.
For more on API design, check out the free eBook, "Web API Design: The Missing Link."