
Behavior Graphs for Understanding Customer Journeys

Jan 22, 2015

Engaging with customers via the web, mobile devices, CRM systems, and product purchases or consumption generates a wealth of time-stamped data that’s both structured and unstructured. Drawing insights from these digital interactions is key to delivering individualized, contextual, and consistent experiences to customers, and requires both descriptive analytics and predictive analytics. Descriptive and predictive analysis tools go hand-in-hand: descriptive analysis provides a lens on past activities, while predictive models leverage the patterns uncovered by the descriptive analysis.

This post is the first in a series that will outline the technologies needed to analyze time-stamped event data from multiple channels. Here, we’ll focus on descriptive analytics and the challenge of creating “customer journeys”: the sequences of interactions that occur between a customer and a business, spanning channels and devices. For example, uncovering the fact that a specific group of customers viewed products online before buying them at a physical store is more indicative of customer behavior than simple counts of website visits and store purchases.

Descriptive analytics and customer journeys

In the example below, the customer’s engagement spans email, web, social media, call centers, and physical stores. Data from each customer touch point is typically stored in a separate time-stamped dataset, so customer journey analysis requires queries that join these datasets at web scale.

Points of customer engagement along a customer journey


Aggregating journeys across the entire population of customers creates a directed graph in which “nodes” represent events and each node is associated with a set of users. For example, a health insurance company might define an event of interest as “claims greater than $10K” while a retailer might define an event like “products returned in stores.” Such a graph permits an analyst to examine the nodes that precede or follow a specific node of interest.
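As a rough illustration of this aggregation, the sketch below builds a small behavior graph in Python from per-user event sequences. The journey data, the event names, and the helper functions are all hypothetical, and the use of edges to record which users moved directly from one event to the next is an assumption made for this example rather than a description of any particular product.

```python
from collections import defaultdict

# Hypothetical journeys: user ID -> time-ordered list of event names.
journeys = {
    "u1": ["email_offer", "web_view", "store_purchase"],
    "u2": ["web_view", "store_purchase", "store_return"],
    "u3": ["email_offer", "web_view"],
}


def build_behavior_graph(journeys):
    """Aggregate journeys into nodes (event -> users) and edges (transition -> users)."""
    nodes = defaultdict(set)  # event -> users who had that event
    edges = defaultdict(set)  # (event, next_event) -> users who made that step
    for user, events in journeys.items():
        for event in events:
            nodes[event].add(user)
        # Pair each event with the one that immediately follows it.
        for current, following in zip(events, events[1:]):
            edges[(current, following)].add(user)
    return nodes, edges


def successors(edges, event):
    """Events that directly follow `event`, with the users who took that step."""
    return {dst: users for (src, dst), users in edges.items() if src == event}


def predecessors(edges, event):
    """Events that directly precede `event`, with the users who took that step."""
    return {src: users for (src, dst), users in edges.items() if dst == event}


nodes, edges = build_behavior_graph(journeys)
print(successors(edges, "web_view"))
print(predecessors(edges, "web_view"))
```

Keeping a set of users on every node and edge is what lets an analyst move from "what happened next" to "who did it" without going back to the raw event data.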

Graphs that represent sequences of events are termed behavior graphs. Note that behavior graphs differ from graphs that represent relationships such as friends and friends-of-friends (say, a social graph, which generally represents a snapshot in time).

Creating behavior graphs

Creating a behavior graph using traditional SQL-like approaches involves building a giant fact table with columns for user ID, timestamp, event type, and attribute data. Different types of events (website clicks versus call center data, for example) have different schemas and need to be transformed into a unified schema.
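A minimal sketch of that transformation step, in Python, might look like the following. The two source record layouts, their field names, and the `Event` class are assumptions invented for illustration; real web logs and call center exports will differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    """One row of the unified fact table (hypothetical layout)."""
    user_id: str
    timestamp: datetime
    event_type: str
    attributes: dict = field(default_factory=dict)


def from_web_click(row):
    # Assumed web log record: {"uid", "ts" (ISO 8601 string with offset), "url"}
    return Event(row["uid"], datetime.fromisoformat(row["ts"]),
                 "web_click", {"url": row["url"]})


def from_call_center(row):
    # Assumed call record: {"customer", "started" (epoch seconds), "reason"}
    started = datetime.fromtimestamp(row["started"], tz=timezone.utc)
    return Event(row["customer"], started, "call", {"reason": row["reason"]})


web_clicks = [{"uid": "c42", "ts": "2015-01-20T10:15:00+00:00", "url": "/shoes"}]
calls = [{"customer": "c42", "started": 1421750000, "reason": "return policy"}]

# The fact table: every event in one schema, ordered by user and then by time.
fact_table = sorted(
    [from_web_click(r) for r in web_clicks] + [from_call_center(r) for r in calls],
    key=lambda e: (e.user_id, e.timestamp),
)

for event in fact_table:
    print(event.user_id, event.timestamp.isoformat(), event.event_type, event.attributes)
```

Both converters produce timezone-aware timestamps so that events from different channels can be compared and sorted safely.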

Furthermore, calculating sequential relationships involves self joins on this table, and the queries are complex and hard to optimize. For example, to find the set of people who received an email offer, bought a pair of shoes, and then returned the shoes, we need a query that finds records of people who received the email offer with a timestamp before they purchased shoes, and joins those with records of people who returned shoes with a later timestamp. The SQL required quickly becomes complex and unmanageable.
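To make the self-join concrete, here is a small sketch that runs the shoe example against an in-memory SQLite database from Python. The `events` table layout and the sample rows are hypothetical. Note that a three-step sequence already needs the table joined to itself twice; each additional step adds another join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id TEXT, ts TEXT, event_type TEXT, product TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("u1", "2015-01-02", "email_offer", None),
    ("u1", "2015-01-05", "purchase", "shoes"),
    ("u1", "2015-01-09", "return", "shoes"),
    ("u2", "2015-01-03", "purchase", "shoes"),
    ("u2", "2015-01-04", "email_offer", None),  # offer arrived after the purchase
    ("u2", "2015-01-08", "return", "shoes"),
    ("u3", "2015-01-02", "email_offer", None),
    ("u3", "2015-01-06", "purchase", "shoes"),  # never returned
])

# One alias of the fact table per step in the sequence, chained by user and time.
query = """
SELECT DISTINCT offer.user_id
FROM events AS offer
JOIN events AS buy ON buy.user_id = offer.user_id AND buy.ts > offer.ts
JOIN events AS ret ON ret.user_id = buy.user_id AND ret.ts > buy.ts
WHERE offer.event_type = 'email_offer'
  AND buy.event_type = 'purchase' AND buy.product = 'shoes'
  AND ret.event_type = 'return' AND ret.product = 'shoes'
"""
matched = [row[0] for row in conn.execute(query)]
print(matched)
```

The timestamps are stored as ISO 8601 date strings, which compare correctly as text; a production schema would more likely use a native timestamp type.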

To solve the problem of creating and querying behavior graphs, Apigee uses its GRASP (graph and sequence processing) technology. GRASP provides both the data structures for efficient representation of the multiple time-stamped event streams, and query support for examining event sequences.  

Because the query language understands the notion of a sequence of events, we can write the query as “find all users” or “find all paths” where the first node is the email offer, a later node is the purchase, and an ending node is the return. The query returns all paths in the graph that matched this pattern, and we can then do various kinds of counting and aggregates on this result set to answer questions such as what fraction of the people who bought shoes returned them, or what fraction of people who bought shoes using a discount returned them.
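The sketch below illustrates the idea of a sequence-aware query in plain Python. It is not GRASP syntax or the GRASP API, neither of which is shown in this post; the `matches_sequence` and `find_users` helpers, the event names, and the sample journeys are all hypothetical. It only shows how "first this, later that, finally the other" can be expressed directly, and how the result sets feed the fractions mentioned above.

```python
def matches_sequence(events, pattern):
    """True if the pattern's events occur in `events` in order (gaps allowed)."""
    remaining = iter(events)
    # `step in remaining` advances the iterator past the first match,
    # so each later step is only searched for among later events.
    return all(step in remaining for step in pattern)


def find_users(journeys, pattern):
    """Return the set of users whose journey contains the pattern."""
    return {user for user, events in journeys.items()
            if matches_sequence(events, pattern)}


# Hypothetical journeys: user ID -> time-ordered list of event names.
journeys = {
    "u1": ["email_offer", "purchase_shoes", "return_shoes"],
    "u2": ["purchase_shoes", "email_offer", "return_shoes"],
    "u3": ["email_offer", "purchase_shoes"],
    "u4": ["purchase_shoes"],
    "u5": ["email_offer", "web_view", "purchase_shoes", "return_shoes"],
}

buyers = find_users(journeys, ["purchase_shoes"])
returners = find_users(journeys, ["purchase_shoes", "return_shoes"])
offer_buyers = find_users(journeys, ["email_offer", "purchase_shoes"])
offer_returners = find_users(
    journeys, ["email_offer", "purchase_shoes", "return_shoes"]
)

# Fraction of shoe buyers who returned them, overall and among offer recipients.
return_rate = len(returners) / len(buyers)
offer_return_rate = len(offer_returners) / len(offer_buyers)
print(return_rate, offer_return_rate)
```

Adding a step to the pattern is a one-word change here, whereas the SQL approach needs another self join for each step.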

Among the many graph databases, query languages, and processing frameworks, such as Neo4j, Titan, Gremlin, Giraph, Pregel, and GraphLab, none offers direct support for examining the kind of behavior graphs that underlie customer journeys. The query languages underlying these systems allow one to navigate between neighbors in the graph, but it remains challenging to answer questions about sequences of events like those described above.

Applications of customer journey analytics

Any company that deals with customer engagement data across multiple channels is likely using some form of business intelligence technology that builds segments of customers and associated actions. What they often lack is the information related to sequences of events.

For the population of people who returned products in a physical store, how many bought the product online, and what subset of those received an email campaign offer first? These kinds of questions are hard to answer using BI tools. This is where customer journey analysis shines a light on event patterns that business analysts can use to create reactive action plans. Telecommunications, healthcare, and financial services companies are at the forefront of such analysis today.

But to move from reactive actions to proactive ones, you need predictive modeling. Predictive models based on GRASP leverage exhaustive searches of customer behavior graphs to create very effective models amenable to machine learning. Coming up next, we’ll discuss predictive modeling.
