Track Browsing Behavior in Google Analytics

Add Custom Dimensions for browsing behavior (tabbed browsing, how the user navigated to the current page) in Google Analytics. This guide is for Google Tag Manager users mainly, but can be adapted for other implementation methods, too.

Last updated 22 May 2019: Added expiration of currently open tabs.

In this article, Jethro Nederhof of Snowflake Analytics fame and I will introduce you to some pretty neat web browser APIs. The purpose of these APIs is to find out more about how the user navigated to the current page, and what’s going on with their browser tabs.

There are so many things you can do with this new information. You can build proper navigational path reports, rather than rely on the fuzzy and often incoherent flow reports in Google Analytics. You can identify how visitors interact with your content using browser tabs - a crucial bit of information if you want to make heads or tails of Google Analytics’ time on page metrics, for example. You can see how many redirects were involved in the current navigation action.

The origins of this article are in a number of places, including an impromptu challenge thrown by Yehoshua Coren on Twitter (with great contributions from Marek Lecián, too):

However, the main source of inspiration came from Measure Slack, where a similar discussion around tabbed browsing was taking place:

Jethro’s idea got the ball rolling in my mind, so I contacted him immediately and asked if he wanted to co-author this article. As it turns out, he’d been mulling around lots of different ideas for navigation behavior tracking, so we quickly sketched the outline of this article, and what you are reading now is the final product.

I consider the enhancements introduced in this article just as necessary as those I introduced way back when in my article Improve Data Collection With Four Custom Dimensions. Once you’re done implementing these scripts, I’m sure you’ll agree.

As said, this article is co-authored by Jethro and me. Since I am the editor, all mistakes and errors are entirely my fault, and any snarky comments should be directed at me and not my generous partner in crime (naturally, all snarky comments will be deleted, but I will read them and promise to be insulted).

You can skip directly to the Solutions chapter, but I do recommend reading the Theory first. At the very least, make sure you read the Caveats chapter, because there are a couple of gotchas you need to be aware of if implementing this solution.

Ah, damn it. Just read the whole thing, will you?

1. The why and how (theory)

The web, by default, is stateless. Your browsing behavior is confined to the current page only, and it’s difficult to programmatically peer into the past (never mind into the future).

Tools like Google Analytics try to make sense of this stateless mess by collecting information in a chronological order. You send a pageview, then you send another pageview. Google Analytics interprets this as navigational behavior where you first viewed the first page and then the second page. Makes sense, right?

But what about if you opened the second page in a tab and never even glanced at it? GA still interprets it as a page being viewed.

What about if you reloaded the current page? GA still interprets it as a navigational step.

What about if you pressed the back button of the browser to return to the previous page? Google Analytics makes no distinction - it doesn’t know if you clicked a link, typed the URL in the address bar, or used the back button. It’s all the same.

The flow reports try to make sense of this mess by aligning everything into a navigational pattern (the “flow”). But the reports are quite difficult to interpret, as they sometimes show sequences that you are certain shouldn’t be possible, lack necessary detail, and they don’t really give you a robust way to query the information or to build a proper flow report yourself.

And tabbed browsing is still a problem. Each tab initiates a new “session” in the browser (not to be confused with a Google Analytics session). Thus when GA tries to explain to you that everything is part of a linear navigational flow, the truth is actually more complex: each tab initiates a new branch of navigation!

Add to this the fact that in Google Analytics, “Referral” is a reserved term for campaign attribution. Thus when you are navigating around the site, the referring page is not sent when navigating in-site. This would be really helpful when deciphering navigational paths, since the Previous Page Path dimension does not necessarily represent the previous page viewed (just the previous page for which a pageview was sent). To make some sense out of the mess, sending the referral string in a Custom Dimension is a good practice.
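As a rough illustration of the referral idea, here is a minimal sketch that keeps the referrer only when it points to the same site. The function name inSiteReferrer and the example URLs are our own assumptions; in a browser you would pass in document.referrer and document.location.origin, and the URL constructor is assumed to be available.

```javascript
// Hypothetical helper: return the path of the referring page when it
// belongs to the current site, otherwise undefined.
function inSiteReferrer(referrer, currentOrigin) {
  if (!referrer) { return undefined; }
  try {
    var url = new URL(referrer);
    // Compare origins so that external referrers are left to GA's own
    // campaign attribution.
    return url.origin === currentOrigin ? url.pathname + url.search : undefined;
  } catch (e) {
    // Malformed referrer strings are ignored.
    return undefined;
  }
}

// Plain strings stand in for document.referrer and document.location.origin.
console.log(inSiteReferrer('https://www.example.com/products?id=1', 'https://www.example.com'));
```

The returned value could then be sent in a hit-scoped Custom Dimension alongside the pageview.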

There are other, more granular tracking solutions out there, of course. Snowplow, for example, lets you track hit-level data in any way you like. You can include referral information, if you wish. However, since you’re approaching Snowplow with the limitations of SQL queries, as soon as the navigation journey starts looping or has multiple pages, the joins you’d need to make become frustratingly complex and, as a result, quite unwieldy over large amounts of data.

So, we’ve had this problem of analytics tools imposing a linear sequence on a non-linear navigational pattern for a couple of decades, ever since tabbed browsing was introduced around the turn of the millennium.

What we’re proposing in this article is to add another layer to the taxonomy of Google Analytics’ rather rigid schema for browsing behavior.

FROM: Users > Sessions > Hits
TO: Users > Sessions > Tabs > Hits

This browsing behavior would be encoded directly into Custom Dimensions, so you don’t have to infer behavior from GA’s pageview model - rather, you can query the navigational information directly!
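To make the extra layer concrete, here is a small sketch of how exported hits could be regrouped into Sessions > Tabs > Hits. The field names (sessionId, tabId, timestamp, page) are assumptions about what your export looks like, not something Google Analytics provides in this shape.

```javascript
// Hypothetical helper: nest pages under their session and tab,
// ordered by hit timestamp.
function groupHitsByTab(hits) {
  var tree = {};
  hits.slice().sort(function(a, b) {
    return a.timestamp - b.timestamp;
  }).forEach(function(hit) {
    var session = tree[hit.sessionId] || (tree[hit.sessionId] = {});
    var tab = session[hit.tabId] || (session[hit.tabId] = []);
    tab.push(hit.page);
  });
  return tree;
}

// Made-up rows for illustration only.
var exampleHits = [
  { sessionId: 's1', tabId: 't1', timestamp: 1, page: '/a' },
  { sessionId: 's1', tabId: 't2', timestamp: 3, page: '/c' },
  { sessionId: 's1', tabId: 't1', timestamp: 2, page: '/b' }
];
console.log(JSON.stringify(groupHitsByTab(exampleHits)));
```

Each tab then carries its own linear path, instead of all hits in the session being forced into one sequence.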

1.1. The PerformanceNavigation API

The API we’re going to use to identify browsing behavior is called PerformanceNavigation. It comes with two read-only properties: performance.navigation.type and performance.navigation.redirectCount.

The origin story of this API is firmly rooted in the difficulty of measuring performance of any given web page load. Redirects, network lag, and cached resources (locally, server-side, in CDNs) all contribute to the complexity of what the aggregate performance of any given page load is.

To help understand this complexity, web browsers implement the PerformanceNavigation API. The two properties exposed by the API can be found under the global window.performance.navigation interface, and they contain the following information:

Parameter      Detail  Description
redirectCount  -       The number of 3XX HTTP redirects the browser went through when loading the page.
type           -       An integer representing how the page was loaded.
               0       Normal navigation: typing the URL, clicking a link, opening a bookmark, entering via an app link, etc.
               1       Page was refreshed / reloaded while already open in the browser tab.
               2       The page was retrieved from the browser history by using the Back or Forward button.
               255     Any other way to navigate to the page.

The 255 value is interesting, because it is largely undocumented. In tests, it seems to emerge only in Firefox, when the page goes through a client-side refresh, either using window.location.replace() or <meta http-equiv="refresh">. Note that this goes against the W3C specification, which clearly states that client-side redirects should be contained within one of the regular navigation types.
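As a quick illustration, here is a minimal sketch that turns the two properties into something readable. The helper name describeNavigation and the label strings are our own choices; a plain object stands in for window.performance.navigation so the snippet also runs outside a browser.

```javascript
// Labels for the integer values of performance.navigation.type.
var NAVIGATION_LABELS = {
  0: 'NAVIGATE',
  1: 'RELOAD',
  2: 'BACK/FORWARD',
  255: 'OTHER'
};

// Hypothetical helper: nav is expected to look like
// window.performance.navigation ({ type: <int>, redirectCount: <int> }).
function describeNavigation(nav) {
  if (!nav || typeof nav.type !== 'number') {
    return { navigationType: undefined, redirectCount: undefined };
  }
  return {
    // Any value we do not recognize is treated as OTHER.
    navigationType: NAVIGATION_LABELS[nav.type] || 'OTHER',
    redirectCount: nav.redirectCount
  };
}

// In a browser: describeNavigation(window.performance.navigation)
console.log(describeNavigation({ type: 2, redirectCount: 1 }));
```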

Now, even though we mention how this API was probably conceived to give more information about the performance timing metrics, where it really shines is uncovering the navigational paths the users take through your site.

1.2. Browser storage

Ever since browser tabs were introduced to the delight of site visitors and web browser users, they’ve been a source of annoyance for web analysts.

In Google Analytics, for example, we simply don’t know (by default) what the tab situation of any given page is. We don’t know if the page was opened in a new tab, for one, since you can’t attach listeners to the right-click context menu, and tracking just middle mouse button clicks doesn’t give a comprehensive idea of the scope of the phenomenon.

The beauty of web analytics is that when we can’t track things directly (whether a user submitted a form, for example), we can track them indirectly (page load of a “thank you” page). So maybe we can use this same indirect approach with browser tabs, too?

If we can assign an identifier to every tab the user opens in the current website, we can identify individual navigation sequences! This can be achieved by checking against a common data store whether the given tab ID exists already, in which case the page load happened in a tab that was already open, and vice versa.

However, thanks again to the statelessness of the web, there is the problem of persistence. We can’t simply use a global JavaScript property to store the tab ID, because that gets demolished when the user unloads the page by reloading or navigating to another page.

Cookies and localStorage are “too” persistent, because they are only reset when they are manually cleared (localStorage) or when they expire (cookies). We only want to store the ID of a tab for as long as that tab exists.

Enter sessionStorage! It’s a browser storage API that is unique to each tab in your web browser. By storing the tab ID in sessionStorage, we can always check if the tab already has an ID (existing tab), or if we need to generate a new one (new tab).
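The check itself is tiny. Below is a minimal sketch, assuming a storage object with getItem and setItem (sessionStorage in a browser). The helper names and the in-memory stand-in are hypothetical and only there so the snippet runs anywhere.

```javascript
// Hypothetical helper: return the tab's ID, creating one if none exists.
// `storage` is anything with getItem/setItem, e.g. window.sessionStorage.
function getOrCreateTabId(storage, generateId) {
  var existing = storage.getItem('_tab_id');
  if (existing !== null) {
    return { tabId: existing, newTab: false };
  }
  var created = generateId();
  storage.setItem('_tab_id', created);
  return { tabId: created, newTab: true };
}

// In-memory stand-in for sessionStorage (illustration only).
function createMemoryStorage() {
  var data = {};
  return {
    getItem: function(key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    setItem: function(key, value) {
      data[key] = String(value);
    }
  };
}

var tabStorage = createMemoryStorage();
var firstLoad = getOrCreateTabId(tabStorage, function() { return 'tab-1'; });
var secondLoad = getOrCreateTabId(tabStorage, function() { return 'tab-2'; });
console.log(firstLoad, secondLoad);
```

The second call reuses the ID written by the first, which is how a page load in an existing tab would be recognized.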

Combine this with some localStorage logic, where we keep a running tally of how many tab IDs have been generated, remembering to remove any tab ID once that tab is closed, and we can also get a fairly reliable count of tabs open on your website at any given time.
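Here is a rough sketch of that running tally, again with an injected storage object standing in for localStorage. The function names are our own; the '_tab_ids' key is the one used by the full script later in this article.

```javascript
// Read the shared list of open tab IDs; fall back to an empty list.
function readOpenTabs(storage) {
  try {
    return JSON.parse(storage.getItem('_tab_ids')) || [];
  } catch (e) {
    return [];
  }
}

// Add the tab ID if it is not listed yet and return the open tab count.
function registerTab(storage, tabId) {
  var tabs = readOpenTabs(storage);
  if (tabs.indexOf(tabId) === -1) {
    tabs.push(tabId);
    storage.setItem('_tab_ids', JSON.stringify(tabs));
  }
  return tabs.length;
}

// Remove the tab ID (e.g. when the tab unloads) and return the new count.
function unregisterTab(storage, tabId) {
  var tabs = readOpenTabs(storage);
  var index = tabs.indexOf(tabId);
  if (index > -1) {
    tabs.splice(index, 1);
    storage.setItem('_tab_ids', JSON.stringify(tabs));
  }
  return tabs.length;
}

// In-memory stand-in for localStorage (illustration only).
var sharedStore = (function() {
  var data = {};
  return {
    getItem: function(key) { return key in data ? data[key] : null; },
    setItem: function(key, value) { data[key] = String(value); }
  };
})();

console.log(registerTab(sharedStore, 'tab-a'));
console.log(registerTab(sharedStore, 'tab-b'));
console.log(unregisterTab(sharedStore, 'tab-a'));
```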

So, now we are getting close to having all the bits and pieces at hand. Once we combine the tabbed browsing metadata with that provided by the PerformanceNavigation API, we can get a nice set of Custom Dimensions that expose a great deal of interesting information about how your visitors navigate your site.

2. Caveats

Oh, there are plenty of caveats. The solutions below rely on a number of fragile components, which can easily get messed up due to how browsers act differently in certain circumstances.

2.1. Cross-domain browsing

If your tracking crosses multiple domains, then each domain will get its own tab ID. That’s because sessionStorage and localStorage are confined to the current origin (protocol, domain, and port). This is also why, if your site sails between HTTP and HTTPS, each protocol will get its own storage, too. You could fix this by using URL parameters on the domain boundaries to pass the metadata from one domain to the next, but this is something you’ll need to figure out by yourself.
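As a starting point only, here is a sketch of what passing the tab ID across a domain boundary could look like. The parameter name tab_id, the helper names, and the example URL are all assumptions, and the snippet relies on the URL constructor being available in the browsers you care about.

```javascript
// Hypothetical: append the current tab ID to a link pointing at another domain.
function decorateUrl(href, tabId) {
  var url = new URL(href);
  url.searchParams.set('tab_id', tabId);
  return url.toString();
}

// Hypothetical: read the tab ID back on the receiving domain.
// Returns null if the parameter is not present.
function readTabIdFromUrl(href) {
  return new URL(href).searchParams.get('tab_id');
}

var decorated = decorateUrl('https://shop.example.com/cart?step=1', 'abc-123');
console.log(decorated);
console.log(readTabIdFromUrl(decorated));
```

On the receiving domain, the value read from the URL would be written to sessionStorage before the tab ID logic runs, so that a new ID is not generated.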

2.2. Firefox and Chrome go rogue

An additional point of concern is that Firefox and Chrome do not respect the specification that session cookies and sessionStorage should only exist for as long as the tab is alive. Once the browser is closed, these session stores should be purged.

On Chrome and Firefox, however, if you open the browser and you have the Show your windows and tabs from last time (Firefox) or Continue where you left off (Chrome) settings turned on, the session stores are restored, too! So even though the tab did close, by restoring the tab it’s as if the tab was never closed.

That sucks. Not much we can do about that.

Also, and this is interesting, when you do restore tabs or continue where you left off, the browser claims that the navigation type was BACK/FORWARD, meaning it equates restoring tabs with using the back or forward button of the browser.

This sucks, too. And there’s not much we can do about this, either.

2.3. Firefox and its client-side refresh dilemma

Then there’s the problem with Firefox reporting OTHER as the navigation type if a client-side refresh is done. This is unfortunate, but luckily it doesn’t mess things up too badly, because this behavior is contained within this particular navigation type alone.

In addition to all of these, because the whole granularity of distinguishing between BACK and FORWARD needs to be coded using a programmatically maintained path of pages the user has visited, all it takes is for them to manually clear the sessionStorage to break the whole setup (at least, for as long as the tab is open).

That’s why in the beginning of the solution you have the option of NOT tracking the back and forward button presses with detail. In this case, the string BACK/FORWARD is sent to Google Analytics in case either button is pressed. It’s not as detailed but it does still tell you whether the user navigated using these buttons (or, as mentioned above, if the tabs were restored after restarting the browser).

Our hypothesis is that with more data the averages will converge towards a more reliable dataset. So idiosyncrasies such as tabs being restored or users clearing storage will disappear into the data as more and more page views are accumulated.

2.4. Browser support

There’s also the matter of browser support, but contrary to what some online resources claim, PerformanceNavigation is pretty well supported in Chrome and Safari.

2.5. Single-page apps

You could implement this solution for single-page apps, but in that case you should add an additional, custom navigation type called VIRTUAL, or something similar, if the navigation was a single-page transition and not a proper page load.

Do note that window.performance.navigation.type does not update when the user uses the browser’s Back / Forward buttons and the transition is from one single-page app state to another. So you’d need to code the logic for Back / Forward use when no page load is recorded.
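One possible way to code that logic is sketched below. This is our own assumption of how it could be done, not part of the solution in this article: a small tracker keeps its own stack of visited states, and you would call push() whenever the app navigates to a new state and pop() from a popstate handler.

```javascript
// Hypothetical tracker for single-page transitions.
function createVirtualTracker() {
  var stack = [];
  var position = -1;
  return {
    // Call when the app moves to a new state (e.g. after history.pushState).
    push: function(path) {
      // A new state discards any "forward" entries, like the browser does.
      stack = stack.slice(0, position + 1);
      stack.push(path);
      position = stack.length - 1;
      return 'VIRTUAL';
    },
    // Call from a popstate handler with the path the browser landed on.
    pop: function(path) {
      if (position > 0 && stack[position - 1] === path) {
        position -= 1;
        return 'BACK';
      }
      if (position < stack.length - 1 && stack[position + 1] === path) {
        position += 1;
        return 'FORWARD';
      }
      // Jumps of more than one step, or an out-of-sync stack.
      return 'BACK/FORWARD';
    }
  };
}

var tracker = createVirtualTracker();
tracker.push('/a');
tracker.push('/b');
tracker.push('/c');
console.log(tracker.pop('/b'));
console.log(tracker.pop('/c'));
```

Note that if the same path sits on both sides of the current position, this sketch reports BACK, so it shares the fragility of the page-load version.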

3. Solutions

OK, let’s get to the good stuff! In this chapter, we’ll introduce the technical solution in Google Tag Manager (Custom HTML tag) that orchestrates the whole thing. In addition to that, each sub-section will detail how to send the respective piece of information to Google Analytics as a Custom Dimension. Basically, all you’ll need to edit is your main Page View tag, since that’s the only one that will be sending this data to Google Analytics.

3.1. The Custom HTML tag

First, create a new Custom HTML tag, and add the following code within.

<script>
  (function() {
    
    // Set to false if you only want to register "BACK/FORWARD"
    // if either button was pressed.
    var detailedBackForward = true;
    
    // Set expiration of tab count in milliseconds. The recommended default is
    // 72 hours (259200000 ms). Set to 0 if you don't want to expire the tab count.
    var expireTabs = 259200000;
    
    if (!!window.Storage) {

      var openTabs  = JSON.parse(localStorage.getItem('_tab_ids')) || [],
          tabId     = sessionStorage.getItem('_tab_id'),
          navPath   = JSON.parse(sessionStorage.getItem('_nav_path')),
          curPage   = document.location.href,
          newTab    = false,
          origin    = document.location.origin;

      var tabCount,
          redirectCount,
          navigationType,
          prevInStack,
          lastInStack,
          payload,
          expiration,
          newTabId;

      var clearExpired = function(tabs) {
        
        if (expireTabs === 0) { return tabs; }
        return tabs.filter(function(tab) {
          try {
            expiration = parseInt(tab.split('_')[1], 10);
            return expiration > (new Date().getTime());
          } catch(e) {
            return false;
          }
        });
        
      };
      
      var updateTabExpiration = function(tabId) {
        
        if (expireTabs === 0) { return tabId; }
        try {
          newTabId = tabId.split('_');
          expiration = parseInt(newTabId[1], 10);
          if (expiration > new Date().getTime()) {
            return tabId;
          } else {
            newTabId = newTabId[0] + '_' + (new Date().getTime() + expireTabs);
            sessionStorage.setItem('_tab_id', newTabId);
            return newTabId;
          }
        } catch(e) {
          return tabId;
        }
      
      };
      
      var getBackForwardNavigation = function() {
        
        if (detailedBackForward === false) {
          return 'BACK/FORWARD';
        }

        if (navPath.length < 2) {
          return 'FORWARD';
        }

        prevInStack = navPath[navPath.length-2];
        lastInStack = navPath[navPath.length-1];

        if (prevInStack === curPage || lastInStack === curPage) {
          return 'BACK';
        } else {
          return 'FORWARD';
        }

      };

      var removeTabOnUnload = function() {

        var index;

        // Get the most recent values from storage
        openTabs = JSON.parse(localStorage.getItem('_tab_ids')) || [];
        tabId    = sessionStorage.getItem('_tab_id');

        openTabs = clearExpired(openTabs);
        
        if (openTabs.length && tabId !== null) {
          index = openTabs.indexOf(tabId);
          if (index > -1) {
            openTabs.splice(index, 1);
          }
          localStorage.setItem('_tab_ids', JSON.stringify(openTabs));
        }

      };

      var generateTabId = function() {

        // From https://stackoverflow.com/a/8809472/2367037
        var d = new Date().getTime();
        if (typeof performance !== 'undefined' && typeof performance.now === 'function'){
          d += performance.now(); //use high-precision timer if available
        }
        return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
          var r = (d + Math.random() * 16) % 16 | 0;
          d = Math.floor(d / 16);
          return (c === 'x' ? r : (r & 0x3 | 0x8)).toString(16);
        }) + (expireTabs > 0 ? '_' + (new Date().getTime() + expireTabs) : '');

      };
      
      var validNavigation = function(type, newTab) {
        
        // Return false if new tab and any other navigation type than
        // NAVIGATE or OTHER. Otherwise return true.
        return !(newTab === true && (type !== 0 && type !== 255));
      
      };

      if (tabId === null) {
        tabId = generateTabId();
        newTab = true;
        sessionStorage.setItem('_tab_id', tabId);
      } else {
        tabId = updateTabExpiration(tabId);
      }   
          
      openTabs = clearExpired(openTabs); 

      if (openTabs.indexOf(tabId) === -1) {
        openTabs.push(tabId);
        localStorage.setItem('_tab_ids', JSON.stringify(openTabs));
      }

      tabCount = openTabs.length;

      if (!!window.PerformanceNavigation) {
        navPath = navPath || [];
        redirectCount = window.performance.navigation.redirectCount;
        // Only track new tabs if type is NAVIGATE or OTHER
        if (validNavigation(window.performance.navigation.type, newTab)) {
          switch (window.performance.navigation.type) {
            case 0:
              navigationType = 'NAVIGATE';
              navPath.push(curPage);
              break;
            case 1:
              navigationType = 'RELOAD';
              if (navPath.length === 0 || navPath[navPath.length-1] !== curPage) {
                navPath.push(curPage);
              }
              break;
            case 2:
              navigationType = getBackForwardNavigation();
              if (navigationType === 'FORWARD') {
                // Only add to navigation if not coming from external domain
                if (document.referrer.indexOf(origin) > -1) {
                  navPath.push(curPage);
                }
              } else if (navigationType === 'BACK') {
                // Only clear from navigation if not returning from external domain
                if (lastInStack !== curPage) {
                  navPath.pop();
                }
              } else {
                navPath.push(curPage);
              }
              break;
            default:
              navigationType = 'OTHER';
              navPath.push(curPage);
          }
        } else {
          navPath.push(curPage);
        }
        sessionStorage.setItem('_nav_path', JSON.stringify(navPath));
      }

      window.addEventListener('beforeunload', removeTabOnUnload);
      
      payload = {
        tabCount: tabCount,
        redirectCount: redirectCount,
        navigationType: navigationType,
        newTab: newTab === true ? 'New' : 'Existing',
        tabId: tabId.replace(/_.+/, '')
      };

      // Set the data model keys directly so they can be used in the Page View tag
      window.google_tag_manager[{{Container ID}}].dataLayer.set('browsingBehavior', payload);
      
      // Also push to dataLayer
      window.dataLayer.push({
        event: 'custom.navigation',
        browsingBehavior: payload
      });
    
    }
  
  })();
</script>

Do not add any triggers to this tag. Instead, open your Page View tag, and add this Custom HTML tag into its tag sequence by making the Custom HTML tag fire before the Page View tag.

Accessing GTM’s data model directly with dataLayer.set is necessary if you want to modify Data Layer keys within a tag sequence. However, for the sake of transparency, we’ll also push a dataLayer object with a custom event (custom.navigation) and the browsing behavior payload.

The code in the Custom HTML tag does a number of things.

  1. It generates a tab ID for the page, which is stored in sessionStorage. If a new tab ID is thus generated, the page is marked as being in a new browser tab. If a tab ID was already in storage, then the page is flagged as being in an existing tab.

  2. This tab ID is also stored in an array within localStorage, where all tab IDs generated on the site are stored. The length of this array is the count of currently open tabs on the domain.

  3. If the user leaves the page, or closes the browser tab, or closes the browser, the tab ID is removed from the array in localStorage (thus keeping the count of open tabs accurate).

  4. If the page was loaded by navigating to the URL, navigation type is set to NAVIGATE, and the current page is pushed into an array representing the navigation path, stored in sessionStorage.

  5. If the page was reloaded, navigation type is set to RELOAD, and the navigation path is kept as it is.

  6. If the Back or Forward button was pressed, the script checks if the current page is the penultimate page in the navigation path (or the last page, which happens when returning from an external domain), in which case navigation type is set to BACK. In other cases, navigation type is set to FORWARD.

  7. Count of tabs, count of redirects, navigation type, tab ID, and whether the tab was new or existing are all added to Google Tag Manager’s data model, so that the Page View tag can grab these values with Data Layer variables.
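The Back / Forward check in step 6 is easier to follow with a trace. The sketch below isolates that logic as a pure function. It is a simplification of the tag code: the document.referrer check for external domains is left out, and the function name is our own.

```javascript
// Simplified version of the Back/Forward logic: takes the stored navigation
// path and the current page, returns the navigation type and updated path.
function classifyBackForward(navPath, curPage) {
  var path = navPath.slice(); // work on a copy
  if (path.length < 2) {
    path.push(curPage);
    return { type: 'FORWARD', navPath: path };
  }
  var prev = path[path.length - 2];
  var last = path[path.length - 1];
  if (prev === curPage || last === curPage) {
    // Going back: drop the page we just left, unless we are already on it.
    if (last !== curPage) {
      path.pop();
    }
    return { type: 'BACK', navPath: path };
  }
  path.push(curPage);
  return { type: 'FORWARD', navPath: path };
}

// The user visited /a, /b and /c, pressed Back, then pressed Forward.
var afterBack = classifyBackForward(['/a', '/b', '/c'], '/b');
var afterForward = classifyBackForward(afterBack.navPath, '/c');
console.log(afterBack, afterForward);
```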

UPDATE 22 May 2019: I added some extra code to handle expiration. By default, the expiration is 72 hours, so if the storage isn’t interacted with in 72 hours, the current count of open tabs is cleared.
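To show how the expiration works, here is a stripped-down sketch of the same idea: the expiration timestamp is appended to the tab ID after an underscore, and any ID whose timestamp has passed (or is missing) is dropped from the list. The function names and the fixed timestamp are ours, and the current time is passed in as a parameter to keep the example deterministic.

```javascript
var EXPIRE_MS = 259200000; // 72 hours, the default used in the tag

// Append the expiration timestamp to a tab ID.
function appendExpiration(baseId, now) {
  return baseId + '_' + (now + EXPIRE_MS);
}

// An ID is expired if its timestamp is not in the future.
// IDs without a timestamp parse to NaN and count as expired, too.
function isExpired(tabId, now) {
  var expiration = parseInt(tabId.split('_')[1], 10);
  return !(expiration > now);
}

function clearExpired(tabIds, now) {
  return tabIds.filter(function(tabId) {
    return !isExpired(tabId, now);
  });
}

var now = 1558512000000; // arbitrary fixed timestamp for the example
var fresh = appendExpiration('tab-a', now);
var stale = appendExpiration('tab-b', now - EXPIRE_MS - 1);
console.log(clearExpired([fresh, stale, 'tab-c'], now));
```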

There are some nuances to the steps listed above, mainly to account for edge cases, such as when navigating BACK or FORWARD from an external domain. Even with these precautions, there are situations where the solution will fail (see the Caveats chapter).

And now that we are adding all this information into Data Layer, it’s time to pick up the metadata and send it to Google Analytics!

3.2. Create the Custom Dimensions in Google Analytics

First, you’ll need to create some Custom Dimensions in Google Analytics.

Create the following Custom Dimensions in Google Analytics’ Property Settings, all in hit scope, and make note of the index numbers assigned to them.

  • Redirect count
  • Navigation type
  • Tab type
  • Tabs open
  • Tab ID
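If you are adapting this for an implementation without Google Tag Manager, the index numbers are what tie the payload to the dimensions. The sketch below is an assumption of how that mapping could look for analytics.js, where Custom Dimensions are set with field names such as dimension1. The index numbers shown are made up, so replace them with the ones assigned in your property.

```javascript
// Hypothetical index numbers; use the ones from your own Property Settings.
var DIMENSION_INDEX = {
  redirectCount: 1,
  navigationType: 2,
  newTab: 3,
  tabCount: 4,
  tabId: 5
};

// Turn the browsingBehavior payload into an analytics.js fields object.
function buildDimensionFields(payload, indexMap) {
  var fields = {};
  Object.keys(indexMap).forEach(function(key) {
    // Skip values that were never set (e.g. no PerformanceNavigation support).
    if (payload[key] !== undefined) {
      fields['dimension' + indexMap[key]] = String(payload[key]);
    }
  });
  return fields;
}

var examplePayload = {
  tabCount: 2,
  redirectCount: 0,
  navigationType: 'NAVIGATE',
  newTab: 'New',
  tabId: 'abc'
};
var dimensionFields = buildDimensionFields(examplePayload, DIMENSION_INDEX);
console.log(dimensionFields);
// In the browser, the fields object could then be passed along with the hit:
// ga('send', 'pageview', dimensionFields);
```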

3.3. Create the Data Layer variables in Google Tag Manager

Next, create the corresponding Data Layer variables in Google Tag Manager. Here’s my setup:

Variable name                               Value for Data Layer Variable name field
{{DLV - browsingBehavior.redirectCount}}    browsingBehavior.redirectCount
{{DLV - browsingBehavior.navigationType}}   browsingBehavior.navigationType
{{DLV - browsingBehavior.newTab}}           browsingBehavior.newTab
{{DLV - browsingBehavior.tabCount}}         browsingBehavior.tabCount
{{DLV - browsingBehavior.tabId}}            browsingBehavior.tabId

Here’s an example of what one of the variables would look like:

3.4. Add the Data Layer variables to your Page View tag

Now, open the Page View tag which you have already edited for the Tag Sequence stuff in the beginning of this chapter. Either in a Google Analytics Settings variable or by directly editing the tag fields, add the Data Layer variables to their respective indices in the Custom Dimensions list. This is what my setup looks like:

3.5. Test it!

Now, save the tag, go to Preview mode, and enter your site.

Note! You can’t really use Preview mode to test if the Data Layer variables are populating correctly. Because you are updating the data model in a tag sequence, the method used does not expose the dynamic changes to the Preview mode user interface.

You’ll need to either look at the Network requests directly, or use a tool such as Google Tag Assistant recordings or Google Analytics Debugger to check if the data is being sent correctly.

In any case, if everything is working, then with your Page View tag you should see the Custom Dimensions being populated with the relevant information.
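If you go the Network requests route, the Custom Dimensions show up in the hit as parameters named cd followed by the index number. Here is a small sketch for pulling them out of a request URL you have copied from the browser’s developer tools. The example URL and its index numbers are made up, and if the hit was sent with POST you would need to look at the request body instead.

```javascript
// Hypothetical helper: collect all cd<index> parameters from a GA hit URL.
function extractCustomDimensions(hitUrl) {
  var params = new URL(hitUrl).searchParams;
  var dimensions = {};
  params.forEach(function(value, key) {
    // Match cd1, cd2, ... but not other parameters such as cid.
    if (/^cd\d+$/.test(key)) {
      dimensions[key] = value;
    }
  });
  return dimensions;
}

var exampleHit = 'https://www.google-analytics.com/collect?v=1&t=pageview' +
  '&cid=123.456&cd1=0&cd2=NAVIGATE&cd3=New&cd4=2&cd5=abc';
console.log(extractCustomDimensions(exampleHit));
```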

3.6. Custom Report in Google Analytics

A handy way to pull it all together is to create a Custom Report in Google Analytics with these settings, for example:

This report will contain interesting information about how users navigated to the different pages on your site.

Couple this with things like Session ID and Hit Timestamp, and you can really start digging into how users move from one page to the other on your site.

3.7. Other data stores

If you’ve taken the time to duplicate your Google Analytics data to Snowplow, you’ll naturally have access to a far more granular, raw hit stream resource for querying against. For example, here’s a simple SQL query output with all the relevant dimensions included:

Do note that Snowplow doesn’t differentiate between different scopes of Custom Dimensions, since scope is a concept applied in Google Analytics processing. So if you are sending an identifier into a session-scoped Custom Dimension (e.g. Session ID), the hit stream to Snowplow will interpret this as hit-scoped data, and thus the identifier is pretty useless.

4. Summary

First of all, I’m hugely indebted to Jethro Nederhof for agreeing to draft this solution with me. I really love the community of analytics developers - seems like everyone is giddy with excitement when figuring out new solutions to age-old questions, and the amount of knowledge being shared across blogs, Slack channels, and social media is a testament to the selflessness of these good men and women.

In my humble opinion, anyone interested in proper page navigation analysis should try out this solution. Understanding things like browser tab usage and Back / Forward browsing can help you figure out where the information architecture blind spots of your site are or whether you need to fix the navigation options you offer your visitors, for example.

But, as always on this particular blog, this is a technical solution first and foremost. Jethro and I wanted to highlight some cool tricks you can do with the web browser, and we fully expect others to refine these methods even further.

It would be cool if the APIs were developed even further, such as by automatically distinguishing between Back and Forward of the browser. Right now, they’re bunched together which is why we need the workaround of managing the navigation path in sessionStorage, and that’s a fragile solution indeed.

Let us know in the comments what you think of this trick, and whether you have suggestions on how to improve it!