I’m back with another customTask tip, but this time I’m exploring some new territory. Snowplow just introduced their latest version update, which included (among other things) an adapter for processing Google Analytics payloads. Never heard of Snowplow? It’s a collection of open-source libraries designed to let you build your own analytics pipeline, all the way from data collection, through ETL (extract, transform, load) with custom enrichments and JSON schemas, and finally into your own data warehouse, where you can analyze the data using whatever tools you prefer. Everything is designed to run on Amazon Web Services, so you don’t need to invest in local server hardware or hosting services.

Snowplow pipeline - from https://goo.gl/X9Jfeo

In essence, it’s a full-service, do-it-yourself analytics solution. Snowplow has deservedly gained a lot of momentum in recent years, as more and more companies have matured to the point where they want full control of their data. And I don’t just mean data ownership, but also things like controlling the aggregation schemas that have proven to be rather rigid in Google Analytics, and being fully in charge of when and how the data is sampled and normalized.

Anyway, at some point I’ll author a proper article about Snowplow - one that it deserves. This time I’m just going to show you how to set up the Google Analytics duplicator / tracker, so that you can start collecting hits in your Snowplow pipeline by simply leveraging the payload generated and collected by Google Analytics.

Tip 70: Duplicate Google Analytics payload to Snowplow

If you read the release announcement, you might have noticed that the release is essentially a Google Analytics plugin, which is easy to add if you’re using the analytics.js tracking snippet.

Unfortunately, with Google Tag Manager there is no reliable way to load a plugin in your Google Analytics tags. That means you’re left with clumsy workarounds, such as

  1. A Custom HTML tag which you use to load analytics.js and create a tracker with the plugin.

  2. Some customTask hack where you load the plugin mid-hit.

The first one is unwieldy because you would then need to have all your tags use the same tracker name if you wanted them all to duplicate the payloads to Snowplow.

The second simply doesn’t work. Even if you do manage to load the plugin in the tracker, Google Analytics would not stop to wait for the plugin to be registered, but would simply send the hit before the plugin has had time to attach and modify the tracker object itself.

So in this tip, we’re just going to skip the plugin altogether, and replicate its functionality using customTask.

To make it all work, create a new Custom JavaScript variable, name it something like {{customTask - Snowplow duplicator}}, and add the following code within:

function() {
  // Add your snowplow collector endpoint here
  var endpoint = 'https://collector.simoahava.com/';
  
  return function(model) {
    // Build the adapter path: <endpoint>/com.google.analytics/v1
    var vendor = 'com.google.analytics';
    var version = 'v1';
    var path = ((endpoint.substr(-1) !== '/') ? endpoint + '/' : endpoint) + vendor + '/' + version;
    
    // Cache the original sendHitTask globally, keyed by tracking ID,
    // so that later runs reuse it instead of wrapping the wrapper
    var globalSendTaskName = '_' + model.get('trackingId') + '_sendHitTask';
    
    var originalSendHitTask = window[globalSendTaskName] = window[globalSendTaskName] || model.get('sendHitTask');
    
    model.set('sendHitTask', function(sendModel) {
      // Send the hit to Google Analytics first, then POST the same payload to Snowplow
      var payload = sendModel.get('hitPayload');
      originalSendHitTask(sendModel);
      var request = new XMLHttpRequest();
      request.open('POST', path, true);
      request.setRequestHeader('Content-type', 'text/plain; charset=UTF-8');
      request.send(payload);
    });
  };
}
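
If you want to see what the customTask above does without firing real hits, here's a minimal sketch you can run outside the browser. It is not part of the Snowplow release or of analytics.js: the helper names (buildCollectorPath, createMockModel, wrapSendHitTask), the mock model and the collector.example.com endpoint are all my own assumptions for illustration. The transport is passed in as a function instead of using XMLHttpRequest, and the window-level caching of the original sendHitTask is left out to keep things short.

```javascript
// Builds the collector path the same way the customTask does:
// <endpoint>/com.google.analytics/v1, with exactly one slash in between.
function buildCollectorPath(endpoint) {
  var vendor = 'com.google.analytics';
  var version = 'v1';
  var base = (endpoint.substr(-1) !== '/') ? endpoint + '/' : endpoint;
  return base + vendor + '/' + version;
}

// Hypothetical stand-in for the analytics.js model object.
// Only get() and set() are mocked, since that's all the customTask uses.
function createMockModel(fields) {
  return {
    get: function(key) { return fields[key]; },
    set: function(key, value) { fields[key] = value; }
  };
}

// Same wrapping idea as in the customTask: call the original sendHitTask,
// then hand the identical payload to a second transport (here: post).
function wrapSendHitTask(model, path, post) {
  var original = model.get('sendHitTask');
  model.set('sendHitTask', function(sendModel) {
    var payload = sendModel.get('hitPayload');
    original(sendModel);
    post(path, payload);
  });
}

// Record every "request" instead of hitting the network.
var calls = [];
var model = createMockModel({
  trackingId: 'UA-12345-1', // made-up tracking ID
  hitPayload: 'v=1&t=pageview&tid=UA-12345-1',
  sendHitTask: function(m) { calls.push(['ga', m.get('hitPayload')]); }
});

wrapSendHitTask(
  model,
  buildCollectorPath('https://collector.example.com'),
  function(url, body) { calls.push(['snowplow', url, body]); }
);

// Simulate Google Analytics dispatching the hit.
model.get('sendHitTask')(model);
console.log(calls);
```

The point of the exercise is the order of events: Google Analytics gets its hit first, and Snowplow receives the exact same payload string right after, at the com.google.analytics/v1 path of your collector.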

Then you need to edit every single Google Analytics tag whose data you also want to send to Snowplow.

At this point, if you haven’t done so yet, it’s a good idea to make use of the Google Analytics Settings variable. Instead of having to modify every single tag, you only need to make the necessary change (see below) in the GAS variable, after which you can add that GAS variable to all your Google Analytics tags. Useful!

Anyway, the change you need to make is under More Settings / Fields to set of your Google Analytics tags or the Google Analytics Settings variable. If you’re editing tags directly, you’ll need to check the “Enable overriding settings in this tag” option to see the More Settings fields. Here’s the field you need to add.

Field name: customTask
Value: {{customTask - Snowplow duplicator}}

Remember - the change needs to be done in all the Google Analytics tags whose data you want to fork to Snowplow.

Note! At the time of writing, only the Clojure Collector in Snowplow supports the Google Analytics adapter. Hopefully they’ll release support for the Scala Stream Collector soon, as it will give you access to that sweet, juicy Google Analytics real-time data! Make sure you follow the Snowplow discussion forum - it’s as good a place as any to get information on the roadmap.

This is a pretty sweet addition to Snowplow, because it lets you operate with parameters and values that are familiar to you, if you’ve used Google Analytics before. It also lets you leverage existing Google Analytics tracking, so you don’t need to rewrite the tracking setup on your site just to migrate to Snowplow.