Digisemestr #7 (2019-04-06) – Web Analytics & Data in Digital Marketing

Hola! Hello! Guten Tag! This is my 7th post about Digisemestr and there are six more to go. See my first post to find out what it is all about. Please let me know if there are any mistakes. Don’t let me broadcast nonsense!

This lesson is focused on Google Analytics and attribution modeling.

Jan – Web Analytics

Marketing without data? Impossible. Deciding only based on data? Stupid. Jan spoke about web analytics in general and then about Google Analytics.

There are many different fields and tools. Traffic analytics only measures what is happening on your website, like the amount of traffic and where it is coming from. It works great with aggregated data, with individual sessions not so much (Google Analytics and Piwik belong to this category). Session tracking tools like HotJar and Crazy Egg let you record individual sessions and solve specific issues. Ad systems provide data about impressions, clicks and audiences. Social analytics go deeper, with info about their users. There are tools for A/B testing, personalization and recommendations too.

Customer analytics work with the information we already have about our customers. By leveraging existing data about them (e.g. where they came from), we are able to predict, for example, what they are going to buy. RFM analysis segments customers based on recency, frequency and monetary value. A customer who buys from us at least once a week hasn’t ordered anything for a month? Let’s investigate!
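
To make the RFM idea a bit more concrete, here is a minimal Python sketch. The order log, the customer names and the thresholds (`max_days_silent`, `min_orders`) are made up for illustration, and it works with raw recency/frequency/monetary values rather than the 1 to 5 scores RFM tools usually assign.

```python
from datetime import date

# Hypothetical order log: (customer_id, order_date, order_value)
orders = [
    ("alice", date(2019, 3, 1), 40.0),
    ("alice", date(2019, 3, 8), 35.0),
    ("alice", date(2019, 3, 15), 50.0),
    ("bob", date(2019, 1, 10), 300.0),
    ("carol", date(2019, 4, 1), 20.0),
]

def rfm(orders, today):
    """Return {customer: (recency_in_days, frequency, monetary_value)}."""
    stats = {}
    for customer, day, value in orders:
        last, count, total = stats.get(customer, (None, 0, 0.0))
        if last is None or day > last:
            last = day
        stats[customer] = (last, count + 1, total + value)
    return {c: ((today - last).days, count, total)
            for c, (last, count, total) in stats.items()}

def needs_attention(rfm_table, max_days_silent=21, min_orders=3):
    """Frequent buyers (min_orders or more) who have gone quiet for too long."""
    return sorted(c for c, (recency, frequency, _) in rfm_table.items()
                  if frequency >= min_orders and recency > max_days_silent)

table = rfm(orders, today=date(2019, 4, 6))
print(needs_attention(table))  # ['alice']
```

Alice used to buy every week but her last order was 22 days ago, so she is the one to investigate.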

Actionable insights – data are great, but what action can you take based on a bunch of values in rows and columns? Actionable insights are what turns the data into a decision: they tell us what action to take.

Jan wants us to pay attention to the right metrics. No vanity metrics. Is there a large volume of traffic from PPC? Great, let’s buy more, bid more! But wait! How much revenue does PPC actually generate? Oh, no. Next to nothing. I guess PPC is not as great as I thought. Hmm... what about margin? And the total costs? Never analyze in isolation; always think about the bigger picture. Use common sense.
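
The PPC example boils down to simple arithmetic. Here is a hypothetical Python sketch with invented numbers and an assumed flat 30% margin: the channel with the most sessions can still lose money once margin and costs are taken into account.

```python
# Hypothetical monthly numbers per channel (invented for illustration).
channels = {
    "ppc":   {"sessions": 12000, "revenue": 1500.0, "cost": 2000.0},
    "email": {"sessions": 1500,  "revenue": 4000.0, "cost": 300.0},
}
MARGIN = 0.30  # assumed flat gross margin on revenue

def profit(channel, margin=MARGIN):
    """Gross profit the channel brought in, minus what the channel cost."""
    return channel["revenue"] * margin - channel["cost"]

for name, ch in channels.items():
    print(f"{name}: {ch['sessions']} sessions, profit {profit(ch):.2f}")
```

In a real analysis the margin would differ per product and the costs would include more than ad spend, but the point stands: sessions alone say nothing about profit.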

Google Analytics

It’s free up to 10 million hits per month (a hit is any user interaction sent to GA, such as a pageview or an event). The premium version, Google Analytics 360, is pretty expensive; however, it offers extra features like business intelligence and shows data without sampling. GA is the de facto standard. Learn it once and use it everywhere!

There is confusion surrounding metrics. Different tools show different numbers, and people have different ideas about what individual metrics should mean. Never compare the “same” metrics from different tools – apples and oranges. Let’s look at a couple of metrics now.

Users

As much as we would like them to be, these aren’t living and breathing people. One user is 1 browser on 1 device (1 cookie). If someone uses 3 browsers, they count as 3 users. Do they delete cookies? New user, and the old user seemingly stops visiting you. Opening links directly through the “embedded” browser within Facebook? You guessed it, it’s a new user! Anonymous window? New user. By now you can probably tell that this number is overinflated.
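
A tiny Python illustration of why the number is inflated: GA counts distinct client IDs (stored in a cookie), not people. The hit log is hypothetical, and the person column is something GA never actually sees.

```python
# Hypothetical hits as (client_id, person). GA only sees the client ID
# stored in the cookie; the person column is what we wish it could see.
hits = [
    ("chrome-laptop", "Jana"),
    ("firefox-laptop", "Jana"),
    ("facebook-in-app-browser", "Jana"),
    ("chrome-laptop", "Jana"),   # same cookie again: same user
    ("safari-phone", "Petr"),
]

ga_users = len({client_id for client_id, _ in hits})
real_people = len({person for _, person in hits})
print(ga_users, real_people)  # 4 2
```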

Sessions

Intuitively, we understand this as the number of times our website was visited. What if someone opens a browser, visits our website, views a bunch of pages and then leaves for lunch? They come back 40 minutes later and continue browsing for an hour. Was that 1 session? No. Sessions are time-based. The default timeout is 30 minutes of inactivity, and only pageviews and events count as activity. Scrolling doesn’t. Furthermore, if users come from a different source (e.g. through a search on a different search engine), it counts as a new session. Same source, different campaign? New session. The amount of time spent on the last page of the session (the exit page) is unknown. Why? We can only measure the time between pages, because the tracking code only runs when a page is loaded. As a result, GA shows a shorter average session duration than the real one.
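
The two rules above (30 minutes of inactivity, a change of source or campaign) can be sketched in Python. This is a simplified model, not GA’s actual implementation: `Hit`, `sessionize`, `measured_duration` and the source strings are hypothetical, only pageviews are modelled, and other GA rules (such as sessions ending at midnight) are left out. The example replays the lunch-break story.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

TIMEOUT = timedelta(minutes=30)  # GA's default inactivity timeout

@dataclass
class Hit:
    time: datetime
    page: str
    # Source/medium/campaign, set only when the visitor arrives from outside
    # the site (e.g. "google / organic"); None for clicks within the site.
    source: Optional[str] = None

def sessionize(hits: List[Hit]) -> List[List[Hit]]:
    """Split one user's hits into sessions using simplified GA-like rules."""
    sessions: List[List[Hit]] = []
    current_source: Optional[str] = None
    for hit in sorted(hits, key=lambda h: h.time):
        timed_out = bool(sessions) and hit.time - sessions[-1][-1].time > TIMEOUT
        new_source = hit.source is not None and hit.source != current_source
        if not sessions or timed_out or new_source:
            sessions.append([hit])
            if hit.source is not None:
                current_source = hit.source  # otherwise keep the previous source
        else:
            sessions[-1].append(hit)
    return sessions

def measured_duration(session: List[Hit]) -> timedelta:
    """First to last hit only: time spent on the exit page is never measured."""
    return session[-1].time - session[0].time

def t(hour: int, minute: int) -> datetime:
    return datetime(2019, 4, 6, hour, minute)

hits = [
    Hit(t(12, 0), "/", "google / organic"),
    Hit(t(12, 5), "/shoes"),
    Hit(t(12, 10), "/shoes/red"),            # ...then lunch
    Hit(t(12, 50), "/shoes/blue"),           # 40 minutes later: new session
    Hit(t(13, 10), "/cart"),
    Hit(t(13, 30), "/checkout"),
    Hit(t(13, 50), "/thank-you"),
    Hit(t(14, 0), "/", "seznam / organic"),  # different source: new session
]
sessions = sessionize(hits)
print([len(s) for s in sessions])                                     # [3, 4, 1]
print([measured_duration(s) // timedelta(minutes=1) for s in sessions])  # [10, 60, 0]
```

Note how the first session is measured as 10 minutes even though the visitor stayed on /shoes/red until lunch, and the final one-page session from Seznam is measured as 0 minutes.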

Bounce rate

A bounce is a session with 1 pageview. Bounce rate is the percentage of sessions that bounced. When users reload a page, the session no longer counts as a bounce, because the reload is a second pageview. Long single-page websites need to use Google Tag Manager or Scroll Depth to change the tracked URL as users scroll; otherwise every visit is reported as a bounce. A bounce’s duration is measured as 0 s: no other page is loaded, so the code runs only once and there is no second page to measure the time against. These zero-second bounces are included in the average session duration. You can segment sessions to exclude bounces. The average bounce rate of the entire website isn’t useful; looking at it for individual categories or parts of the website is. A high bounce rate isn’t necessarily a bad thing (80% is sometimes still okay).
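
A minimal Python sketch of the bounce rate calculation, including the per-landing-page view that is actually useful. The session data are hypothetical, and events are ignored here (in GA, an interaction event in a one-page session also keeps it from being a bounce).

```python
# Each session is the list of pages viewed in it (hypothetical data).
sessions = [
    ["/blog/post-1"],                  # bounce
    ["/blog/post-1", "/blog/post-1"],  # reload = 2nd pageview, not a bounce
    ["/", "/products", "/cart"],
    ["/contact"],                      # bounce
]

def bounce_rate(sessions):
    """Share of sessions with exactly one pageview."""
    bounces = sum(1 for pages in sessions if len(pages) == 1)
    return bounces / len(sessions)

def bounce_rate_by_landing_page(sessions):
    """Bounce rate per landing page (the first page of each session)."""
    totals, bounces = {}, {}
    for pages in sessions:
        landing = pages[0]
        totals[landing] = totals.get(landing, 0) + 1
        bounces[landing] = bounces.get(landing, 0) + (len(pages) == 1)
    return {page: bounces[page] / totals[page] for page in totals}

print(bounce_rate(sessions))                  # 0.5
print(bounce_rate_by_landing_page(sessions))  # {'/blog/post-1': 0.5, '/': 0.0, '/contact': 1.0}
```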

Don’t look at HITS metrics! Why? ‘Cause that’s How Idiots Track Success.

What’s important is not to change the methodology. Don’t rely on specific, absolute numbers in GA. Instead, compare measurements relative to each other and watch how they change over time.

There are a lot of different reports available in GA. Don’t be sad about not using them all. Each is suitable for a different type of website.

Real-Time

The Real-Time section of GA doesn’t show how many people are browsing your website right now, only how many were active in the last 5 minutes. It’s easier to gain actionable insights from long-term data, so real-time doesn’t help much. It’s useful for a narrow category of problems. For example: Has the tracking code been deployed correctly? My commercial is on TV right now; does it have an immediate effect?

Audience

Cohort analysis (now in beta) doesn’t pair users with all of their devices. Demographics, geo and interests data are available. Interests are especially useful in Google Ads. Note that 1 person can belong to multiple interest categories. Furthermore, not every user can be categorized (e.g. someone browsing in an anonymous window with no cookies).

Other sections in GA follow ABC (Acquisition, Behavior, Conversions).

Behavior

The behavior flow reports are a pretty weak part of GA. Site Content, on the other hand, offers metrics for every page. Check the bounce rate of landing pages: are people leaving you immediately? The landing page is the 1st page a user visits on your website (e.g. after clicking a link in search engine results). Exit pages show the last pages people visited on your website before running away. Exit rate is the percentage of a page’s views that were the last in a session, i.e. how often people stop browsing at that specific page. Sometimes a high exit rate is fine (e.g. a “Thank you for your order” page), sometimes it’s not (e.g. an index page with links to other parts of the website).

Site speed is very important. Google lowers the Quality Score (QS) in PPC and the organic ranking of slow websites. Do you have a search on your website? Site Search reports might be useful. With them you can discover products people want, or find out that they look for a product you already offer under a different name. Events track user actions other than loading a page, such as scrolling or clicking.
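
Exit rate is easy to mix up with bounce rate, so here is a small Python sketch computing it as exits divided by pageviews of each page. The sessions are invented.

```python
from collections import Counter

# Hypothetical sessions: the pages viewed, in order.
sessions = [
    ["/", "/products", "/thank-you"],
    ["/", "/products"],
    ["/products", "/"],
    ["/"],
]

def exit_rates(sessions):
    """For every page: how many of its views were the last in a session."""
    pageviews = Counter(page for pages in sessions for page in pages)
    exits = Counter(pages[-1] for pages in sessions)
    return {page: exits[page] / views for page, views in pageviews.items()}

print({page: round(rate, 2) for page, rate in exit_rates(sessions).items()})
# {'/': 0.5, '/products': 0.33, '/thank-you': 1.0}
```

The thank-you page has a 100% exit rate, which is exactly what we expect from it.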

Acquisition

Acquisition is all about Source/Medium. Where are people coming from? From which source? Was it search? Was it a paid or an organic medium?

Source: Search engines, social networks, (direct), etc.

Medium: Organic, CPC, remarketing, email campaign…

Tagging is a must. Tagging? It’s adding “utm_source” and similar parameters to the URL. Turn on auto-tagging in Google Ads, and tag your links everywhere else too (automatically where the ad system supports it, manually otherwise).
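
Manual tagging can be scripted. A Python sketch using only the standard library’s `urllib.parse`; `tag_url` is a hypothetical helper. `utm_source`, `utm_medium` and `utm_campaign` are the standard UTM parameters (`utm_term` and `utm_content` are the optional ones); the example values are made up. Google Ads auto-tagging works differently (it adds its own `gclid` click ID), so this is for the places where you tag by hand.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def tag_url(url, source, medium, campaign, **optional):
    """Return url with UTM parameters, replacing any utm_* already present.

    optional: extra UTM fields without the prefix, e.g. content="footer".
    """
    parts = urlsplit(url)
    # Keep the existing non-UTM query parameters in their original order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.startswith("utm_")]
    query += [("utm_source", source), ("utm_medium", medium),
              ("utm_campaign", campaign)]
    query += [(f"utm_{key}", value) for key, value in optional.items()]
    return urlunsplit(parts._replace(query=urlencode(query)))

print(tag_url("https://example.com/shoes?color=red",
              "newsletter", "email", "spring_sale", content="footer"))
```

This prints `https://example.com/shoes?color=red&utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale&utm_content=footer`.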

What about multichannel? What if people visit you a bunch of times from different sources? Which source is responsible for their purchase? This is attribution.

An example conversion path: Paid Search > Organic Search > Referral > Email > Social Network > Direct

Attribution models decide which source should get the credit. GA uses Last Non-Direct Click attribution in its standard reports.

Last Non-Direct Click attribution model

Sounds complicated, doesn’t it? What about simple Last Click? The last touchpoint receives all the credit for the sale, and that includes direct. However, with direct, we either don’t know where people came from or they typed the URL directly into their browser. How could direct convince people to visit us? It can’t. So seeing direct doesn’t help us much with evaluating campaigns. Which marketing channels should I invest in? Unclear. Instead, we use Last Non-Direct Click. The sale is attributed to the last touchpoint, but if the last touchpoint is direct, the credit goes to the last touchpoint before it that isn’t direct. This attribution applies not just to sales but also to sessions: when a returning visitor comes in directly, GA attributes that session to their last known non-direct source.
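
The difference between the two models fits in a few lines of Python. A sketch, assuming a conversion path is a list of channel names with `"(direct)"` marking direct visits; the path is the example one above.

```python
DIRECT = "(direct)"

def last_click(path):
    """The last touchpoint gets all the credit, direct included."""
    return path[-1]

def last_non_direct_click(path):
    """The last non-direct touchpoint gets the credit; all-direct paths stay direct."""
    for touchpoint in reversed(path):
        if touchpoint != DIRECT:
            return touchpoint
    return DIRECT

path = ["Paid Search", "Organic Search", "Referral", "Email", "Social Network", DIRECT]
print(last_click(path))             # (direct)
print(last_non_direct_click(path))  # Social Network
```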

Acquisition sources stand at the beginning of the path. They are for attracting customers. Retention sources such as remarketing and emails are for keeping existing customers.

Milan & Ivana – Attribution modeling

Milan and Ivana took turns talking about conversion paths in GA and the depths of attribution modeling.

Each ad type/format is suitable for a different part of the funnel. A conversion path is created only after the conversion occurs. The conversion window tells us how many conversions (e.g. orders) there were in a specific time frame. The lookback window is the maximum length of a conversion path: touchpoints that happened longer before the conversion are ignored (in GA it is 30 days by default and can be set up to 90). It should be set according to the expected time it takes customers to make a decision. By default, GA doesn’t show paths that aren’t conversion paths yet (but could eventually become ones). To see them, implement the Client ID as a custom dimension via Google Tag Manager.
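
A Python sketch of what a lookback window does to a path: touchpoints older than the window are simply dropped before any attribution happens. The path, the dates and the `within_lookback` helper are hypothetical; the 30-day default mirrors GA’s.

```python
from datetime import date, timedelta

def within_lookback(path, conversion_day, lookback_days=30):
    """Keep touchpoints at most lookback_days before the conversion.

    path: list of (day, channel) tuples, oldest first.
    """
    start = conversion_day - timedelta(days=lookback_days)
    return [(day, channel) for day, channel in path
            if start <= day <= conversion_day]

path = [(date(2019, 2, 1), "Display"),
        (date(2019, 3, 20), "Organic Search"),
        (date(2019, 4, 5), "Email")]
converted_on = date(2019, 4, 6)
print([ch for _, ch in within_lookback(path, converted_on)])      # ['Organic Search', 'Email']
print([ch for _, ch in within_lookback(path, converted_on, 90)])  # ['Display', 'Organic Search', 'Email']
```

With a window that is too short, the display touchpoint that may have started the journey never gets any credit.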

Attribution models

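  • First touch – suitable for acquisition and brand awareness; you can convert them later
  • Linear – every touchpoint gets an equal share of the credit
  • Time Decay – touchpoints closer to the conversion get more credit
  • Last Non-Direct Touch – the last touchpoint that isn’t direct gets all the credit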
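
The four models can be compared on one path with a short Python sketch. Each function returns every channel’s share of one conversion. Assumptions: a path is a list of channel names, and for Time Decay the position in the path stands in for time, with a hypothetical `half_life` of one step (GA measures the decay in days instead).

```python
from collections import defaultdict

DIRECT = "(direct)"

def first_touch(path):
    return {path[0]: 1.0}

def linear(path):
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += 1 / len(path)
    return dict(credit)

def time_decay(path, half_life=1.0):
    # The weight halves with every `half_life` steps away from the conversion.
    weights = defaultdict(float)
    for steps_before, channel in enumerate(reversed(path)):
        weights[channel] += 0.5 ** (steps_before / half_life)
    total = sum(weights.values())
    return {channel: w / total for channel, w in weights.items()}

def last_non_direct(path):
    non_direct = [channel for channel in path if channel != DIRECT]
    return {(non_direct[-1] if non_direct else DIRECT): 1.0}

path = ["Display", "Organic Search", "Email", DIRECT]
for model in (first_touch, linear, time_decay, last_non_direct):
    shares = {ch: round(share, 2) for ch, share in model(path).items()}
    print(model.__name__, shares)
```

Running all models over the same paths is essentially what the Model Comparison Tool does for you.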

In GA > Conversions > Attribution there is a Model Comparison Tool for comparing attribution models.

Data-driven attribution

It looks for correlations between touchpoints and conversions and assigns credit based on the data instead of a fixed rule.

Shapley value

This is about removing touchpoints and observing the removal effect on conversions. How much does each source contribute? There is a nice analogy with card game players.

Source: https://clearcode.cc/blog/game-theory-attribution/
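
The Shapley value itself can be computed exactly when there are only a few channels. A Python sketch: `value(S)` is the number of conversions credited to a coalition of channels S, and the numbers in the table are invented (how v(S) is estimated from real paths differs between implementations). Each channel gets its marginal contribution averaged over all coalitions it could join, with the standard Shapley weights. The exact computation grows exponentially with the number of channels, so real tools approximate it.

```python
from itertools import combinations
from math import factorial

def shapley_values(channels, value):
    """Exact Shapley value of every channel.

    value: maps a frozenset of channels (a coalition) to its payoff v(S),
    here the conversions attributed to that set of channels.
    """
    n = len(channels)
    result = {}
    for channel in channels:
        others = [c for c in channels if c != channel]
        total = 0.0
        for size in range(n):
            # Weight of joining a coalition of this size in a random order.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {channel}) - value(coalition))
        result[channel] = total
    return result

# Invented v(S) values: conversions observed for each set of channels.
v = {
    frozenset(): 0,
    frozenset({"Search"}): 10,
    frozenset({"Social"}): 4,
    frozenset({"Email"}): 2,
    frozenset({"Search", "Social"}): 18,
    frozenset({"Search", "Email"}): 14,
    frozenset({"Social", "Email"}): 8,
    frozenset({"Search", "Social", "Email"}): 24,
}
credit = shapley_values(["Search", "Social", "Email"], v.__getitem__)
print({ch: round(c, 2) for ch, c in credit.items()})  # {'Search': 13.0, 'Social': 7.0, 'Email': 4.0}
```

The credits add up to v of all three channels together (24), one of the properties that makes Shapley values attractive for attribution.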

How does this apply to sources, though? I like the official example from here:

Markov chains

Each touchpoint is a state in a Markov chain. Here is a great visual explanation of Markov chains. The probabilities of transitions between states need to be calculated. Higher-order chains, with a longer “history”, might be more precise. The removal effect is utilized here too.
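
A first-order version can be sketched in plain Python. Assumptions: each path is a list of channels plus whether it converted, with artificial `(start)`, `(conversion)` and `(null)` states added; the removal effect of a channel is the relative drop in conversion probability when that channel is removed (its traffic goes to `(null)`), and conversions are then split in proportion to the removal effects. The function names and example paths are hypothetical, and higher-order chains are not covered.

```python
from collections import defaultdict

START, CONVERSION, NULL = "(start)", "(conversion)", "(null)"

def transition_probs(paths):
    """First-order transition probabilities estimated from observed paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = [START, *channels, CONVERSION if converted else NULL]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
            for state, nexts in counts.items()}

def conversion_prob(probs, removed=None, iterations=1000):
    """P(reaching CONVERSION from START); a removed channel acts like NULL."""
    p = defaultdict(float, {CONVERSION: 1.0})
    for _ in range(iterations):  # iterate until the probabilities settle
        for state, nexts in probs.items():
            if state != removed:
                p[state] = sum(q * (0.0 if nxt == removed else p[nxt])
                               for nxt, q in nexts.items())
    return p[START]

def markov_attribution(paths):
    """Split observed conversions across channels by their removal effect."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = {ch for chans, _ in paths for ch in chans}
    effects = {ch: 1 - conversion_prob(probs, removed=ch) / base for ch in channels}
    conversions = sum(1 for _, converted in paths if converted)
    total_effect = sum(effects.values())
    return {ch: conversions * e / total_effect for ch, e in effects.items()}

paths = [
    (["Search", "Email"], True),
    (["Search"], False),
    (["Social", "Search"], True),
    (["Social"], False),
]
credit = markov_attribution(paths)
print({ch: round(credit[ch], 2) for ch in sorted(credit)})
# {'Email': 0.55, 'Search': 1.09, 'Social': 0.36}
```

Search gets the most credit because removing it breaks every converting path.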

Choosing the most suitable attribution model

First touch is great for startups as they are growing and getting their first users. If you don’t know which model to choose, go with the default option, Last Non-Direct Click.

Observe the results of different models over time. If they diverge, investigate and experiment (e.g. change bidding) and compare the results.

And that’s all folks, see you next time.

Are you a reader who is difficult to satisfy?

Want some more Digisemestr?

Here you go:

https://romanluks.eu/blog/digisemestr-3-2019-03-09/ – PPC

https://romanluks.eu/blog/digisemestr-2-2019-03-02/ – SEO & Link Building

https://romanluks.eu/blog/digisemestr-1-2019-02-23/ – Introduction