I recently saw a thread on Twitter about tracking viewers. With the current debate on internet trackers and privacy, this obviously has new and exciting connotations! (no it doesn’t! -ED)
When we watch TV, our TVs watch us back and track our habits. This practice has exploded recently since it hasn’t faced much public scrutiny. But in the last few days, not one but *three* papers have dropped that uncover the extent of tracking on TVs. Let me tell you about them.
— Arvind Narayanan (@random_walker) September 27, 2019
So why do we track TV viewers? Back in the day – a while ago (~10 years now!) – I was working for a Cable company in Europe, and we were working on the ongoing question of how to determine viewership.
In general, the method for determining how many people, and what segment of the audience (we weren’t at personalisation yet), were watching a given programme was sampling: a panel of people pressed a button as they watched TV. We’d have one of these people per some number of the audience, which could be very large, say 1 per 10,000 (UK pop size is 60m+, so ~6000 people in your sample). This gives you a fairly large margin of error, and makes the make-up of the sample group critical to the
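To put rough numbers on that margin of error, here is a minimal sketch. It assumes a simple random sample and the usual normal approximation at 95% confidence; real audience panels are weighted and stratified, so treat this as an illustration and not as how any measurement body actually calculates its figures. The function name and the example audience shares are my own.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for an estimated audience share.

    Assumes a simple random sample and the normal approximation.
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)

population = 60_000_000                  # UK population figure used in the text
panel_ratio = 10_000                     # one panel member per 10,000 people
panel_size = population // panel_ratio   # 6000 people in the sample

# Hypothetical audience shares: a huge hit, a solid show, a niche programme.
for share in (0.5, 0.05, 0.01):
    moe = margin_of_error(share, panel_size)
    print(
        f"share={share:.0%}  "
        f"viewers~{share * population:,.0f} +/- {moe * population:,.0f}  "
        f"(relative error {moe / share:.0%})"
    )
```

The absolute error shrinks as the share gets smaller, but the relative error grows: for a programme watched by 1% of the population, the uncertainty is roughly a quarter of the estimate itself. Slice the panel further into audience segments and it gets worse again.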
So here was the problem: we couldn’t determine with any real accuracy who was watching what, and when. Unlike now, when people work on pay per click, which is a more direct relationship, viewing figures informed programme development. If you had a programme out, you’d want to know who was watching it and how popular it was. If you had a channel, you wanted to know what type of programmes people were watching. You also wanted to know the times people watched and, if you could get it, who watched what and when.
At a technology level, TV was broadcast, so one-way: you couldn’t send any data back even if you wanted to, unless you had a modem or other “return path”. Prior to so-called “Smart TVs”, your best bet for a TV + data combo was a set top box (STB) from a provider. So the potential for getting the box to tell you what was happening was limited at best.
Coming back to Cable – this had potential as the only truly connected network, although again, initially the cable stream was generally one-way, and the “return path” was low-bandwidth, with the STB having limited processing power.
At this point in time we were also talking about the idea of “smart agents”: profiles attached to people that would “represent them”, finding things that the people would like based on their profile. We’ve retreated from this idea since, in part because those “agents” were originally supposed to act on behalf of users of systems with a commercial or consumer aspect (think Amazon, BBC, Netflix etc), and in part because of the abuse of this data and the subsequent privacy issues. The idea that a technological hegemon would collect this data wasn’t there at the time.
So, we wanted to track users for a number of reasons:
- What did they watch – editorial/commissioning
- When did they watch – audience segmentation
- Where did they watch – localisation/audience segmentation
And yes – advertising. Our revenue was based on people subscribing to our network, so our advertising was split between promoting our own content (keeping you watching) and plain old advertising (the rest). However, there are a number of actors here: the network, the channels, and the programmes. The network has the ability to see into the whole infrastructure, the channels can only see the content they own, and programmes see nothing.
Did we track? Not so much. We did use the SNMP stack for troubleshooting, but the idea of tracking viewers was more trouble than it was worth from a technical and regulatory standpoint. Because we weren’t necessarily beneficiaries of the advertising revenue for non-network assets, there wasn’t really a good commercial case either. This didn’t stop us from trying to figure out who was watching and increase “stickiness”, something that you’d see on Amazon in its “recommendations” in future years (I’m not totally convinced this is really working on their Prime offering).
Fast forward to a fateful day when we decided to create a new TV Guide based on the idea of search. At this point we had so many channels and thus programmes, not to mention a large VoD library (what we had prior to Netflix/Amazon Prime/Hulu/…). Our issue was the “return path”: our ability to use the data path to make requests was very limited and sometimes unreliable. Our plan was to replace the broadcast guide with a request-based guide, but this carried a not-inconsiderable risk that we would swamp the network and end up with non-responsive STBs. The risk mitigation: measure. Instrument the STBs and find out when and where they were used.
We duly created a version of the TV Guide that would record these items and send them back. Our first requirement was that users could not be identified directly (in fact this was part of existing Dutch law), so the IP addresses were not disclosed. Since we were not connected to billing, we could only look at the device, and we couldn’t keep the data for more than 7 days (given the volume, that was sensible in itself). Even with these restrictions we could see the viewing patterns: two peaks, one in the morning and one in the evening, with the evening peak larger than the morning one.
We could also look at the request times, and look at response, and also see the network performance – we could even see if some of the network equipment was “flapping”, and use it for diagnostics.
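As a rough illustration of that kind of instrumentation, here is a minimal sketch. It is not the system we built: the record layout, the function names, the salted-hash device token and the sample data are all assumptions made for the example. Only the constraints come from the text above: no IP addresses, device-level data only, and nothing kept longer than 7 days.

```python
import hashlib
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)   # retention limit mentioned in the text

@dataclass(frozen=True)
class GuideEvent:
    device: str         # pseudonymous device token, never an IP address
    at: datetime        # when the guide request was made
    latency_ms: float   # how long the response took

def pseudonymise(device_id: str, salt: str) -> str:
    """One-way token for a device (assumed scheme; the salt is a made-up secret)."""
    return hashlib.sha256((salt + device_id).encode()).hexdigest()[:16]

def prune(events, now):
    """Keep only events inside the retention window."""
    return [e for e in events if now - e.at <= RETENTION]

def requests_per_hour(events):
    """Count guide requests by hour of day (0-23) to expose viewing peaks."""
    return Counter(e.at.hour for e in events)

def mean_latency_per_hour(events):
    """Average response time by hour of day, a crude network health signal."""
    totals, counts = defaultdict(float), Counter()
    for e in events:
        totals[e.at.hour] += e.latency_ms
        counts[e.at.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in counts}

# Made-up sample data, purely for illustration.
salt = "example-salt"
now = datetime(2010, 3, 8, 23, 0)
events = [
    GuideEvent(pseudonymise("stb-001", salt), datetime(2010, 3, 8, 7, 30), 120.0),
    GuideEvent(pseudonymise("stb-002", salt), datetime(2010, 3, 8, 20, 15), 340.0),
    GuideEvent(pseudonymise("stb-001", salt), datetime(2010, 3, 8, 20, 45), 410.0),
    GuideEvent(pseudonymise("stb-003", salt), datetime(2010, 2, 20, 20, 0), 95.0),
]

kept = prune(events, now)   # the February event falls outside the 7-day window
print(requests_per_hour(kept))
print(mean_latency_per_hour(kept))
```

Note that a salted hash is pseudonymisation, not anonymisation: the same box always maps to the same token, which is exactly what makes per-device patterns visible. That is the privacy trade-off in miniature.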
So tracking can be useful; the issue is separating the useful from the commercial, and from the malicious. Overlaying all of that is privacy. One big change that has enabled the significant increase in trackers is bandwidth: using a conventional broadband connection means you aren’t limited in requests. Without tracking, there is no way to gauge audience reaction, which cuts out any editorial/commissioning feedback unless you make everything pay per view, so from a network/channel perspective, it makes sense. Where it stops making sense is using someone like Facebook as a broker for this information. They aren’t actually part of the content landscape, but since Facebook and Google dominate the ad landscape, that’s what people are gravitating towards, even if it doesn’t make sense.
In summary, we’ve always been trying to see who is watching; it’s just that now we’ve joined that to the unholy mess that is internet advertising and tracking. In the process we’ve lost track of who is getting what data and for what purpose, and we’ve also lost oversight, since the broadcaster <-> state regulator relationship has been broken. We’ve also managed to forget why we want the data. Knowing who is watching isn’t a bad thing; capturing that data for use outside the network or channel is. As usual, it’s about who has the data, who owns it, and who controls it.
- Watching You Watch: The Tracking Ecosystem of Over-the-Top TV Streaming Devices
- Information Exposure From Consumer IoT Devices: A Multidimensional, Network-Informed Measurement Approach
- IoT Inspector: Crowdsourcing Labeled Network Traffic from Smart Home Devices at Scale