Podcast
Root Causes 227: Let's Talk About Cookies


Hosted by
Tim Callan
Chief Compliance Officer
Jason Soroko
Fellow
Original broadcast date
May 27, 2022
In this episode we explain the fundamentals of cookies and why, despite their obvious benefits, they present troublesome privacy concerns. We discuss the many ways web users can be tracked including cross-site cookies, tracking pixels, and browser fingerprinting.
Podcast Transcript
Lightly edited for flow and brevity.
Cookies went into Netscape in 1994. So, why are they called cookies? Think about a fortune cookie. That’s actually where the idea comes from: something with a message, or a piece of data, embedded in it. Just like a fortune cookie.
That’s apparently where the term actually came from. The idea of a cookie, or a magic cookie, which is what it was called, is a packet of data that is created, sent, and received without being changed. It was actually used in Unix for other purposes even before the idea was brought to Netscape and became the cookie that we know and love, and sometimes do not love, today. That’s kind of the history, and I think it’s important for people to understand.
There are a lot of types of cookies, but they are all formed very, very basically. A cookie contains a name, the cookie’s name, and then there’s a value, and all cookies are based on these pairs: name, value; name, value; name, value. Of course, you can have a cookie name that is just some block of text, and then the value could be the domain that you’re at, or a piece of hashed data that lets the site look up, from a database entry it has just created, that something is in your shopping cart.
Let me just spell it out. Cookies were put into browsers because the World Wide Web is essentially stateless. Every time your browser, as a client, makes a request to a web server, that request is essentially stateless if you go down far enough into the technology. Cookies supply the state you need for something like a shopping cart, where you’re browsing away and the site remembers that you’ve chosen certain things, or for, and this is where it gets close to our hearts, Tim, authentication.
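To make the name/value idea concrete, here is a minimal sketch in Python using the standard library's `http.cookies` module. The cookie name `session_id`, the `CARTS` store, and both helper functions are hypothetical names chosen for illustration; a real site would generate random session IDs and keep carts in a database.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side store: session ID -> shopping cart contents.
CARTS = {"a1b2c3": ["coffee beans", "mug"]}


def build_set_cookie_header(session_id: str) -> str:
    """Return a Set-Cookie header value carrying one name/value pair."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id          # name = "session_id", value = the ID
    cookie["session_id"]["path"] = "/"         # send it back for every path on the site
    cookie["session_id"]["httponly"] = True    # hide it from page scripts
    return cookie["session_id"].OutputString()


def cart_for_request(cookie_header: str) -> list[str]:
    """Parse the Cookie header the browser sends back and look up the cart."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return []  # no cookie, so the server has no state for this request
    return CARTS.get(morsel.value, [])


if __name__ == "__main__":
    print(build_set_cookie_header("a1b2c3"))
    print(cart_for_request("session_id=a1b2c3; theme=dark"))
```

The server never stores the cart in the cookie itself. The cookie only carries an identifier, and that identifier is what turns a series of stateless requests into something that behaves like a session.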
So, there are many other types. That’s a non-exhaustive list, but I think those are the ones interesting to us. I think what’s important now, Tim, is to just recognize that it’s not the normal session cookies giving cookies a bad name. It’s the persistent and third-party cookies that are the problem.
I’m going to get into two other topics that are related to cookies in a moment, but first we’re going to talk about what the browsers have done. If the European laws change so that sites have to make it easy to reject cookies, a lot of people are just going to reject all of them, and if enough people do that, the business models built on cookies might have to change. So what is the industry reaction?
Tons of business models are based on this. So, we do not want to throw the baby out with the bathwater. To make this podcast even juicier, Tim, and not just about something as mundane as cookies, I want to talk about two more topics quickly.
And that is, there are other technologies out there that do similar things - -
So, tracking pixels. I first became aware of them when I was helping out a marketing department way, way, way back in the day. The idea was that if you open an e-mail, there’s a snippet of code in the HTML of the e-mail that downloads a single pixel. That single pixel was defined such that the request for that download would notify the marketer - -
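A rough sketch of the mechanics, in Python. The endpoint `tracker.example.com`, the query parameter names `r` and `c`, and both functions are assumptions made up for illustration, not any particular vendor's format.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical tracking endpoint that serves a 1x1 image.
PIXEL_ENDPOINT = "https://tracker.example.com/open.gif"


def pixel_tag(recipient_id: str, campaign: str) -> str:
    """Build the HTML snippet a marketer would embed in one recipient's e-mail."""
    query = urlencode({"r": recipient_id, "c": campaign})
    url = escape(f"{PIXEL_ENDPOINT}?{query}", quote=True)  # escape '&' for HTML
    return f'<img src="{url}" width="1" height="1" alt="">'


def record_open(requested_url: str) -> dict:
    """What the tracking server learns when the mail client fetches the pixel."""
    params = parse_qs(urlparse(requested_url).query)
    return {"recipient": params["r"][0], "campaign": params["c"][0]}


if __name__ == "__main__":
    print(pixel_tag("user-42", "spring-sale"))
    print(record_open(f"{PIXEL_ENDPOINT}?r=user-42&c=spring-sale"))
```

The image itself is irrelevant. The signal is the request, because each recipient gets a unique URL, so one download tells the sender which message was opened.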
Let’s talk about, and this is the final bit here, Tim, browser fingerprinting for a moment, because I don’t think people realize just how deep this goes. Even if you got rid of cookies, the science of browser fingerprinting has gotten to the point where you could be using an ad blocker extension in your browser, of which there are many, and you could still come up as completely unique and be fairly uniquely identified. Let me give you an example. A lot of people think that a browser profile is your IP address. That’s one, maybe one data point. It’s not even one of the great ones. A lot of people are technical enough to know that when your browser makes a request, there’s something called a user agent that you actually give to the website that you’re browsing to. It’s part of the request header, and it’s read and it contains information like whether you’re using Safari or Chrome, what version, etc. But again, that’s not terribly, terribly unique. You and I, Tim, if we were both using Safari on the same day, might have a very similar user agent, especially if we were both up to date. Keep in mind, when you are browsing, you are also giving up information such as what your primary language is set to (in my case, it would be English), what your time zone is, and what your operating system is. But what a lot of people don’t realize is that there are also things such as your screen size, your color depth, and your system fonts, because web servers want to know how to serve you information, and you’re giving up that information through your browser. Additionally, there’s something called a DNT header that you might send as part of your request to say: do not track me. Believe it or not, that also is, and I’m going to introduce this idea, Tim, a bit of entropy that uniquely identifies you, because not everybody has the DNT header on. I do, but a lot of people don’t.
And that actually more uniquely identifies me.
I’ve got one more little piece to this because I’m going to show you how deep it goes. There’s something out there called the FingerprintJS library, short for Fingerprint JavaScript library. It actually runs in the background, kind of unbeknownst to you; it’s almost like that single pixel idea. It will run an HTML5 Canvas or a WebGL function. I’m sure you’ve been to websites, Tim, that have really elaborate, fancy, beautiful graphics. Well, cut down to their absolute minimum, HTML5 Canvas and WebGL will also give a web server information about your GPU and your graphics drivers. That information ultimately becomes hashed as what’s known as the canvas fingerprint or the WebGL fingerprint, all the way down to, like, your graphics driver version number. It’s incredible.
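The hashing step can be sketched in a few lines of Python. This is only an illustration of combining attributes into one identifier and is not how FingerprintJS or any specific library is implemented; the attribute names and values below are invented.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier.

    Keys are sorted so the same attributes always give the same hash,
    regardless of the order in which they were collected.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical values of the kind discussed above.
    browser = {
        "user_agent": "ExampleBrowser/1.0",
        "language": "en",
        "time_zone": "America/Toronto",
        "screen": "2560x1440x24",
        "dnt": "1",
        "gpu_driver": "ExampleGPU driver 1.2.3",
    }
    print(fingerprint(browser))
    # One changed detail, such as a driver update, gives a different hash.
    print(fingerprint({**browser, "gpu_driver": "ExampleGPU driver 1.2.4"}))
```

No single attribute here is unique. The combination is what becomes identifying, and nothing has to be stored on the user's machine for the same hash to be recomputed on the next visit.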
So, a bit of a shoutout to them, just because I found it very fun to go to that site. They’ll tell you how unique you are with your browser. They’ll tell you whether you are trackable. They will calculate your entropy based on your computer and browser combination, and they’ll tell you the number of bits of entropy that you are actually giving out publicly when you browse to websites. So, very interestingly, when I did this, Tim, it told me that out of the more than 200,000 browsers that the EFF had tested in the past 45 days, I was completely 100% unique and therefore utterly trackable, which I found very interesting.
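For a sense of what "bits of entropy" means here, a small Python sketch under the usual assumption that a trait shared by one in N browsers carries log2(N) bits of identifying information. The function names and the per-trait ratios in the second example are hypothetical, and adding bits together is only valid if the traits are independent, which real browser traits often are not.

```python
import math


def identifying_bits(one_in_n: float) -> float:
    """Bits of identifying information in a trait shared by 1 in `one_in_n` browsers."""
    return math.log2(one_in_n)


def combined_bits(one_in_n_per_trait: dict) -> float:
    """Total bits if every trait is independent (a simplifying assumption)."""
    return sum(identifying_bits(n) for n in one_in_n_per_trait.values())


if __name__ == "__main__":
    # Being unique among 200,000 browsers takes at least this many bits.
    print(round(identifying_bits(200_000), 2))
    # Hypothetical ratios: how rare each trait is in the tested population.
    print(round(combined_bits({"time_zone": 20, "language": 10, "dnt_on": 5}), 2))
```

Being unique in a pool of 200,000 corresponds to roughly 17.6 bits, so a couple of dozen modestly rare traits are more than enough to single out one browser.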

