Podcast
Root Causes 279: ChatGPT Watermarking


Hosted by
Tim Callan
Chief Compliance Officer
Jason Soroko
Fellow
Original broadcast date
February 20, 2023
ChatGPT raises the potential problem of its content being used and attributed to another source, such as a professional writer or a student. In this episode we discuss the idea of "watermarking" ChatGPT content, including steganography, randomness, entropy, and how to destroy the watermarks.
Podcast Transcript
Lightly edited for flow and brevity.
There’s another issue now, Tim, that I’d like to address on this podcast. If you really want to spell it out, it’s the question of who actually wrote the content. Was it ChatGPT, or was it someone else, a human being? And of course, one way of handling this kind of thing is already used in artwork.
Well, Tim, enter the world of entropy and pseudorandomness. And isn’t it amazing? So, Scott Aaronson was hired by OpenAI in 2022. I think artificial intelligence safety and alignment is part of his title, and he is the person who has been talking a lot about this ChatGPT watermarking idea. And so, how do you watermark words, Tim? I mean, if you think about artwork, it’s not difficult. Steganography is a whole topic that we might get into, or have gotten into. I forget now.
You can find poems where the poet inserted their name: you take every 17th letter and it spells out their name, and things along those lines. I don’t imagine that’s the specific technique, but that sort of thing is what we are talking about, isn’t it?
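The every-17th-letter trick is easy to check mechanically. A minimal sketch, assuming letters are counted from 1 and spaces and punctuation are skipped; `every_nth_letter` is a hypothetical helper, and the example uses an interval of 3 instead of 17 so the hidden name fits in one short line:

```python
def every_nth_letter(text: str, n: int) -> str:
    """Return every n-th letter (1-indexed), skipping spaces and punctuation."""
    letters = [c.lower() for c in text if c.isalpha()]
    return "".join(letters[n - 1 :: n])


print(every_nth_letter("Not rain, am I?", 3))  # tim
```

The same check with n = 17 would find a poet's name hidden at that interval; the hard part is writing a natural-sounding text that hides it.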
Well, the beauty is you can put all kinds of stuff in there just by shifting a few bits here and there. Within that perception of randomness, it will not change the sound of that waterfall all that much, and yet you can encode quite a bit of information in it. So how in the world do you do it with words? That’s really what we are talking about here, because ChatGPT’s output is text. It’s just words. Well, Tim, when you and I write words down, typically a blog post or an email to our colleagues, we are not thinking terribly hard about anything other than the grammar and the ideas we are trying to convey. We are not thinking about the order of the words, the randomness of the words, and so on. The closest you might come to thinking about the order of words would be if you are writing poetry or trying to hit a particular meter, like pentameter.
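The waterfall idea, hiding data by nudging a few bits of noisy audio, is classic least-significant-bit steganography. A minimal sketch, assuming the audio is a list of signed 16-bit integer samples; the function names and the random "waterfall" cover signal are hypothetical stand-ins:

```python
import random


def embed_lsb(samples: list[int], message: bytes) -> list[int]:
    """Hide message in the least significant bit of successive samples."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("cover audio is too short for this message")
    out = list(samples)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & ~1) | bit  # overwrite only the lowest bit
    return out


def extract_lsb(samples: list[int], n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bit of the first n_bytes * 8 samples."""
    bits = [s & 1 for s in samples[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k : k + 8]))
        for k in range(0, len(bits), 8)
    )


# Hypothetical "waterfall": random noise standing in for real recorded samples.
rng = random.Random(0)
waterfall = [rng.randint(-32768, 32767) for _ in range(1000)]
stego = embed_lsb(waterfall, b"Root Causes")
print(extract_lsb(stego, 11))  # b'Root Causes'
print(max(abs(a - b) for a, b in zip(waterfall, stego)))  # at most 1
```

Each sample moves by at most one step out of 65,536, which is why the cover sounds the same. Plain text has no such low-order bits to hide in.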
If you are writing sonnets you are thinking about these things, but it’s not typical in human language other than in really structured language like poetry. But ChatGPT is thinking about these things, perhaps unbeknownst to you, even as the user of ChatGPT. And so the seeming randomness of the word placement in ChatGPT output turns out not to be random. In fact, it’s pseudorandom. So, Tim, it’s almost like writing a sonnet in iambic pentameter, a good old-fashioned Shakespearean sonnet; that’s maybe the closest analogy I can come up with. ChatGPT is also coming up with its own patterning of words. I don’t have the algorithm to describe to you, except to give you that analogy of the iambic pentameter of a sonnet, which is not randomness. It’s pseudorandomness. In fact, there is a discoverable pattern, and once you know it, you can realize, oh, this must be a Shakespearean sonnet, or this must be ChatGPT output, because the pattern of the pseudorandomness is just right to match whatever ChatGPT defines it to be.
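Aaronson has publicly described one way to make word choice pseudorandom rather than random: give each candidate token a number r from a keyed pseudorandom function of the preceding few tokens, pick the token that maximizes r^(1/p), where p is the model's probability for that token, and later detect the key's fingerprint by averaging -ln(1 - r). The sketch below is a toy version of that idea, not OpenAI's implementation; HMAC-SHA256 as the pseudorandom function, the 4-token window, the key, and the 500-word uniform "model" are all assumptions for illustration:

```python
import hashlib
import hmac
import math
import random

CONTEXT = 4  # hypothetical: number of preceding tokens the PRF sees
PAD = "<s>"  # hypothetical start-of-text padding token
# Hypothetical toy "model": 500 made-up tokens, all equally likely.
TOY_PROBS = {f"w{i}": 1 / 500 for i in range(500)}


def prf(key: bytes, context: list[str], token: str) -> float:
    """Keyed pseudorandom number in (0, 1) for (preceding tokens, candidate token)."""
    msg = "\x1f".join(context + [token]).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    n = int.from_bytes(digest[:6], "big")  # 48 pseudorandom bits
    return (n + 0.5) / 2**48  # never exactly 0 or 1


def pick_token(key: bytes, context: list[str], probs: dict[str, float]) -> str:
    """Pick the token maximizing r ** (1 / p), compared as log(r) / p.

    If r were truly uniform, this would pick each token with probability p,
    so the output distribution is unchanged; a key holder can recompute r.
    """
    return max(probs, key=lambda t: math.log(prf(key, context, t)) / probs[t])


def generate(key: bytes, probs: dict[str, float], n_tokens: int,
             watermark: bool = True, seed: int = 0) -> list[str]:
    """Emit n_tokens, either watermarked or ordinarily sampled from probs."""
    rng = random.Random(seed)
    vocab, weights = list(probs), list(probs.values())
    tokens = [PAD] * CONTEXT
    for _ in range(n_tokens):
        ctx = tokens[-CONTEXT:]
        if watermark:
            tokens.append(pick_token(key, ctx, probs))
        else:
            tokens.append(rng.choices(vocab, weights=weights)[0])
    return tokens[CONTEXT:]


def score(key: bytes, tokens: list[str]) -> float:
    """Mean of -ln(1 - r) per token: near 1 if unwatermarked, higher if watermarked."""
    padded = [PAD] * CONTEXT + list(tokens)
    vals = [-math.log(1.0 - prf(key, padded[i - CONTEXT:i], padded[i]))
            for i in range(CONTEXT, len(padded))]
    return sum(vals) / len(vals)


KEY = b"hypothetical-secret-key"
marked = generate(KEY, TOY_PROBS, 200)
plain = generate(KEY, TOY_PROBS, 200, watermark=False)
print(round(score(KEY, marked), 2), round(score(KEY, plain), 2))
print(round(score(b"some other key", marked), 2))
```

With this toy model, the watermarked text should score far above 1 under the right key and close to 1 under a wrong key or for ordinarily sampled text. A real detector would apply a statistical threshold over many tokens rather than eyeballing a single average.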
And go see our podcast about prompt injections, where we talk a lot about this. So you are absolutely right to draw that distinction, Tim: we are not talking about the style; we are not talking about a turn of phrase; we are not even talking about the fact that some of ChatGPT’s output might be somewhat inaccurate or highly accurate, or that you might even ask it to write in the form of a sonnet. You can ask GPT to do that.
If it turns out it is something like this (and again, this is gonna be far too simple, but let’s go with something simple): the 7th word, the 77th word, and the 177th word all start with the letter A. OK. Run enough content through there and eventually you’re gonna realize that every one of them has this pattern. And you are gonna say, ah, I betcha that’s a watermark.
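The deliberately simplistic 7th/77th/177th-word example can be turned into a detector. A minimal sketch, assuming whitespace-separated words and 1-indexed positions; `has_toy_mark`, `match_rate`, and `chance_rate` are hypothetical helpers, and a real watermark would not be this easy to spot:

```python
MARK_POSITIONS = (7, 77, 177)  # the episode's toy example, 1-indexed word positions
MARK_LETTER = "a"


def has_toy_mark(text: str) -> bool:
    """True if the 7th, 77th and 177th words all start with the letter A."""
    words = text.split()
    if len(words) < max(MARK_POSITIONS):
        return False
    return all(words[p - 1].lower().startswith(MARK_LETTER) for p in MARK_POSITIONS)


def match_rate(docs: list[str]) -> float:
    """Fraction of documents carrying the toy mark."""
    return sum(has_toy_mark(d) for d in docs) / len(docs)


def chance_rate(docs: list[str]) -> float:
    """Rate expected by luck, assuming each word starts with A independently."""
    words = [w for d in docs for w in d.split()]
    p = sum(w.lower().startswith(MARK_LETTER) for w in words) / len(words)
    return p ** len(MARK_POSITIONS)


filler = ["word"] * 200
marked_words = list(filler)
for pos in MARK_POSITIONS:
    marked_words[pos - 1] = "apple"
marked = " ".join(marked_words)
plain = " ".join(filler)
print(match_rate([marked, marked, plain]))  # 0.666...
```

One matching document proves nothing; the tell is a match rate across many documents far above what `chance_rate` predicts from letter frequency alone.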
I’m gonna point everybody listening to this to an article. Its title is “How The ChatGPT Watermark Works And Why It Could Be Defeated”. It’s on searchenginejournal.com, and if you want to look it up, it covers a lot of what we are talking about here. And it talks about how you might think about defeating the watermark. But we just want to make you aware of what’s going on with ChatGPT, and the fact that it is using these kinds of digital obfuscation and digital watermarking techniques, which are not ciphers and are not encryption, but which do rely on randomness and pseudorandomness.
You let ChatGPT give you a first draft and then you go in and change every sentence, putting each one in your own words instead. That’s probably a lot less work than researching your article from scratch, and when it’s done you’ve destroyed the steganography for sure.
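Why rewriting works can be made concrete under one assumption: that the watermark keys each word on a short window of the words before it, as in the approach Aaronson has described. A detector can then only use windows that survive the edit. A minimal sketch with a hypothetical `surviving_fraction` helper and an assumed window of 5 words:

```python
def ngrams(words: list[str], n: int) -> set[tuple[str, ...]]:
    """All n-word windows in a word list."""
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def surviving_fraction(original: str, edited: str, n: int = 5) -> float:
    """Share of the original's n-word windows that still appear verbatim in the edit.

    n = 5 assumes a watermark keyed on the 4 preceding words plus the current one.
    """
    before = ngrams(original.lower().split(), n)
    if not before:
        return 0.0
    after = ngrams(edited.lower().split(), n)
    return len(before & after) / len(before)


original = "the quick brown fox jumps over the lazy dog today"
print(surviving_fraction(original, original))  # 1.0
print(surviving_fraction(original, original.replace("fox", "cat")))  # 0.333...
```

Swapping a single word already removes every window that contains it; changing every sentence would leave essentially none for a detector to score.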
And then the other thing about that is, unless you have a high degree of confidence that ChatGPT is getting everything right, you would probably want to scrutinize its output anyway to make sure it’s not saying anything dumb. And so under those circumstances, that’s not that much different from rewriting it to begin with.

