Root Causes 279: ChatGPT Watermarking

Hosted by

Tim Callan

Chief Compliance Officer

Jason Soroko

Fellow

Original broadcast date

February 20, 2023

ChatGPT presents the potential problem of ChatGPT content being used and attributed to another source, such as a professional writer or a student. In this episode we discuss the idea of "watermarking" ChatGPT content, including steganography, randomness, entropy, and how to destroy the watermarks.

Podcast Transcript

Lightly edited for flow and brevity.

Tim Callan: So, we recently have been talking a lot about OpenAI and artificial intelligence, and ChatGPT in particular, and how all of these things fit into the world of security. There’s been an interesting development of late in the OpenAI/ChatGPT world that you recently described to me as being “right up our alley”. So, I thought, or we thought, let’s talk about it now. ChatGPT is a pretty remarkable thing: you can just pose something to it and it gives you incredibly good content as a response, and a lot of people have had their jaws dropping about how good this is. But this also presents a lot of problems in the world. Or potential problems in the world.

Jason Soroko: It sure does. We talked not long ago on this podcast about ChatGPT in the context of know your customer. We gave a bit of a warning to all of you who are using ChatGPT, because it just seems to be used by everybody right now, and it might be tempting, very tempting, to use it for things such as looking up information about people, or about entities of any kind, and getting attributes. The problem is that there is no guarantee of accuracy, which is very important for attributes, and there might also be ambiguities in your question or in the answer that need to be sorted out, and ChatGPT just kind of rolls over that. So, we recently spoke about that.

There’s another issue now, Tim, that I’d like to address on this podcast, which, if you really want to spell it out, is the question of who actually wrote the content. Was it ChatGPT or was it someone else, a human being? And of course, one way of answering that kind of question, watermarking, is already done in artwork.

Tim Callan: It’s cheating. What you are getting at, Jason, is the problem of cheating. It’s the problem of me using ChatGPT and pretending it’s something I did. Imagine if I’m a writer. Imagine if I’m a student. But there are probably lots of other scenarios where this would occur as well.

Jason Soroko: Absolutely. It’s really not hard to imagine a student being asked to write a thousand-word essay on something and immediately going to ChatGPT and saying, hey, give me a thousand words on a particular topic. I wouldn’t even doubt that a lot of people are doing that just to get insight, not necessarily to use the text. But truly cheating would be using the text as is.

Tim Callan: And to some degree, if you are saying it’s a research tool, I think that’s fair game. If I’m trying to learn about a topic, what’s the difference between doing searches and finding papers that other people have written on the topic, going to the Wikipedia page on that topic, or asking ChatGPT as my starting point? If I’m trying to learn things, I think that’s just another form of research, and in that regard I think it’s fine.

Jason Soroko: Totally fine. We’ve already had Google searches. My grandmother’s house had the Encyclopedia Britannica for years and years. We’ve always had ways to look things up. This is just another way of doing it, one that provides context and even writing styles, which is amazing. But, Tim, there’s the whole problem of copy/pasting into your essay, and copy/pasting into other things, too, even in the corporate world. Let’s say a teacher at a high school is evaluating papers. They could submit the text and ask, hey, was this written by ChatGPT? And you might wonder how in the world the tool would know.

Well, Tim, enter the world of entropy and pseudorandomness. And isn’t it amazing. Scott Aaronson was hired by OpenAI in 2022; artificial intelligence safety and alignment, I think, is part of his title, and he is the person who has been talking a lot about this ChatGPT watermarking idea. So, how do you watermark words, Tim? If you think about artwork, it’s not difficult. Steganography is a whole topic that we might get into or have already gotten into. I forget now.

Tim Callan: I was just gonna mention steganography. To some degree, steganographic watermarking goes back thousands of years.

You can find poems where the poet inserted their name, so that if you take every 17th letter, it spells out their name, and things along those lines. I don’t imagine that’s the specific technique, but that sort of thing is what we are talking about, isn’t it?
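As a toy illustration of the every-Nth-letter idea, the sketch below builds a cover text whose every 17th letter spells a hidden name and then reads it back. The helper names, the step of 17 and the "lorem" filler letters are hypothetical; real poets chose meaningful words rather than filler, but the reading rule is the same.

```python
from itertools import cycle


def hidden_letters(text: str, step: int) -> str:
    """Read every `step`-th letter of `text`, skipping spaces and punctuation."""
    letters = [ch for ch in text if ch.isalpha()]
    # 1-based positions step, 2*step, ... are 0-based indices step-1, 2*step-1, ...
    return "".join(letters[step - 1::step])


def hide_name(name: str, step: int, filler: str = "lorem") -> str:
    """Build a cover text whose every `step`-th letter spells `name`."""
    fill = cycle(filler)
    letters = []
    for ch in name:
        letters.extend(next(fill) for _ in range(step - 1))  # step-1 filler letters
        letters.append(ch)                                    # then one secret letter
    raw = "".join(letters)
    # Break into five-letter "words" to show that spacing does not affect the reading.
    return " ".join(raw[i:i + 5] for i in range(0, len(raw), 5))


cover = hide_name("SOROKO", 17)
print(hidden_letters(cover, 17))  # SOROKO
```

Anyone who doesn’t know the step just sees a run of letters; anyone who does can read the name straight out.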

Jason Soroko: It really is. And so, you might wonder how you encode something like a watermark into, say, a sound file or an image. Well, look it up. In fact, we should probably do another podcast about how that’s done, because it’s actually very interesting. It’s also a great way for a bad guy to exfiltrate a message out of an enterprise.

Tim Callan: Or a great way of hiding content in other content. It’s a picture of a landscape, but if I know to look at the right bits, I can find this other message, which is the thing I’m really looking for.
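One common way to hide data "in the right bits" of audio or images is least-significant-bit (LSB) embedding. This minimal sketch assumes the audio is already a list of 16-bit integer samples (a real tool would read them from a WAV file) and uses random numbers as a stand-in for noisy waterfall audio. Overwriting only the lowest bit changes each sample by at most 1, which is inaudible in material that already sounds like noise.

```python
import random


def embed_lsb(samples: list[int], message: bytes) -> list[int]:
    """Hide `message` in the least-significant bit of successive audio samples."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover audio is too short for this message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # each sample moves by at most 1
    return stego


def extract_lsb(samples: list[int], n_bytes: int) -> bytes:
    """Read `n_bytes` back out of the sample LSBs, most significant bit first."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for sample in samples[b * 8:(b + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


# Stand-in for one second of 16-bit, 44.1 kHz "waterfall" noise.
rng = random.Random(42)
waterfall = [rng.randint(-32768, 32767) for _ in range(44_100)]
stego = embed_lsb(waterfall, b"meet at noon")
print(extract_lsb(stego, 12))                             # b'meet at noon'
print(max(abs(a - b) for a, b in zip(waterfall, stego)))  # at most 1
```

Only someone who knows to read the lowest bits, in this order, recovers the message; to everyone else the file is the same waterfall.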

Jason Soroko: Almost a perfect analogy, Tim, and this is the example I’ve always used with people when it comes to steganography or embedding messages into a file, is a sound file, or a video file that includes sound, of something like a waterfall. Niagara Falls, as an example. If you had a video of Niagara Falls, the sound is essentially very close to white noise, if you really think about the randomness of water falling. It’s a perfect example of something approaching randomness.

Well, the beauty is you can put all kinds of stuff into that just by shifting a few bits here and there. Within that perception of randomness, the change will not alter the sound of that waterfall all that much, and yet you can encode quite a bit of information. So how in the world you do it with words is really what we are talking about here, because the output of ChatGPT is text. It’s just words. Well, Tim, when you and I are writing words down on a piece of paper, typically if we are writing a blog post or an email to our colleagues, we are not thinking terribly hard about anything other than the grammar and the ideas we are trying to convey. We are not thinking about the order of the words, the randomness of the words, etc. The closest you might come to thinking about the order of words would be if you are writing poetry or trying to hit a particular meter, like pentameter.

If you are writing sonnets you are thinking about these things, but it’s not typical in human language outside of really structured language like poetry. But ChatGPT is thinking about these things, perhaps unbeknownst to you, even as the user of ChatGPT. The seeming randomness of the word choice in ChatGPT output turns out not to be random. In fact, it’s pseudorandom. So, Tim, the closest analogy I can come up with is the iambic pentameter of a good old-fashioned Shakespearean sonnet. ChatGPT is also coming up with its own patterning of words. I don’t have the algorithm to describe to you, except to give you that analogy of the iambic pentameter of a sonnet, which is not randomness. It’s pseudorandomness. In fact, there is a discoverable pattern: once you know it, you can realize, oh, this must be a Shakespearean sonnet, or this must be ChatGPT output, because the pattern of the pseudorandomness is just right to match whatever ChatGPT is actually defining it to be.
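To make "pseudorandom rather than random" word choice concrete, here is a toy sketch loosely modeled on the approach Scott Aaronson has described publicly. Instead of rolling real dice to pick among plausible next words, the generator derives a number r in (0, 1) for each candidate from a secret key plus the preceding words, then picks the candidate that maximizes r^(1/p), where p is the model’s probability for that word. The use of HMAC-SHA256, the candidate words and their probabilities are assumptions for illustration; OpenAI’s actual implementation is not public.

```python
import hashlib
import hmac


def keyed_random(key: bytes, context: tuple[str, ...], word: str) -> float:
    """Pseudorandom number strictly inside (0, 1), determined by key, context and word."""
    msg = "\x1f".join(context + (word,)).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    bits52 = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (bits52 + 0.5) / 2**52


def pick_next(key: bytes, context: tuple[str, ...], probs: dict[str, float]) -> str:
    """Choose the candidate maximizing r ** (1 / p).

    If r were truly random, this would select each word with probability p,
    exactly like ordinary sampling. Because r comes from a keyed function,
    the same key and context always give the same choice.
    """
    return max(probs, key=lambda w: keyed_random(key, context, w) ** (1.0 / probs[w]))


probs = {"quick": 0.7, "fast": 0.2, "speedy": 0.1}  # hypothetical model output
context = ("the",)
print(pick_next(b"secret-key", context, probs))  # same answer on every run

# Across many different keys the choices still follow the model's probabilities.
trials = 2000
quick = sum(pick_next(f"key-{i}".encode(), context, probs) == "quick" for i in range(trials))
print(quick / trials)  # close to 0.7
```

The design point is that a reader sees ordinary-looking word choices, while whoever holds the key can recompute every r and check whether the chosen words were consistently the "lucky" ones.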

Tim Callan: So now let me clarify this, because I think this is a very important point. There’s a difference between saying, look, there are certain ways that ChatGPT tends to turn a phrase that are not the same as a human’s, and if you see this you should be suspicious that it’s ChatGPT, vs. saying there is a specific pattern, very unlikely to have occurred by accident, that ChatGPT will deliberately produce, so that if you know specifically what to look for you can prove to most people’s satisfaction that this was generated by ChatGPT. So, which of those two are we talking about?

Jason Soroko: We are absolutely not talking about turns of phrase that might look a little off or look automated, or even look like something that you asked for in your prompt. Because you can do that with ChatGPT. You can say it to ChatGPT in your prompt. And, by the way, if any of you don’t know the punchline yet, that is going to be the job of the future. A lot of people think humans are going away. Humans are actually going to become programmers of the prompt.

And go see our podcast about prompt injections, where we talk a lot about this. So, you are absolutely right in making that distinction, Tim. We are not talking about the style; we are not talking about a turn of phrase; we are not even talking about the fact that some of ChatGPT’s output might be somewhat inaccurate or highly accurate, or that you might even ask for it to be written in the form of a sonnet. You can ask GPT to do that.

Tim Callan: You can say use a lot of adjectives. And it will.

Jason Soroko: Absolutely. Make it sound like a blog post in the style of Tim Callan. You can actually try that out. However, it’s not that. It has a lot more to do with the way words are chosen at particular points within the text, and like I say, I don’t have the algorithm in front of me. Maybe nobody does at this point other than people in their - -

Tim Callan: They shouldn’t share that algorithm, because that would help you game it.

Jason Soroko: You got it. But that is how the watermarking works. It will be incredibly subtle, and it might not be something you notice with your eyes just by reading it. Obviously, if you were given the algorithm, you could go and look for it, but it would probably be best to look for it statistically with a computer rather than trying to find it with your own - -

Tim Callan: If you knew what the algorithm was.

Jason Soroko: Exactly. But there it is.
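Jason’s point that detection is best left to a computer can be sketched as a statistical test. Continuing the toy keyed-pseudorandom idea, the detector recomputes a keyed number r for each adjacent word pair (HMAC-SHA256 here, an assumption) and averages ln(1 / (1 - r)), a score along the lines of what Scott Aaronson has described. For text not chosen with the key, r behaves like a uniform random number and the average sits near 1.0; keyed choices push it higher. The synthetic `w{i}_{j}` "words", the key and the slot structure are hypothetical.

```python
import hashlib
import hmac
import math
from typing import Optional


def keyed_random(key: bytes, prev: str, word: str) -> float:
    """Pseudorandom number strictly inside (0, 1) for `word` following `prev`."""
    digest = hmac.new(key, f"{prev}\x1f{word}".encode(), hashlib.sha256).digest()
    return ((int.from_bytes(digest[:8], "big") >> 12) + 0.5) / 2**52


def watermark_score(key: bytes, words: list[str]) -> float:
    """Average of ln(1 / (1 - r)) over adjacent word pairs.

    For text not chosen with `key`, r behaves like a uniform random number and
    the average hovers near 1.0; keyed choices push it clearly higher.
    """
    terms = [-math.log(1.0 - keyed_random(key, p, w)) for p, w in zip(words, words[1:])]
    return sum(terms) / len(terms)


def generate(slots: list[list[str]], key: Optional[bytes] = None) -> list[str]:
    """Pick one word per slot: keyed (watermarked) if `key` is set, else a fixed rule."""
    words = ["<start>"]
    for i, options in enumerate(slots):
        if key is None:
            words.append(options[i % len(options)])
        else:
            # Equal-probability candidates, so maximizing r ** (1/p) means maximizing r.
            words.append(max(options, key=lambda w: keyed_random(key, words[-1], w)))
    return words[1:]


# 200 slots of four interchangeable synthetic "words" each.
slots = [[f"w{i}_{j}" for j in range(4)] for i in range(200)]
marked = generate(slots, key=b"secret-key")
plain = generate(slots)

print(round(watermark_score(b"secret-key", marked), 2))  # well above 1
print(round(watermark_score(b"secret-key", plain), 2))   # near 1
print(round(watermark_score(b"wrong-key", marked), 2))   # near 1: the key matters
```

With four equally likely candidates per slot, the watermarked average should land around 2, while the other two hover around 1. No single word gives anything away; the evidence only shows up when a computer adds up hundreds of small nudges, and only with the right key.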

Tim Callan: Interesting. So if you generated a vast quantity of content through ChatGPT and threw it into something that was doing some kind of computerized pattern matching, maybe you could discover the algorithm.

If it turns out to be something (and again, this is gonna be far too simple, but let’s go with something simple) like this: the 7th word, the 77th word, and the 177th word all start with the letter A. OK. Run enough content through there and eventually you’re gonna realize that every one of them has this pattern. And you are gonna say, ah, I betcha that’s a watermark.
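Tim’s "run enough content through there" idea can be sketched as a simple frequency test: across many documents, look for word positions where the first letter is (nearly) always the same. The planted positions 7, 77 and 177 come from Tim’s hypothetical example; the vocabulary, the corpus of 50 synthetic documents and the 99% threshold are assumptions, and a real watermark would not be this easy to spot.

```python
import random
from collections import Counter


def suspicious_positions(docs: list[str], max_pos: int = 200,
                         threshold: float = 0.99) -> list[tuple[int, str]]:
    """Find 1-based word positions whose first letter is (nearly) always the same."""
    split_docs = [doc.split() for doc in docs]
    found = []
    for pos in range(1, max_pos + 1):
        firsts = [words[pos - 1][0].lower() for words in split_docs if len(words) >= pos]
        if len(firsts) < 2:
            continue  # not enough evidence at this position
        letter, count = Counter(firsts).most_common(1)[0]
        if count / len(firsts) >= threshold:
            found.append((pos, letter))
    return found


# Synthetic corpus: random words, with Tim's hypothetical pattern planted.
rng = random.Random(7)
vocab = ["bird", "cat", "dog", "egg", "fig", "goat", "hat", "ink", "jam", "kite", "apple"]


def fake_output(n_words: int = 200) -> str:
    words = [rng.choice(vocab) for _ in range(n_words)]
    for pos in (7, 77, 177):
        words[pos - 1] = "apple"
    return " ".join(words)


docs = [fake_output() for _ in range(50)]
print(suspicious_positions(docs))
```

On this synthetic corpus the only positions that clear the threshold should be the three planted ones, which is exactly the "ah, I betcha that’s a watermark" moment.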

Jason Soroko: That’s not difficult to do. Say I asked you to write to me using that as a code, Tim. Let’s say I had no means of knowing whether or not you wrote me a letter, but we agreed ahead of time that that was gonna be our watermark. It’s not even a cipher, it’s just a watermark. It’s a way of doing that, and it would be very tough for other people to detect.

Tim Callan: Especially with a small amount of content. Especially without knowing what it is that they were looking for. You bet. Absolutely.

Jason Soroko: And since texts can sometimes be very long and sometimes very short, the positions could be defined by scale instead: at the 20% mark and at the 80% mark of the text, make the second letter of the word an A. Which is not a difficult thing to do.
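The scale-based variant can be sketched the same way, with the checked positions computed as a percentage of the text length so the same rule works for short and long texts. The 20%/80% marks and the second-letter-A rule come from the conversation; the function names and the integer rounding are assumptions.

```python
def mark_indices(n_words: int, percents: tuple[int, ...] = (20, 80)) -> list[int]:
    """0-based word indices at the given percentage marks (integer math, no float rounding)."""
    return [min(n_words - 1, p * n_words // 100) for p in percents]


def has_scaled_mark(text: str, percents: tuple[int, ...] = (20, 80), letter: str = "a") -> bool:
    """True if the word at each percentage mark has `letter` as its second letter."""
    words = text.split()
    if not words:
        return False
    return all(len(words[i]) > 1 and words[i][1].lower() == letter
               for i in mark_indices(len(words), percents))


long_text = ["word"] * 100
long_text[20], long_text[80] = "cat", "map"   # marks at indices 20 and 80
short_text = ["word"] * 10
short_text[2], short_text[8] = "hat", "bag"   # same rule, indices 2 and 8
print(has_scaled_mark(" ".join(long_text)), has_scaled_mark(" ".join(short_text)))  # True True
```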

Tim Callan: Exactly. Get some things worked out. Build in some things that are easy to fit in but at the same time unique enough that the likelihood of them occurring on their own is extremely low. And by the way, it presumably wouldn’t even have to be perfect if I’m just trying to prevent plagiarism. Well, it’s not plagiarism, but cheating by using ChatGPT as a surrogate for my own work. If you get something that works 99.5% of the time, you might say that’s plenty good enough.
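A back-of-the-envelope check on "extremely low": assume, purely for illustration, that a fraction q of words in ordinary prose begin with a given letter, and treat the checked positions as independent. The chance that human-written text matches all k positions by accident is then:

```latex
P(\text{accidental match}) = q^{k}, \qquad
q = 0.1:\quad k = 3 \;\Rightarrow\; 10^{-3}, \qquad k = 10 \;\Rightarrow\; 10^{-10}
```

Under these assumptions the three-position toy pattern would wrongly flag roughly one honest essay in a thousand, while checking more positions drives false accusations down very quickly. That false-positive side matters as much as how often the check catches real ChatGPT output.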

Jason Soroko: I will tell you, though. Can it be defeated? You can add your own obfuscation to it.

I’m gonna point everybody listening to this to an article. Its title is “How The ChatGPT Watermark Works And Why It Could Be Defeated”. It’s on searchenginejournal.com if you want to look it up, and it covers a lot of what we are talking about here, including how you might think about defeating the watermark. But we just want to make you aware of what’s going on with ChatGPT and the fact that it is using these kinds of digital obfuscation and digital watermarking techniques, which are not ciphers and are not encryption, but do rely on randomness and pseudorandomness.

Tim Callan: So one obvious way to defeat it would be to rewrite it.

You let ChatGPT give you a first draft and then you go in and change every sentence, putting every sentence in your own words instead, which is probably a lot less work than originally researching your article, and when it’s done you’ve destroyed the steganography for sure.
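Why even a small edit can defeat a purely positional watermark is easy to see with the toy 7th/77th/177th-word scheme from earlier in the conversation: prepending a single word shifts every position by one. The scheme and the 200-word stand-in text are hypothetical.

```python
def has_toy_watermark(words: list[str], positions: tuple[int, ...] = (7, 77, 177),
                      letter: str = "a") -> bool:
    """True if the word at every 1-based position starts with `letter`."""
    if len(words) < max(positions):
        return False
    return all(words[p - 1].lower().startswith(letter) for p in positions)


# A 200-word stand-in essay carrying the hypothetical watermark.
marked = ["apple" if i in (7, 77, 177) else "word" for i in range(1, 201)]
# The smallest possible "rewrite": one extra word at the start.
edited = ["Honestly,"] + marked

print(has_toy_watermark(marked), has_toy_watermark(edited))  # True False
```

Schemes keyed on the preceding words rather than on absolute positions are harder to break this way, since an inserted sentence only disturbs the words right around it; that is presumably why proposals such as Aaronson’s key on context. A full rewrite of every sentence, as Tim describes, would still wash either kind out.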

Jason Soroko: Absolutely. OpenAI is probably gonna release checking tools at some point, and it would be very interesting for those of us who are interested in this subject, and namely for students, who might just want to test how much ChatGPT text they have to change before they come through the test clean.

Tim Callan: Because if it’s just a matter of putting one new sentence at the beginning, then that’s easily defeated. If it’s a situation of rewriting every single sentence, yes, maybe it saves you some work, but how different is that from looking at the Wikipedia page and rewriting every one of those sentences?

And then the other thing about that is unless you have a high degree of confidence that ChatGPT is getting everything right you would probably want to scrutinize this anyway to make sure it’s not saying anything dumb. And so under those circumstances, that’s not that much different from rewriting it to begin with.
