Hello everyone and welcome back to the Cognixia podcast. If you have been following us for a bit, you already know that we are doing a special five-part series on one of the hottest topics in town right now – ChatGPT. This is the third episode in the series, so if you are tuning in for the first time, we highly recommend going back and listening to the first two episodes in this five-part series before beginning this episode. And if you have already been keeping up, welcome to today’s episode.
If you remember, at the closing of last week’s episode, we mentioned that this week we will be talking about the data privacy aspects of ChatGPT – is it a knight in shining armor of data privacy or a data privacy nightmare?
Well, we would definitely like to know what your assessment has been so far. Do you think ChatGPT is all sorted when it comes to data privacy and security, or does it have some loopholes, maybe even glaring ones, staring right back at you? Has anybody had any experiences that got them concerned?
Yeah, that would be interesting to know, if only we could have a conversation with our listeners during a podcast… Now that would be so much fun!
If only, eh?
Totally. But well, someday maybe…
So, back to our topic for today – ChatGPT and Data Privacy. It is said that if you have ever written a blog, posted a product review on any platform, commented on anything online, or written just about anything anywhere online, chances are your words were used to train ChatGPT. ChatGPT knows what you wrote, whether you wanted it to or not. By now, we have told you a few times about the humongous amount of data that went into training ChatGPT. As a rule of thumb, the more data used to train a large language model like ChatGPT, the better the model gets at identifying patterns, predicting what comes next, and delivering plausible text.
Just for a quick refresher, we would like to remind you that more than 300 billion words from across the internet were used to train the ChatGPT model. This includes books, articles, blogs, websites, posts, comments, and so much more. Now, on the face of it, it is a good thing that the model has been trained on so much data.
Yeah, so, what’s the problem if ChatGPT has been trained on all this extensive data and information available on the internet?
There are multiple problems with this, actually.
First, nobody was asked for permission to use their data to train ChatGPT. This is basically a violation of people’s privacy. If I write a blog, anybody using its content for any purpose is ideally required to seek my permission first. OpenAI did not do this for any of the data it used to train ChatGPT. Some of that data could be sensitive in nature for a multitude of reasons, and that is clearly a data privacy violation. Plus, a lot of the content ChatGPT has been trained on would be copyright-protected or proprietary information. For instance, if you ask ChatGPT to pull up some paragraphs from a novel and it delivers them, you know that content is copyright-protected and cannot be reproduced without the publisher’s permission. But ChatGPT and OpenAI do not seem to have considered this.
We also have no information on whether ChatGPT stores our data or deletes it once we log out, how it handles the information we enter or the queries we ask, and so on. Europe, for instance, has the General Data Protection Regulation (GDPR) in place, but we are still not sure whether ChatGPT is GDPR-compliant.
I think ChatGPT should have a ‘Forget’ option or some way to exercise the ‘Right to be Forgotten’. A lot of times we encounter very weird, meaningless, or completely inaccurate information from ChatGPT. So, if there were an option to delete such an entry, it would help ChatGPT become more accurate and help with data management, I guess.
Absolutely! Plus, I think we as users might be handing over sensitive information to ChatGPT without realizing it. That data then ends up out of our control and poses a definite data privacy risk. Say you are a lawyer who asked ChatGPT to draft a legal agreement, a business executive who asked it to draft emails to your clients, or a developer who asked it to check some code. In all these cases, everything you shared – the drafts, the client details, the code – goes to ChatGPT the moment it accesses it. These are undoubtedly confidential pieces of information, and one cannot afford any leakage of them. But you inadvertently gave ChatGPT access, by extension putting them into a freely accessible public domain.
What’s more alarming, in my opinion, is that OpenAI has stated it may share users’ personal information with unspecified third parties, without actually informing them, to meet its business objectives.
Now, that is definitely scary. A lot of websites and platforms do that already. Considering how many places ask us to share personal information without giving us a choice, data privacy is taken quite lightly as it is. The privacy risks that come with using ChatGPT sure seem concerning.
I think we could all do with being mindful of what we share with ChatGPT. I would also recommend making sure we log out once we are done using the tool.
Makes sense, that makes sense.
Yeah, so I think this is what we had to tell you about the data privacy concerns associated with ChatGPT. We hope we were able to help you get a different perspective on things in this episode.
Once again, this is a five-part series we are running on the Cognixia podcast about ChatGPT, and if you have not caught up with the previous two episodes, please go back in the list and give them a listen. They are super insightful and will be helpful.
And, with that, we come to the end of this week’s episode of the Cognixia podcast and the third part of our special series on ChatGPT. We promise to be back again next week with another interesting episode on ChatGPT, right here on your one-stop destination for everything new about emerging digital technologies.
Meanwhile, if you would like to know more about upskilling and talent transformation opportunities in emerging technologies, please check our website – www.cognixia.com. You can connect with us on the chat function there and we will answer any queries you might have right away.
Until next week then, happy learning!