
If we are what we eat, Generative AI is you.
Vanessa Piper, Principal Explorer
17 October 2024
“From all of our interactions what is one thing that you can tell me about myself that I may not know about myself…”
This query apparently originated on TikTok, which isn’t surprising considering the repeated use of the word “myself”, but it reached LinkedIn and other social media with so much rapid multi-generational popularity that the original Zoomer creator is likely cringing slightly and wishing they just hadn’t.
The point of the exercise is to enter the prompt into ChatGPT, then stand back and be awed at how well Generative AI knows you, and how nice it is about you.

Jokes aside, those pasting their GPT responses into social media as a way of either “showing people what a powerful yet intuitive tool” AI is, or “telling you some more about the best aspects of me, without actually saying it myself” seemed both flattered and occasionally overwhelmed at the personality insights ChatGPT has been able to gather.
It really does raise the question though, doesn’t it?
You know now how much of yourself you have given a generative AI chatbot. How much of this has been passed elsewhere?
And should you care?
The power! The absolute power!!
For science, I did a bit of poking at several GenAI platforms to see what they claim to be training on, their privacy options and just how accessible these controls are for the users.
For the sake of brevity, let’s stick to the most popular and accessible GenAI platforms, specifically ChatGPT (OpenAI), Copilot (Microsoft), Gemini (Google) and Claude (Anthropic).

I’m hearing a lot of “minimal” and “minimised” here (specifically for paying customers) and not a lot of “oh goodness no, we would never!”
Desperate times call for desperate measures, my lord.
When looking at data privacy (hugely murky waters) there are a few elements for your consideration. None of them are pretty.
- You pay for Privacy. For free GenAI models, as with all “free” services, you are the product. Once you start paying for your personal or enterprise chatbot usage, we move into “minimal” territory, rather than “lol your data is ABSOLUTELY being collected” land.
- Wait, what? Oh yeah. While paid models often provide more privacy control, especially for enterprise clients, there’s typically no absolute guarantee that zero data is shared. The official reasons for this are things like –
Quality Control & Improvement – even paid models may retain a subset of interactions for model refinement or QA. This is often anonymised & aggregated, but it’s still used.
Compliance and “Legal Flexibility” – my goodness, I love that term! So much better than “loophole seeking”. Keeping it “minimal” gives companies the flexibility to interpret and apply privacy laws in a way that can assist them to uh, get around little complications like GDPR and CCPA. If anonymisation meets regulatory standards, they will collect and anonymise your data. If anonymisation isn’t required by regulatory standards, it’s a free-for-all data-hoedown and your 3am “why don’t I just ask GPT to interpret my messed up dream” chats are definitely invited.
- Trying to find a Noodle in a scraping Haystack? The current models have already been trained on such vast datasets scraped from the Internet that this may not be something to lose sleep over. Because if privacy is something you were concerned about, you’d have been screaming for a very long time. It still pays to keep a critical eye on privacy policies as they change and evolve, particularly as the types of data shared and the applications become more personal, and more sensitive.
Get your blasted beak out of my face!
Let’s summarise this, shall we? And yes, I know this sucks.
- Our data was scraped from pretty much everywhere, in unfathomable amounts, to train the original chatbots/LLMs.
- Our data is now scraped (however anonymised, depending on what you choose and how much you can pay for it) to train the next generation of generative AI.
- So many major platforms and companies are now involved in selling user data to AI companies for training purposes. These have traditionally sold user data to advertisers anyway, so it’s no great shock.
A few examples of this:
Reddit – Sold data to Google, especially valuable due to its vast user-generated content. YTA.
LinkedIn – While specifics are guarded, it’s been reported that LinkedIn has shared data for AI model improvements, and they are certainly collecting.
Meta (Facebook) – Leveraging data for its own AI models, with potential partnerships for broader generative AI training.
Tumblr & WordPress – Automattic, their parent company, has been involved in deals with companies like OpenAI and Midjourney.
Twitter/X – Known to have collaborated with various AI firms to improve natural language processing models.
Other implicated companies include Snapchat, Amazon, Pinterest, Quora, Spotify, YouTube, TikTok, Medium, and GitHub – all the usual suspects, all of whom have either collaborated directly with AI firms or used user data to train/improve AI models.
Bear in mind, these are the major companies selling shovels to the gold miners, but they’re also just the tip of the mixed metaphor iceberg. As you probably know by now, I don’t like offering problems without solutions, but data privacy in today’s world is particularly messy. When we reach this stage of “oh no, there is no privacy if we use the internets!”, there are usually two options open to you.
Option 1 – Phenomenal Cosmic Power .. itty-bitty living space
The internet is a heck of a tool. For those of us who remember the dreams of the 90s, it really has become instant information at the touch of a finger, at the speed of light.
You can find any information/misinformation/disinformation you need to prove a point while arguing with a complete stranger, or confirm and nurture a long-held bias. You can listen to or watch anything, provided you don’t mind catering to that platform’s advertising algorithm.
The only thing that moved faster than our hunger for knowledge was other people’s hunger to sell shit to us, or sell our shit to other people.
And yet, despite knowing this, continuing to feed our data to despicably dishonest websites is a decision that we are often forced to make, if we wish to remain competitors in the modern rodent rally. After all, if we wish to be employable, be visible, be connected with Auntie Mabel in Georgia, be entertained, we are required to give our details to those companies selling it to advertisers and data collectors.
We have been sold the concept (and let’s face it, most of us bought it) that taking the “terribly bad” with the “dubiously good” is just a part of the price we must pay for the “connectedness” we feel we should have while using the technology we have access to. Therefore, we have to be careful how precious we get about it.
I mean.
We did “give” our data, after all.
Option 2 – The venue chosen is the ends of the Earth, whoopee!!
Opt out where you can.
Be aware of what information you’re giving even platforms you think you can trust.
Fact-check your data. AI hallucinates, and the Internet is full of opinions disguised as facts.
Don’t ask ChatGPT about you; ask someone who actually knows you.
Read books. Wake up at 3am with a messed up dream that you want to understand? Go to the library and read about dream interpretation. I’d still avoid those “having a dream where you’re trying to locate a weirdly open-plan toilet in a crowded railway station is a sign that angels are watching over you!!!” books though.
Go see a local play or a movie rather than binge watching streamed content, regardless of what the cool kids in the office are raving about.
Pick up a hobby. Your pottery, martial arts and, heaven forbid, hiking clubs are going to be a lot more ‘connection’ than you’d think, and you’ll walk away with practical skills.
Write Auntie Mabel a letter. Go on.
And you know. Go live off-grid in a pristine 3.4 hectare section with bore water, substantial rain tanks, several greenhouses and those frickin’ adorable fluffy highland cows that everyone now seems to have photos of in their living rooms, now that “Live, Laugh, Love” has been deemed too basic. Sell free-range eggs at the local farmers’ markets along with your clay pottery, trade in green dollars and .. uh.
Breathe.
