According to a lawsuit filed in California, OpenAI used personal information, including medical records and data on children, and even accessed private conversations to train its AI models.
The suit alleges that ChatGPT was not the only product involved: other tools such as Dall-E, Codex and Whisper were also trained on data extracted in violation of the privacy and security of real people.
ChatGPT answers questions like a human being, writes essays that emulate the experiences of real people and can even generate content as if it had been penned by a historical figure. All of this draws on the data it has access to, and now, according to the lawsuit, its creator OpenAI stands accused of stealing the personal information of real people.
What does the lawsuit say?
The plaintiffs remain anonymous, identified only by their initials in the 157-page lawsuit, but they allege that ChatGPT poses a catastrophic risk. They claim that personally identifiable information was stolen from millions of people to make the AI more human-like.
In essence, OpenAI is accused of harvesting and using any personal information that users share on other platforms, without seeking their consent or even contacting the individuals involved. By this account, ChatGPT and Dall-E are generating profits from the private lives of people who are not even aware of it.
The plaintiffs also argue that without this massive, unethically extracted trove of data, OpenAI could not have built generative AI that is bringing in billions in revenue. According to the suit, physical locations, chats, contact information, search histories and even browser data were taken without users’ knowledge.
What do the plaintiffs demand?
The lawsuit further alleges that OpenAI made matters worse by bringing its products to market without deploying the safeguards needed to protect private data.
It calls on OpenAI to be transparent about its data collection methods, to pay compensation for the stolen information and to give people an option to opt out of its data harvesting.
What is OpenAI’s track record on data privacy?
Before this, reports emerged that OpenAI had also used data from YouTube, owned by its rival Google, to train ChatGPT and other generative AI tools. The reports claimed that YouTube had been used in secret because it is the single largest source of images, text transcripts and audio.
Those allegations came months after Google itself was accused of using ChatGPT data to train its own AI chatbot, Bard.
ChatGPT was also temporarily banned in Italy over data privacy concerns, as authorities sought to stop it from using the personal details of millions of citizens. The ban was lifted about a month later, after Italian regulators were satisfied with the safeguards that OpenAI had put in place.
But that wasn’t the end of OpenAI’s troubles: Japan also issued a warning to the firm over data privacy concerns related to ChatGPT.
As for the allegations in the lawsuit, OpenAI states only that it collects its users’ names, email addresses and payment information when necessary. The firm has never said anything about the data it sourced from other corners of the internet to train its models in the first place.