OpenAI Accidentally Deletes ChatGPT Training Data Amid Publisher Copyright Claims, Sparking Concerns Over Evidence Retention In Legal Cases

Nov 22, 2024 at 01:29am EST
OpenAI deleted ChatGPT training data after publishers filed for copyright violations

OpenAI is embroiled in a dispute with the press: The New York Times and the Daily News have sued the AI giant and its investors, claiming that ChatGPT was trained on their copyrighted content. Search data that the publishers' lawyers compiled while examining OpenAI's training sets was deleted by OpenAI engineers, reportedly by accident. The deletion may have wiped out evidence that The New York Times' lawyers had gathered against OpenAI.

OpenAI claims it accidentally deleted data the publishers gathered from ChatGPT's training sets after they sued for copyright infringement

OpenAI is advancing rapidly in developing AI for businesses but faces obstacles to achieving a major breakthrough, while Apple's cautious approach is keeping Apple Intelligence steady. Tech giants have not been shy about using copyrighted material to train their AI models. We have previously covered how AI companies used not only text but also YouTube videos, including MKBHD videos, as training data.


OpenAI previously agreed to give The New York Times and Daily News access to its AI training sets so that they could search for their own copyrighted material. The publishers' experts have spent a considerable amount of time since early November searching the data that OpenAI used to train ChatGPT. The results of that work could have supported the publishers' claims, but OpenAI engineers erased the search data the experts had stored, reportedly by accident.

Kyle Wiggers from TechCrunch states:

Earlier this fall, OpenAI agreed to provide two virtual machines so that counsel for The Times and Daily News could perform searches for their copyrighted content in its AI training sets…In a letter, attorneys for the publishers say that they and experts they hired have spent over 150 hours since November 1 searching OpenAI’s training data.

But on November 14, OpenAI engineers erased all the publishers’ search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Wednesday.

To put it simply, OpenAI is accused of deleting the evidence, or research, compiled by the publishers' experts. You can check out the letter published online for more details. OpenAI was able to recover the deleted data, but not in a form that the publishers can use to support their copyright claims. It remains to be seen how the publishers will respond to the mishap and whether any additional measures will allow them to proceed with their claims.

We will keep you posted as the legal teams pursue their case against OpenAI, and possibly other tech giants, over the use of copyrighted material, so be sure to stick around.

About the author: Ali Salman is a technology reporter for Wccftech's mobile section with a specialized focus on Apple and the intellectual property that drives mobile innovation. He has cultivated expertise in analyzing and deconstructing complex technology patents, translating dense legal and technical documents into clear, insightful reports on future products.
