
OpenAI hit with additional lawsuits over copyright claims tied to ChatGPT training data

OpenAI and Microsoft face fresh legal challenges as The New York Times and eight other newspapers sue for unauthorized use of their content to train ChatGPT, escalating a significant debate over copyright in the tech world.

Short Summary:

  • The New York Times and eight daily newspapers accuse OpenAI and Microsoft of copyright infringement.
  • The lawsuits demand the destruction of AI models trained on copyrighted content.
  • This legal battle could reshape the future of AI and copyright law.

By Vaibhav Sharda, founder of Autoblogging.ai

In a landmark move, The New York Times and eight other major newspapers, including The Chicago Tribune and The Denver Post, have filed lawsuits against OpenAI and Microsoft. The stakes are high as the plaintiffs seek to clamp down on the unauthorized use of their copyrighted material to train ChatGPT, a globally renowned AI tool.

“This issue is not just a business problem for a handful of newspapers; it is a critical issue for civil life in America,” stated the plaintiffs’ lawyers.

The lawsuits argue that OpenAI used millions of articles without permission or compensation to train ChatGPT, a claim that, if proven, could result in OpenAI having to pay billions in damages. Furthermore, the plaintiffs are demanding that any AI models built using their content be destroyed. Legal experts, however, remain skeptical about the feasibility of such an order, given the astronomical costs involved.

Gary Marcus, a professor at New York University, emphasized,

“Re-training an AI model could cost on the order of a hundred million dollars for earlier models, and a billion or even multiple billions for future models.”

This adds another layer of complexity and uncertainty to the case.

Regurgitation and Reputational Harm

An essential aspect of the lawsuit is how ChatGPT’s responses reportedly tarnish the reputations of the newspapers. For example, ChatGPT falsely attributed an endorsement of a dangerous infant lounger to The Chicago Tribune and fabricated research from The Denver Post claiming smoking cures asthma. Such incidents not only cast doubt on the reliability of AI-generated content but also pose existential risks to the credibility of journalistic institutions.

“It’s a good way of showing that what the model has learned is not at the abstract level, but that the model has actually specifically memorized a lot of text,” remarked Matthew Sag, a professor at Emory University.

The plaintiffs argue that the AI models reproduce sizeable portions of their articles verbatim, significantly departing from “fair use” norms, which typically allow for limited copying for transformative purposes. OpenAI, on the other hand, insists that its use of publicly available data from the web is permissible under the fair use doctrine.

Amidst these claims, OpenAI’s spokesperson commented,

“Along with our news partners, we see immense potential for AI tools like ChatGPT to deepen publishers’ relationships with readers and enhance the news experience.”

By contrast, companies like German publisher Axel Springer and The Associated Press have opted to license their content to OpenAI, suggesting that mutually beneficial agreements are possible.

A Legal Quagmire

OpenAI’s challenges encompass a larger legal and ethical conundrum faced by the AI industry. The crux of the case lies in determining whether AI training using copyrighted material falls under “fair use.” Legal experts argue that this issue is unresolved and murky, likely requiring prolonged court battles and appeals to clarify.

“OpenAI’s copying threatens the subscription revenue of The New York Times,” the lawsuit states.

The outcome holds profound implications not just for AI but for intellectual property law as a whole. News publishers feel that their copyrighted content is being used without permission to create AI models that compete with them, undermining their business models.

Kathy Kleiman, a visiting professor at Georgetown Law, noted,

“The financial incentive to resolve these cases is staggering. News publishers have limited resources, but the AI companies have deep pockets.”

The high stakes have compelled both sides to dig in, making collaborative solutions appear elusive.

Adding to the drama, other sectors have also voiced grievances. Sarah Silverman, a noted actress and author, joined a lawsuit accusing OpenAI of using her memoir without permission. Similarly, multiple novelists and programmers have claimed that their work was utilized to train AI models without their consent.

The Fair Use Defense

One of the more contentious elements involves OpenAI’s insistence on the “fair use” defense. This doctrine was originally designed to enable limited copying for uses such as criticism, commentary, or education. However, the application of this doctrine to AI models, which can generate human-like text and images, complicates matters.

Kristelia García, a copyright law professor at Georgetown University, said, “They’re saying it’s spitting out exactly what was put into it.”

This lawsuit and others like it challenge the boundaries of AI technology and “fair use.” If the courts decide against OpenAI, it would set a precedent likely to influence future AI development and the broader tech industry.

Notably, this issue isn’t confined to the U.S. In late 2022, a group of European publishers, including German giant Axel Springer, raised similar concerns. Some see this as a pivotal moment in AI ethics, underscoring the need for responsible and respectful AI development practices.

For those of us at Autoblogging.ai, these ongoing legal battles highlight the importance of building ethical AI tools. As an AI Article Writer platform, we continuously strive to ensure that our models respect copyright laws and offer added value to content creators.

Moving Forward

Whether or not the lawsuits succeed, they shed light on the evolving relationship between AI and traditional media. Some proponents argue that AI can democratize information and journalism by making quality content more accessible.

“Fair use allows researchers, teachers, critics, and others to rely on copyrighted works without permission and payment,” pointed out Peter DiCola, another noted legal scholar.

OpenAI’s stance remains that its generative models offer transformative value. The company contends that the models do not compete directly with original sources but instead serve different, productive purposes. Nevertheless, skepticism remains high among content owners who see AI as a threat rather than an opportunity.

As Gary Marcus has highlighted, filtering out copyrighted material poses considerable challenges. According to Marcus, even if URLs with copyrighted stories are removed, other versions of the same stories are often readily available on multiple platforms like Reddit. This complexity is emblematic of the wider difficulties faced by AI companies in ensuring compliance.

The court battles could, ironically, drive innovation in ethical AI. Faced with lawsuits and the possibility of steep penalties, companies like OpenAI might turn to more robust and transparent systems for content usage. In this light, the legal hurdles could serve as catalysts for developing better, more respectful AI technologies.

“AI’s potential for transforming multiple industries is enormous, and responsible innovation is key,” stressed Cecilia Ziniti, a technology attorney.

Ultimately, the controversy underscores the need for clear, universally accepted guidelines on AI and copyright. As AI continues to evolve, it will be crucial to find balanced solutions that protect intellectual property while allowing innovation to flourish.

At Autoblogging.ai, we believe in the transformative power of AI but recognize that ethical considerations must be at the forefront. Platforms like ours, which work on artificial intelligence for writing, are shaping the future of AI writing. Transparency, compliance, and respect for creators are essential for this journey.

As the legal showdown between The New York Times and OpenAI unfolds, it will likely serve as a vital case study in AI ethics. The lessons learned here will undoubtedly inform the next stage of AI advancements, highlighting both opportunities and obstacles in blending human creativity with machine capabilities.

For the tech industry and legal scholars, this saga serves as a reminder: Innovation must be combined with responsibility, ensuring a balanced and ethical future for AI development.