by Traverse Legal, reviewed by Enrico Schaefer - April 11, 2025 - Artificial Intelligence, Copyright Law, Fair Use
Generative AI depends on data—and lots of it. From search engines and large language models to image generators and music tools, these systems are trained on massive datasets, often pulled directly from the public web. Think news articles, social media posts, lyrics, books, stock images, etc.
But here’s the catch: just because content is publicly viewable doesn’t mean it’s legally free to use.
That assumption—that anything online is fair game for scraping and training—has fueled the rapid expansion of AI innovation. Now, it’s facing serious legal pushback. Copyright owners across industries are fighting back, and the courts are starting to weigh in. The most notable signal came in early 2025, when a federal judge in Thomson Reuters v. Ross Intelligence rejected a fair use defense in an AI training context. It marked a clear warning: copyright law still applies in the age of AI.
While these court cases will no doubt run for years, the implications are immediate: companies using AI tools, companies building AI tools, and everyone in between face risk right now. Upstream and downstream players alike need to understand the uncertainty and the real possibility of being sued or receiving a threat letter. The companies building AI models sit at the highest risk of getting sued. As a practical matter, however, most development work in artificial intelligence involves building tools on top of existing models through an API, as sketched below. And yes, companies building AI tools connected to a model through the model's API can be sued for copyright infringement. End users of AI tools are less likely to be sued, but they too carry legal liability under established copyright law.
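For readers less familiar with that layering, the sketch below shows what a typical "downstream" tool looks like: a thin application that forwards user content to an upstream model over HTTP. The endpoint URL, authentication scheme, and response field here are hypothetical placeholders, not any particular vendor's API.

```python
import requests  # widely used third-party HTTP client

# Hypothetical endpoint and credentials; real providers differ in
# URL, auth scheme, and payload/response shape.
API_URL = "https://api.example-model.com/v1/generate"
API_KEY = "your-api-key"

def summarize(article_text: str) -> str:
    """Forward a prompt to the upstream model and return its output."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": f"Summarize this article:\n\n{article_text}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]  # assumed response field
```

Note that the builder of such a tool never touches the training data directly, which is why the downstream exposure described above often comes as a surprise.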
In today’s AI-driven economy, innovation is moving faster than the law, and that gap is closing quickly. As courts scrutinize how generative models are trained, a new legal frontier is shaping around copyright, fair use, and content ownership. This article unpacks the emerging risks of scraping publicly accessible content, highlights pivotal court rulings reshaping the rules, and offers practical insight into what AI companies, content creators, and investors must do now to adapt. The compliance window is narrowing, and those who act early will be best positioned to lead.
At the heart of this legal fight is how AI models “learn.” To function, generative AI systems must process vast volumes of content. During training, models ingest and copy content—often word-for-word or pixel-for-pixel—to identify patterns, relationships, and structures.
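To make that copying step concrete, here is a deliberately simplified sketch of training-data preparation, assuming a toy whitespace tokenizer. Real pipelines add subword tokenization, deduplication, and sharding, but the essential step is the same: each source document is reproduced in full before the model ever reads it.

```python
# Toy illustration, not a production pipeline.

scraped_documents = [
    "Full text of a news article ...",
    "Full text of an editor-authored summary ...",
]

def tokenize(text: str) -> list[str]:
    """Stand-in tokenizer; real systems use learned subword vocabularies."""
    return text.split()

training_examples = []
for doc in scraped_documents:
    tokens = tokenize(doc)            # the document, token by token
    training_examples.append(tokens)  # stored verbatim, re-read over many epochs
```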
That act of copying, even if not shown directly in the AI’s final outputs, is where the copyright issues begin. For copyright holders, it’s not just about what the model produces but how it was trained. Copying entire books, articles, or image libraries without a license raises a serious question: is this a transformative, fair use of content, or simply unlicensed reproduction for commercial gain?
As more plaintiffs allege infringement, the risk to AI companies is no longer theoretical. It’s immediate, it’s growing, and it’s moving toward the courtroom.
Many AI companies lean on the doctrine of “fair use” as a shield. But that protection is more nuanced—and more fragile—than many assume.
Courts use a four-factor test to determine whether copying qualifies as fair use: (1) the purpose and character of the use, including whether it is commercial or transformative; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the potential market for, or value of, the original work.
Apply this framework to AI, and things get murky. Is training an LLM on entire novels “transformative”? Does using thousands of stock photos to teach an image model reduce licensing demand? The answers vary—and courts are just beginning to grapple with these questions.
What’s clear is that “publicly available” is not synonymous with “public domain.” Just because content is out in the open doesn’t mean it’s fair game for commercial AI training. And now, with high-stakes lawsuits gaining traction, that line is being tested—and redrawn—in real time.
In one of the first major rulings to directly address copyright and AI training, the court in Thomson Reuters v. Ross Intelligence delivered a decisive blow to the “fair use” argument many AI companies rely on. Ross, a legal tech startup, had used Westlaw’s proprietary headnotes—concise, editor-authored legal summaries—to train its AI-based legal research assistant.
Thomson Reuters, the parent company of Westlaw, sued for copyright infringement. Ross argued that its use of the headnotes was transformative and should be protected under fair use. But the court disagreed. It found that Ross had directly copied the content, that the headnotes reflected significant editorial judgment, and that the copying served a commercial purpose. As a result, Ross’s fair use defense failed—and the court held that the company had infringed Thomson Reuters’ copyrights.
This case represents a critical inflection point in the legal treatment of generative AI. It reinforced a key message: publicly visible content—even in a professional or technical context—is not automatically free for training use. The court was especially concerned with the originality and editorial value of the headnotes, as well as the competitive harm to Westlaw’s market position.
In other words, if your AI model is trained on curated, human-authored content—particularly when that content reflects intellectual labor and is commercially licensed—you may be stepping into copyright infringement territory.
The implications of this ruling stretch far beyond the legal tech world. For startups building AI products, and for investors backing them, the Ross case should trigger a strategic reassessment of how training data is sourced, documented, and licensed.
This ruling may become a persuasive precedent for other courts evaluating the same question across different industries—whether it’s music, journalism, education, or entertainment. It signals a shift: the era of unchecked web scraping is giving way to an age of accountability.
Going forward, AI companies should assume that if the training data has commercial value and was created with human authorship, copyright protections likely apply—and ignoring them is no longer a viable business strategy.
Across every major content category—news, books, images, music—copyright holders are filing lawsuits that could fundamentally alter how generative AI is built. These cases aren’t just about past misuse. They’re setting the legal blueprint for what’s permissible going forward—and where the lines will be drawn.
As courts begin to weigh in on how copyright law applies to AI, several critical insights are emerging. These takeaways are essential for AI developers, product leaders, content owners, and investors looking to navigate the evolving legal and business environment:
Just because content is online doesn’t mean it’s free to use. Courts are drawing a clear distinction between access and ownership. Public visibility does not eliminate copyright protection, especially for content that reflects original, human-authored effort.
Companies relying on the assumption that “public” equals “permissible” are increasingly facing legal scrutiny—and losing that argument in court.
The use of unlicensed content in training datasets carries tangible legal exposure. From copyright infringement claims to potential injunctions or forced model retraining, the risks are no longer hypothetical.
AI companies—especially those whose products are built on large-scale scraping—need to evaluate the sustainability and legality of their data practices. IP compliance is becoming a material concern in investment, acquisition, and regulatory conversations.
Recent rulings, such as in Thomson Reuters v. Ross Intelligence, suggest a shifting balance of power toward content owners. Courts are recognizing the value of editorial judgment, creative effort, and the commercial markets for original works.
As lawsuits gain traction, content creators across industries—news, publishing, music, and beyond—are finding new leverage to enforce their rights. The future may favor frameworks that include licensing, attribution, and negotiated data access agreements.
The AI industry is entering a new phase—one where copyright compliance and content rights are becoming part of the business model, not an afterthought.
As legal challenges mount, AI companies are likely to face increasing pressure to license the content they rely on. That means negotiating with publishers, authors, image libraries, and other rights holders. While this shift may increase development costs, it also opens the door to new commercial models—where content is used with permission, attribution, and compensation.
Strategic partnerships could replace adversarial lawsuits, but only if companies proactively rethink how they source and use data.
Legal uncertainty may create short-term friction, especially for smaller startups or open-source models. But in the long run, building with compliance in mind is the only sustainable path forward. Systems that respect intellectual property will be more defensible in court, more attractive to partners, and better positioned to scale.
AI developers should begin preparing for a future where rights management and data governance are core infrastructure—not optional add-ons.
Legislators are starting to weigh in. From proposed regulations on data transparency to hearings on fair use and AI accountability, public policy is beginning to shape the rules of the road. At the same time, industry-led standards may emerge—defining best practices for how generative models interact with protected content.
The companies that thrive in this environment will be the ones that take the lead, not the ones waiting for court orders or regulatory mandates.
For developers, founders, and tech companies building generative models, the first and most urgent step is to gain visibility into the training data that powers their systems. That means auditing not only what’s inside the models today, but where that content came from and whether its use carries legal risk. Relying on scraped or aggregated datasets without understanding their origins is no longer a defensible position—and the courts are making that clear.
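In practice, that audit can start as something very simple: a script over whatever manifest your data pipeline keeps. The sketch below assumes a hypothetical JSONL manifest with source_url and license fields; the schema is illustrative, but the habit it encodes, flagging every record whose provenance or license cannot be verified, is the point.

```python
import json

# Assumed record schema for illustration: {"source_url": ..., "license": ...}
ALLOWED_LICENSES = {"CC0", "CC-BY", "licensed", "owned"}

def audit_manifest(path: str) -> list[dict]:
    """Return records whose origin or license cannot be verified."""
    flagged = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)  # one JSON record per line
            if not record.get("source_url"):
                flagged.append({**record, "issue": "unknown origin"})
            elif record.get("license") not in ALLOWED_LICENSES:
                flagged.append({**record, "issue": "unverified license"})
    return flagged

for item in audit_manifest("training_manifest.jsonl"):
    print(item["issue"], "-", item.get("source_url", "<missing>"))
```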
Looking ahead, companies should proactively explore licensing models for high-value content types such as journalism, books, images, and music—especially in areas where copyright holders are already taking legal action. In parallel, it’s critical to stay on top of the fast-moving legal landscape. Court decisions over the next 12 to 24 months will likely reshape the boundaries of fair use and copyright infringement in the AI space, with significant implications for both risk and innovation.
Equally important is the legal support behind the technology. Partnering with law firms that understand not only intellectual property law but also the underlying AI systems is essential. At Traverse Legal, we work directly with forward-thinking companies to help them build AI tools that are both innovative and compliant—designing risk-aware strategies that won’t unravel when the next lawsuit hits.
On the other side of the equation, content creators and publishers need to approach this moment with both vigilance and intent. Generative AI tools are already ingesting and reproducing their work, often without permission or attribution. Understanding how content is being used, and how it might be embedded in commercial AI tools, is a necessary first step in asserting control. As legal frameworks evolve, content owners are increasingly shaping the conversation around copyright and generative AI, including how licensing models and enforcement mechanisms are reconsidered in this context.
For some, enforcement will make sense. That may mean pursuing individual legal action, joining collective suits, or simply putting platforms on notice. But there’s also a parallel opportunity: to help shape the next generation of content licensing. Creators don’t need to sit on the sidelines while others profit from their work. They can and should evaluate licensing frameworks that both protect their intellectual property and support responsible AI development.
Ultimately, whether you’re building the next breakthrough AI product or protecting the creative output that fuels it, your legal strategy must evolve as quickly as the technology itself. Navigating that tension—between innovation and protection—is where strategic counsel makes the difference. Traverse Legal stands at that intersection, helping both sides move forward with clarity and confidence.
The era of assuming that publicly available content is free for the taking is coming to an end. Courts are beginning to draw firmer lines between visibility and ownership, particularly when it comes to how generative AI models are trained. The once-blurry boundaries around fair use, scraping, and derivative outputs are becoming sharper—and the legal risks harder to ignore.
This shift has real consequences. AI companies that built on unlicensed content must now rethink their data practices. Creators and publishers, once sidelined in the AI conversation, are reclaiming control over their work. And investors and product leaders are quickly learning that legal compliance isn’t a roadblock to innovation—it’s part of building something that lasts.
The rules are changing. The stakes are rising. And the businesses that succeed in this next chapter will be the ones that treat copyright, licensing, and content rights not as legal afterthoughts—but as strategic foundations.
At Traverse Legal, we help innovators, platforms, and content creators navigate this evolving landscape. Whether you’re building next-gen AI or protecting the content it learns from, we partner with you to align your legal strategy with your business goals—so you can move forward with clarity, compliance, and confidence.
Enrico Schaefer is a founding partner of Traverse Legal, PLC, with more than thirty years of experience as an attorney for both established companies and emerging start-ups. His practice spans technology law matters and complex litigation throughout the United States.