The legal status of using copyrighted content to train large language models and other foundation models is the most actively contested question in AI copyright law. Plaintiffs in pending U.S. litigation argue that ingesting copyrighted works into a training corpus constitutes infringement not saved by fair use; defendants argue that training is a transformative use that does not substitute for the original market. The outcome of these cases (including matters involving The New York Times, book authors, image creators, and code repositories) will have profound implications for the licensing obligations of AI developers. Gurpreet Bal advises AI companies that, regardless of litigation outcomes, the contractual terms of data providers are independently binding: website terms of service that restrict scraping, licensed data feed agreements, and dataset licenses from academic or commercial data vendors all impose use limitations that training a commercial AI model may violate. Data licensing agreements should explicitly cover training use, model output rights, sublicensing to downstream deployers, and what happens to the trained model if the data license is terminated, a question most existing data license templates do not address.
The AI model licensing landscape has fragmented into three broad structures, each with distinct legal implications for downstream use. Open source and open-weight model licenses, including Apache 2.0 (used by some openly released models, such as Mistral 7B), Meta's Llama community license, and the RAIL (Responsible AI License) variants, permit broad use and modification but often include use restrictions (prohibiting certain harmful applications), attribution requirements, and, in the case of the Llama license, a threshold based on monthly active users above which a separate license from Meta is required. Commercial model licenses (from providers such as Anthropic, OpenAI, and Cohere, typically through enterprise agreements) involve a more traditional SaaS or API-access structure, with output ownership provisions, use case approvals, data handling obligations, and indemnification terms that vary significantly across vendors. Gurpreet S. Bal advises enterprise customers that the model license terms should be reviewed in conjunction with the vendor's acceptable use policy (AUP) and data processing agreement, because restrictions on permissible use cases embedded in the AUP are contractually enforceable even if they are not prominently featured in the commercial agreement. Proprietary fine-tuning and model customization arrangements form the third structure, and their provisions, particularly who owns the fine-tuned model weights, are a critical negotiation point in enterprise AI agreements.
The U.S. Copyright Office has consistently held that copyright protection requires human authorship, and that output generated autonomously by an AI system — without sufficient human creative control — is not eligible for copyright protection. This position, affirmed in the Thaler v. Perlmutter line of cases and the Copyright Office's 2023 guidance on AI-generated works, has significant commercial implications: a company that deploys AI to generate marketing content, code, product designs, or other deliverables may not be able to assert copyright in those outputs against competitors or infringers. Gurpreet Bal advises companies to document the human creative contribution to AI-assisted works — the selection and arrangement of prompts, the editorial judgment applied to outputs, the iterative refinement process — to support copyright claims where human authorship is genuine and substantial. In commercial agreements where AI-generated deliverables are licensed or sold, IP ownership and warranty provisions should be drafted to reflect this uncertainty, including representations about the human contribution to the work and indemnification carve-outs for copyright claims arising from AI output. Outside the U.S., the EU's approach under the AI Act and UK copyright reform discussions suggest that output ownership rules will continue to diverge across jurisdictions.
The EU AI Act, which entered into force in August 2024 with a phased compliance timeline, imposes obligations on AI system providers and deployers that directly affect licensing structures. High-risk AI systems (such as those used in employment decisions, credit scoring, biometric identification, and critical infrastructure) must comply with conformity assessment requirements, technical documentation standards, and human oversight mandates before deployment in the EU market. Gurpreet S. Bal advises that AI licensing agreements for high-risk systems should clearly allocate compliance responsibilities between the AI provider (the entity placing the system on the market) and the deployer (the entity using it in a specific context), because the Act's obligations fall on both parties in different respects. Output indemnification, in which an AI vendor offers to defend and indemnify customers against IP infringement claims arising from AI-generated content, is an emerging feature of enterprise AI agreements (notably offered by Microsoft for its Copilot products and by Google for its generative AI services), but the scope, conditions, and caps on these indemnities vary substantially. Enterprise customers should scrutinize the conditions attached to AI indemnification promises, particularly the requirements to use only approved content filters, to avoid fine-tuning on potentially infringing data, and to report infringement claims promptly. Failure to meet these conditions can void the indemnity.
Gurpreet S. Bal is a Partner at Foley & Lardner LLP in Silicon Valley, where he advises technology companies on licensing, venture financings, M&A, and corporate transactions. He has represented clients in hundreds of transactions with aggregate deal value exceeding $60 billion across AI, semiconductors, fintech, and emerging technology.