AI Technology Licensing: Data, Models, and Output Rights | Gurpreet S. Bal, Silicon Valley Technology Attorney

AI licensing sits at the intersection of unresolved copyright doctrine, rapidly evolving commercial practice, and emerging regulatory frameworks — a combination that makes it one of the most dynamic and legally uncertain areas of technology transactions. Gurpreet S. Bal, a Partner at Foley and Lardner LLP in Silicon Valley, advises AI developers, enterprise licensees, and investors on structuring agreements that address the full lifecycle of AI technology: the data that trains it, the model that embodies it, the outputs it generates, and the compliance obligations that attach when it is deployed in regulated contexts. As litigation over training data and legislative activity around AI accountability continue to develop, the contractual frameworks governing AI licensing are evolving in real time.

What copyright, fair use, and data provider terms govern AI training data licensing?

Whether using copyrighted content to train models is infringement or transformative fair use is the most actively contested question in AI copyright law, and pending U.S. cases will shape developers' licensing obligations. Regardless of those outcomes, data providers' contractual terms are independently binding, so scraping ToS restrictions, licensed data feeds, and dataset licenses all impose use limitations that commercial training may violate. Data licensing agreements should explicitly cover training use, output rights, sublicensing to downstream deployers, and what happens to the trained model if the data license is terminated.

The legal status of using copyrighted content to train large language models and other foundation models is the most actively contested question in AI copyright law. Plaintiffs in pending U.S. litigation argue that ingesting copyrighted works into a training corpus constitutes infringement not saved by fair use; defendants argue that training is a transformative use that does not substitute for the original market. The outcome of these cases — including matters involving The New York Times, book authors, image creators, and code repositories — will have profound implications for the licensing obligations of AI developers. Gurpreet Bal advises AI companies that, regardless of litigation outcomes, the contractual terms of data providers are independently binding: web scraping ToS restrictions, licensed data feed agreements, and dataset licenses from academic or commercial data vendors all impose use limitations that training for commercial AI models may violate. Data licensing agreements should explicitly cover training use, model output rights, sublicensing to downstream deployers, and what happens to the trained model if the data license is terminated — a question most existing data license templates do not address.

How do open source, commercial, and proprietary API AI model licenses differ?

Model licensing has fragmented into three structures with distinct implications. Open source model licenses permit broad use and modification but often add use restrictions, attribution requirements, and usage- or revenue-based thresholds above which separate commercial terms apply. Commercial licenses follow a more traditional SaaS or API-access structure, with output ownership, use-case approvals, and indemnification terms that vary across vendors. Model terms should therefore be reviewed alongside the acceptable use policy and data processing agreement, because use-case restrictions in the AUP are enforceable even when they are not prominent. Who owns fine-tuned model weights is a critical negotiation point in enterprise AI agreements.

The AI model licensing landscape has fragmented into three broad structures, each with distinct legal implications for downstream use. Open source model licenses, including Apache 2.0 (used for a number of openly released models), the Llama community license, and the RAIL (Responsible AI License) variants, permit broad use and modification but often include use restrictions (prohibiting certain harmful applications), attribution requirements, and, in the case of the Llama license, a scale threshold based on monthly active users above which a separate commercial license is required. Commercial model licenses (from providers like Anthropic, OpenAI via enterprise agreements, and Cohere) involve a more traditional SaaS or API-access structure, with output ownership provisions, use case approvals, data handling obligations, and indemnification terms that vary significantly across vendors. Gurpreet S. Bal advises enterprise customers that the model license terms should be reviewed in conjunction with the vendor's acceptable use policy and data processing agreement, because restrictions on permissible use cases embedded in the AUP are contractually enforceable even if they are not prominently featured in the commercial agreement. Proprietary fine-tuning and model customization provisions, particularly who owns the fine-tuned model weights, are a critical negotiation point in enterprise AI agreements.

Who owns AI-generated output given the human authorship requirement?

The U.S. Copyright Office holds that copyright requires human authorship, so output generated autonomously by AI without sufficient human creative control is not protectable, which means a company may not be able to assert copyright in AI-generated deliverables against competitors or infringers. To support claims where authorship is genuine, companies should document the human creative contribution, including prompt selection, editorial judgment, and iterative refinement. In commercial agreements, IP ownership and warranty provisions should reflect this uncertainty with representations about human contribution and indemnification carve-outs, and output ownership rules continue to diverge across jurisdictions.

The U.S. Copyright Office has consistently held that copyright protection requires human authorship, and that output generated autonomously by an AI system — without sufficient human creative control — is not eligible for copyright protection. This position, affirmed in the Thaler v. Perlmutter line of cases and the Copyright Office's 2023 guidance on AI-generated works, has significant commercial implications: a company that deploys AI to generate marketing content, code, product designs, or other deliverables may not be able to assert copyright in those outputs against competitors or infringers. Gurpreet Bal advises companies to document the human creative contribution to AI-assisted works — the selection and arrangement of prompts, the editorial judgment applied to outputs, the iterative refinement process — to support copyright claims where human authorship is genuine and substantial. In commercial agreements where AI-generated deliverables are licensed or sold, IP ownership and warranty provisions should be drafted to reflect this uncertainty, including representations about the human contribution to the work and indemnification carve-outs for copyright claims arising from AI output. Outside the U.S., the EU's approach under the AI Act and UK copyright reform discussions suggest that output ownership rules will continue to diverge across jurisdictions.

What EU AI Act compliance and output indemnification requirements are in AI commercial contracts?

The EU AI Act imposes obligations on AI providers and deployers that directly affect licensing, with high-risk systems facing conformity assessment, documentation, and human oversight requirements before EU deployment. Licensing agreements for high-risk systems should clearly allocate compliance responsibilities between provider and deployer, since the Act's obligations fall on both in different respects. Output indemnification, where a vendor defends customers against IP claims arising from AI-generated content, is an emerging feature, but its scope, conditions, and caps vary, so customers should scrutinize the attached conditions that can void the indemnity if unmet.

The EU AI Act, which entered into force in August 2024 with a phased compliance timeline, imposes obligations on AI system providers and deployers that directly affect licensing structures. High-risk AI systems, such as those used in employment decisions, credit scoring, biometric identification, and critical infrastructure, must comply with conformity assessment requirements, technical documentation standards, and human oversight mandates before deployment in the EU market. Gurpreet S. Bal advises that AI licensing agreements for high-risk systems should clearly allocate compliance responsibilities between the AI provider (the entity placing the system on the market) and the deployer (the entity using it in a specific context), because the Act's obligations fall on both parties in different respects. Output indemnification, in which an AI vendor offers to defend and indemnify customers against IP infringement claims arising from AI-generated content, is an emerging feature of enterprise AI agreements (notably offered by Microsoft for its Copilot products and by Google for its generative AI services), but the scope, conditions, and caps on these indemnities vary substantially. Enterprise customers should scrutinize the conditions attached to AI indemnification promises, particularly the requirements to use only approved content filters, to avoid fine-tuning on potentially infringing data, and to report infringement claims promptly; failure to meet these conditions voids the indemnity.

Gurpreet S. Bal is a Partner at Foley and Lardner LLP in Silicon Valley, where he advises technology companies on licensing, venture financings, M&A, and corporate transactions. He has represented clients in hundreds of transactions with aggregate deal value exceeding $60 billion across AI, semiconductors, fintech, and emerging technology.