* Shourya Shekhar

(Source: Business Standard)
As generative AI models rapidly evolve, they increasingly depend on large-scale text and data inputs, often drawn from copyrighted content. Yet, Indian copyright law offers no explicit exception for such use, leaving developers in a legal grey zone. This article explores the doctrinal gap in Sections 14, 51, and 52 of the Copyright Act, engages with key Indian and foreign case law, and situates the debate within international policy trends. Against the backdrop of the ANI v. OpenAI litigation, it critically examines whether India needs a statutory TDM exception to balance innovation, fair use, and authorial rights in the AI age.
Introduction
India’s rapidly expanding AI sector faces a fundamental copyright question: can machine learning firms legally scrape copyrighted text and images en masse to train generative models? Under the Copyright Act, 1957, authors possess exclusive rights to copy, distribute and communicate their works. Section 14 is central, as it grants the author the sole right “to reproduce the work in any material form including the storing of it in any medium by electronic means”. Section 51, in turn, imposes strict liability for infringing these rights. Here lies the core of the issue: Indian law has no specific carve-out for automated text and data mining (TDM) or AI training. The only potentially relevant exemption in the current Act is fair dealing under Section 52, which permits dealing with a work for private or personal use, including research, or for criticism or review. Fair dealing in India has been interpreted narrowly, with courts emphasizing the enumerated purposes. No doctrine exists that allows wholesale copying of copyrighted works for AI analysis. In short, training an AI model on a library of copyrighted books or images would likely infringe the author’s exclusive reproduction right unless permission or a valid license has been obtained.
This article examines this persisting legal gap. It aims to show that India’s copyright framework was written for human readers, not algorithms, and lacks a safety net for large-scale TDM. Indian courts have been vigilant in protecting authors’ rights, and recent domestic cases such as ANI Media v. OpenAI, currently sub judice before the Delhi High Court, raise new questions and test the Act’s boundaries. By contrast, several jurisdictions worldwide have recognized TDM as a distinct use that merits its own exception. India faces a legislative choice: either force AI developers to painstakingly license or avoid Indian content, or clarify when machine learning is allowed. We argue that India needs a targeted TDM exception so that it does not inhibit innovation while still preserving authors’ interests.
Indian Copyright Law and Fair Dealing
The Copyright Act, 1957 sets out a comprehensive scheme of exclusive rights and limited exceptions. Section 14(a)(i) of the Act confers upon the author the exclusive right “to reproduce the work in any material form including the storing of it in any medium by electronic means”[1]. Put simply, the right to make any copy of a book, article, image or other copyrighted content, even if only to store and process it electronically, belongs solely to the rights-holder. Section 51 defines infringement broadly: it covers doing, without permission, any act reserved to the copyright owner, such as reproduction[2]. Under this definition, an AI company that downloads millions of news articles or photos commits acts of reproduction on a massive scale, in direct violation of Sections 14 and 51, unless an exception applies.
The only general defense the Act provides is the fair dealing exception in Section 52. Section 52(1)(a) exempts “fair dealing” with a work for purposes of private or personal use, including research, or for criticism or review[3]. The statute offers some relief by expressly stating that even storing a work electronically for such permissible uses does not constitute infringement. However, this exemption is extremely limited. “Private use” is naturally read as a single individual’s consumption, not a company’s internal data mining for product development. “Research” is usually understood as academic or personal study, not commercial innovation. Indian courts have also interpreted these exceptions narrowly. In Civic Chandran v. Ammini Amma[4], the Kerala High Court allowed substantial copying only because the second play was found to be a criticism of the first, emphasizing that the purpose fell within the statutory scope. Indian courts have also protected authors’ exclusive rights vigorously in cases such as Eastern Book Co. v. D.B. Modak[5] and Super Cassettes v. MySpace[6], where the focus lay in applying the Act’s text rather than inventing open-ended fair-use rules. Unlike U.S. law[7], India’s statute does not give judges a free-form fair-use standard; it is a “fair dealing” model with fixed categories. Thus, the absence of a “transformative use” exception stems largely from statutory design, not just judicial restraint. An AI training process is usually commercial and internal, and so falls outside the exemptions available under Indian law.
Beyond Section 52(1)(a), the other clauses cover only very specific contexts. Incidental technological reproductions, educational copying, legislative purposes, reporting of current events and judicial proceedings are among the few uses of copyrighted material that find relief within the Act. None of these clauses anticipates modern data mining. Even the exemption under Section 52(1)(b)[8] protects only transient or incidental storage in the course of lawful online transmission. There is no blanket exception for making non-expressive copies of a work for analysis. In practice, this means that no Indian company or researcher can scrape a news website or digitize a library without permission and rely on an exception to shield that use. India’s first generative-AI copyright infringement suit, ANI Media v. OpenAI[9], already shows how doubtful it is that a narrow fair-dealing list can stretch to cover AI training. As it stands, comprehensive scanning of copyrighted texts or images by an AI developer is likely to be treated as unauthorized copying unless the existing exceptions are reinterpreted expansively.
Global TDM Exceptions: EU, US, UK
TDM exceptions have drawn attention from jurisdictions across the globe in recent years. The EU Copyright Directive (2019)[10] is one such step, taken by the European Union to ease text and data mining of lawfully accessible works. Article 3 of the Directive provides that research organizations may mine any lawfully accessed content for scientific research, with no rights-holder opt-out. Article 4 goes further and permits any person, including commercial entities, to reproduce and extract lawfully accessible works for TDM, unless the rights-holder has expressly reserved their rights[11]. The reservation of rights can be made through machine-readable notices. This allows AI developers in the EU to mine any content to which they have legal access, whether by purchasing or subscribing to it or by using openly accessible sources. Creators, for their part, can forbid such mining by conveying clear opt-out notices. The Directive explicitly allows the copies needed for analysis to be made and retained, thus balancing innovation and authors’ control.
A comparable trajectory can be seen in the United Kingdom. The Copyright, Designs and Patents Act 1988 provides only a narrow exception under Section 29A[12], which allows text mining solely for non-commercial research, but the government has moved significantly since late 2024. In a consultation open from December 2024 to February 2025, the UK government proposed legalizing AI training on lawfully accessed copyrighted works, subject to an opt-out mechanism[13]. The proposals would allow any user with lawful access to content to mine it, while preserving authors’ control by giving them a framework to reserve their works or demand licenses or payment.
The United States has addressed the issue through case law rather than legislation. Although no specific TDM statute exists in the US, courts have applied a flexible fair-use approach. In Authors Guild v. Google[14], Google had scanned about 20 million books to build a searchable database. The Second Circuit unanimously held this to be fair use, describing the copying as “highly transformative” because it created new search functions and displayed only short snippets of each work. The court further held that Google’s commercial motive did not defeat fair use in the absence of any meaningful market substitution. Similarly, in Authors Guild v. HathiTrust[15], the creation of a massive full-text digital library for search and accessibility was upheld as fair use. The court there described the searchable database as “a quintessentially transformative use”. These decisions support the view that data mining for analysis, which produces no human-readable copy, can be non-infringing so long as it serves a new purpose and does not replace sales.
AI Training Disputes and the OpenAI case
These principles are no longer just statutory safeguards; they are being tested by courts in legal disputes all over the world. London is currently witnessing Getty Images v. Stability AI[16]. Getty has alleged that Stable Diffusion was trained on “millions of images scraped from Getty’s websites without its consent”. Getty’s leadership has maintained that AI companies should ‘opt in’ by seeking permission and paying, instead of relying on broad exceptions. In effect, Getty has asked the court to hold that mass copying of its copyrighted content without a license is infringement. With similar suits filed by Getty and by photographers in the U.S., a global push to require licensing of AI training data appears to be under way.
India entered this arena in November 2024, when news agency Asian News International (ANI) filed a suit against OpenAI in the Delhi High Court[17]. ANI has submitted that OpenAI illegally scraped its free and paywalled news articles to train ChatGPT. In its defense, OpenAI has claimed that the model only learns abstract “tokens” and patterns from text, akin to a human reading books for knowledge, and does not republish the original content. OpenAI further submitted that after training, the model contains no verbatim copy of ANI’s articles.
To assist it in the proceedings, the Delhi High Court appointed two copyright experts, who have expressed opposing views on the matter. One has argued that merely copying and storing electronic texts for model training falls under the permissible “storage” exception under Section 52(1)(a), so long as nothing expressive is made public. The other has contended that OpenAI’s unlicensed copying is infringing because it is a commercial use not covered by the fair dealing exception. These split opinions underscore the need for a fresh approach as Indian law is pressed into new territory.
The issue has also spread to industry groups with a stake in the matter. The Federation of Indian Publishers recently warned that AI’s unlicensed mining of literary works diminishes their economic value and endangers the publishing industry[18]. News publishers echo these concerns, arguing that generative-AI models extract and repurpose journalism without consent, eroding the credibility and financial sustainability of news. Content owners have not minced their words in demanding tighter limits and compensation if their works are used, and this has borne fruit, with the government signaling attention as well. In February 2025, the Ministry of Electronics & IT informed Parliament that unauthorized web scraping of publicly available data falls under the IT Act’s ban on data extraction[19]. The Digital Personal Data Protection Act (2023)[20] provides similar safeguards by requiring clear consent for processing personal data, which could affect large-scale scraping. The need for further regulation was recognized on April 28, 2025, when the DPIIT announced a committee to examine AI’s impact on copyright law, with its first meeting scheduled for May 16, 2025[21]. This review coincides with the hearings in the ANI case and with international AI policy discussions in which Big Tech argues that strict copyright rules could stifle innovation, all of which emphasizes the need for rapid change. Courts could soon face a plethora of cases in which rigid statutes clash with unprecedented technological change.
Why India Needs A Clear TDM Exception
The current legal ambiguity hurts both innovators and rights-holders. Indian AI developers and researchers face a copyright minefield: licensing millions of books, articles and images for training is practically impossible, yet continuing to mine data risks infringement suits. If Indian startups pull back or train their models abroad in jurisdictions with clearer rules, Indian tech will miss out on innovation and investment. Conversely, creators fear their work is being cannibalized without payment, for instance where AI tools generate unlicensed summaries or analyses of their content that undermine sales.
India’s fair dealing exception fails to provide an easy answer. Its scope is limited to specified purposes, which include private use and research, criticism and review, and news reporting. Copying thousands of works into a machine-learning pipeline for a commercial product does not clearly fit any of these exceptions. In the ANI hearing, one of the court-appointed experts flatly noted that OpenAI’s use of such data does not fall within the fair dealing exception, implying that India’s current doctrine may not protect model training. Stretching the current interpretation of Section 52 to cover industrial-scale AI mining would be an unprecedented expansion of the law.
By contrast, a carefully tailored statutory exception would provide much-needed clarity and balance. India could introduce a specific TDM safe harbour, for example through an amendment to Section 52. Such an exception could give any person with lawful access to a work the right to make the temporary digital reproductions needed for automated text or data analysis of that work, provided that the process is conducted solely to train or improve AI models. Crucial limits needed for such an exemption to function well would include:
Lawful Access: Only content legitimately obtained can be mined, excluding stolen or illicitly scraped data.
Non-Expressive Use: The mining of data must be for internal analysis, not for creating readable copies. No substantial verbatim excerpt or expressive content of the original may be communicated to the users.
Temporary Copies: Any reproductions made for TDM must be transient and deleted once the analysis is complete.
Rights-Holder Opt-Out: Creators keep control by signalling reservation of rights via an explicit notice or machine-readable metadata. If such an opt-out mechanism is not put into place, mining of the content shall be permitted.
Optional Remuneration: Rights-holders should remain free to negotiate licenses or fees for commercial AI use of their works, ensuring they are paid appropriately for that use.
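For readers who build such systems, the limits listed above can be read as a simple eligibility check. The following Python sketch is purely illustrative and assumes a hypothetical record type, `MiningRequest`, and hypothetical function names; none of these appear in any statute or proposal, and the sketch does not capture how a court would actually assess each condition. Optional remuneration is left out because it is a matter for negotiation rather than a condition of the exception.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiningRequest:
    """Hypothetical record describing one proposed TDM use of a work."""
    lawful_access: bool                  # content was legitimately obtained
    rights_reserved: bool                # rights-holder has signalled an opt-out
    outputs_verbatim_text: bool          # substantial excerpts reach end users
    copies_deleted_after_analysis: bool  # reproductions are only transient


def failed_conditions(req: MiningRequest) -> list[str]:
    """Return the names of the proposed conditions that the request fails."""
    failures = []
    if not req.lawful_access:
        failures.append("lawful access")
    if req.outputs_verbatim_text:
        failures.append("non-expressive use")
    if not req.copies_deleted_after_analysis:
        failures.append("temporary copies")
    if req.rights_reserved:
        failures.append("rights-holder opt-out")
    return failures


def exception_applies(req: MiningRequest) -> bool:
    """The sketched exception applies only if every condition is met."""
    return not failed_conditions(req)
```

Under this reading, a request that has lawful access but retains its copies and ignores a reservation of rights would fail two conditions, and the exception would not apply. The conditions are cumulative: meeting some of them does not compensate for failing another.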
This exception would bring India closer to international practice. The EU’s Article 4 and the UK’s proposals both incorporate an opt-out framework. It is imperative to recognize that a statistical model or dataset is not the same as a copy meant for reading. US courts have emphasized that the products of TDM, such as searchable databases, are transformative and non-expressive and serve new public benefits, and the EU exceptions rest on a similar premise. India would benefit from similarly treating automated mining as a purely analytical process.
However, rights-holders have not been shy in raising counter-arguments. Any legislation on the matter must proceed with caution, as an opt-out regime shifts the burden onto creators and may be ineffective without clear rules. Publishers around the globe have argued that a broad exception could undercut the incentives to invest in content and in its preparation for mining. It has further been pointed out that if the exception applied retrospectively and covered already collected datasets, authors would simply lose control over past uses. Any new law must balance these concerns, for example by clarifying how opt-outs can be registered and by ensuring adequate notice or compensation for affected parties.
Legislating a TDM exception would not only fast-track India’s growth in the AI sector, but also align with the purpose long associated with copyright law, expressed in the US Constitution as promoting “the progress of science and useful arts”. It would enable Indian AI researchers to innovate using local-language data and cultural works, while giving authors a clear tool to refuse or license such uses. Crucially, it would not abolish copyright or destroy incentives: rights-holders could simply opt out or demand payment, and would be spared the burden of litigating every use. By contrast, the absence of an exception means costly uncertainty.
Conclusion
As things stand, India’s Copyright Act provides no explicit safe harbour for AI model training on copyrighted materials. Courts find themselves in muddy waters, forced to navigate an untested copyright landscape, as seen in the OpenAI case. Meanwhile, the global landscape continues to shift, with explicit TDM exceptions and accompanying safeguards being implemented in various jurisdictions. India’s lawmakers ought to walk this very path. Amending the Act to allow lawful, limited text and data mining would balance innovation with rights and fulfill the statute’s goal of promoting creative progress. This nuanced fix would not only boost Indian innovators’ confidence to build in AI, but would also preserve authors’ control and their right to earn from their work on fair terms.
[1] The Copyright Act 1957, s 14(a)(i).
[2] The Copyright Act 1957, s 51.
[3] The Copyright Act 1957, s 52(1)(a).
[4] Civic Chandran v Ammini Amma 1996 KHC 836 (Ker).
[5] Eastern Book Company v DB Modak (2008) 1 SCC 1.
[6] Super Cassettes Industries Ltd v MySpace Inc (2016) 65 PTC 1 (Del).
[7] Copyright Act of 1976, 17 USC § 107 (US).
[8] The Copyright Act 1957, s 52(1)(b).
[9] ANI Media Pvt Ltd v OpenAI Inc, CS(COMM) 1028/2024 (Delhi High Court).
[10] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market [2019] OJ L130/92, art 3.
[11] ibid, art 4.
[12] Copyright, Designs and Patents Act 1988, s 29A (UK).
[13] UK Intellectual Property Office, Consultation on Copyright and Artificial Intelligence (December 2024).
[14] Authors Guild v Google Inc 804 F 3d 202 (2d Cir 2015).
[15] Authors Guild v HathiTrust 755 F 3d 87 (2d Cir 2014).
[16] Getty Images v Stability AI (High Court, London, 2025) (pleadings).
[17] ANI Media Pvt Ltd v OpenAI Inc, CS(COMM) 1028/2024 (Delhi High Court).
[18] Federation of Indian Publishers, Written Submission in ANI v OpenAI (Delhi HC, 2025).
[19] Ministry of Electronics and Information Technology, Response to Parliament Question No. 1182 (Feb 2025).
[20] Digital Personal Data Protection Act 2023, s 5.
[21] DPIIT Press Release No. AICR/001/2025 (April 2025).
*Shourya Shekhar is a third-year, fifth-semester student at National Law University, Jodhpur, with a keen interest in the field of Intellectual Property.