ABSTRACT:
With the rapid emergence of Artificial Intelligence (AI), especially Generative AI, in almost every sector, it has become necessary to re-evaluate existing legislation so that it can adapt. This development has brought various opportunities along with certain challenges, particularly with respect to the protection of Intellectual Property Rights and, more specifically, copyright law. As understanding of these AI models grows, the challenges they pose to intellectual property laws are being confronted across the globe at an increasing rate. By comparing the Indian legal system with international viewpoints, this paper investigates the relationship between AI training techniques and copyright law. Focussing on current litigation trends and key legal principles, this paper aims to offer suggestions that balance the interests of AI developers and copyright holders while fostering creativity.
KEYWORDS: AI, Generative Artificial Intelligence, Copyright, Development, Intellectual Property Rights
RESEARCH METHODOLOGY:
This paper adopts the doctrinal method of research and relies primarily on secondary sources such as case law, existing legal principles, research papers, articles, and statutes.
REVIEW OF LITERATURE:
This study looks at how generative AI and Indian copyright law interact, offering a fresh framework for evaluating possible defences and claims. We contend that the distinctive architecture of Indian copyright law may provide an alternative viewpoint to the global discussion surrounding fair use and transformative purpose defences. Non-expressive use of copyrighted works for machine learning, in which no human accesses the expressive content, may fall outside of copyright’s traditional subject matter entirely. The paper provides both doctrinal analysis and practical consequences for AI development within India’s legal system by thoroughly addressing important issues like training-stage liability, web scraping defences, result phase infringement, and attribution of accountability for verbatim outputs.
Taking a step back, this research uses case studies to empirically explore three technology scenarios. In order to produce descriptions appropriate for legal analysis, we encapsulate the accepted industrial approach of an AI lifecycle (data collection, data organisation, model training, and model operation). This will make it possible to evaluate the difficulties in harmonising rights, exemptions, and disclosures under EU copyright law. At the EU level, we evaluate policy interventions aimed at elucidating the legal standing of input data through copyright exceptions, opt-outs, or mandatory disclosure of copyright elements. We conclude that the most likely outcome is a machine learning environment that is fully copyright licensed, which could have negative implications for scientific research, industrial structure, and innovation.
INTRODUCTION:
Generative AI is essentially a machine-learning model that learns to produce new data in response to a user’s prompt. A generative AI system is trained to produce new objects that resemble the data it has seen. GenAI uses neural networks that are loosely modelled on the structure and function of neurons in the human brain. GenAI can be trained on human language, programming languages, art, chemistry, biology, or any other complex subject matter.
Generative techniques are not new. One of the earliest forms of generative modelling is the Markov model, named after Andrey Markov, the Russian mathematician who introduced it. As one researcher in the field has put it, “We were generating things way before the last decade, but the major distinction here is in terms of the complexity of objects we can generate and the scale at which we can train these models.”
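To make the reference to the Markov model concrete, the following is a minimal sketch of a word-level Markov chain text generator. It is an illustration only: the function names `build_chain` and `generate`, the sample corpus, and the fixed random seed are all assumptions made for this example and do not come from any source cited in this paper.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, picking each next word at random."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same walk
    output = [start]
    for _ in range(length - 1):
        followers = chain.get(output[-1])
        if not followers:  # dead end: this word never had a successor
            break
        output.append(rng.choice(followers))
    return " ".join(output)


# Hypothetical one-sentence corpus, invented for this illustration.
corpus = "the court held that the use was fair and the court dismissed the claim"
chain = build_chain(corpus)
sample = generate(chain, "the", 8)
print(sample)
```

Every word pair in the generated sentence already occurs somewhere in the corpus, but the sentence as a whole need not. Modern generative models differ in scale and complexity rather than in this basic idea of predicting what comes next.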
The launch of ChatGPT by OpenAI in November 2022 sparked significant discussion about the application of artificial intelligence (AI) across multiple sectors, including commerce, education, and broader societal contexts. While AI has been in use in various capacities for many years, Generative AI (GAI) tools like ChatGPT, Jasper, DALL-E and Gemini, with their user-friendly design, ease of use and exceptionally high performance, have opened access to a much wider user base, which now relies on GAI’s ability to generate all kinds of content, including text, images, audio, code, and even video. This has necessitated a deeper exploration of its impact on various sectors, particularly on Intellectual Property Rights.
BACKGROUND:
AI development is posing unique questions about creativity, both in India and globally. There is also a pressing need for innovation, and copyright law must balance these competing needs. The use of copyrighted data in training datasets raises concerns of infringement at both the input and output phases. Copyright laws typically require that a work be authored by a human in order to attract protection, but AI-generated content blurs these lines further by producing derivative works based on large datasets. In many areas there are no clear guidelines on how current measures apply, especially to use cases unique to AI, and providing such guidelines has become the need of the hour.
AI TRAINING – HOW IT WORKS:
Large Language Models (LLMs) have been likened to parasites, consuming copyrighted content produced by people through Text and Data Mining (TDM). Training involves feeding enormous volumes of data to an LLM in order to simulate aspects of the human mind, and digital copies of that data must be made for the training process. However, the majority of AI systems keep the actual training data in memory only temporarily, while storing uncopyrightable mathematical values derived from the training data. By inferring the underlying trends, connections, and structures in the data, they then produce new sentences, images, and other content.
The success of this iterative method of training AI models depends on the quality and comprehensiveness of the data provided, as well as on the trainers’ capacity to recognise and correct any flaws. The objective of AI model training is to develop a computational model that balances the numerous relevant factors and produces accurate output. The process is often compared to parenting, and more particularly to a child learning a new skill by absorbing information from its surroundings.
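The distinction drawn above, between the training data that is held temporarily and the mathematical values that are retained, can be illustrated with a deliberately tiny example. The sketch below is a toy under stated assumptions, not a description of how any actual LLM is trained: the `train` function, the four data points, and the learning-rate and epoch settings are all hypothetical, and a real model would hold billions of parameters rather than two.

```python
def train(examples, epochs=2000, lr=0.05):
    """Fit y = w*x + b by gradient descent and return only the parameters."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):  # the iterative loop: predict, measure error, adjust
        grad_w = grad_b = 0.0
        for x, y in examples:
            error = (w * x + b) - y
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data following the pattern y = 2x + 1.
training_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
w, b = train(training_data)

# The data is discarded after training; only the two learned numbers remain.
del training_data

# The retained parameters still allow a prediction for an input never seen.
prediction = w * 10.0 + b
print(w, b, prediction)
```

After training, the program holds only `w` and `b`, yet it can extend the learned pattern to a new input. Whether the intermediate copies made during such a process amount to reproduction in the legal sense is precisely the question the jurisdictions surveyed below answer differently.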
GLOBAL PERSPECTIVES: BALANCING INNOVATION & PROTECTION
Countries around the world are taking different approaches to regulating the use of copyrighted content in AI training. In the U.S., the exclusive rights of copyright holders are limited by a legal doctrine known as ‘Fair Use,’ which permits the use of copyrighted material under certain conditions and in limited amounts without the explicit permission of the copyright holder. The European Union (EU), on the other hand, has opted for stricter rules through measures such as the AI Act, which seeks to provide an all-encompassing framework for regulating the use of AI, including provisions concerning intellectual property.
There have been several high-profile international conflicts involving AI and copyright law between media behemoths in the entertainment and journalism sectors and AI businesses.
USA:
In an almost comic turn of events, Michael Cohen, the former attorney for Donald Trump, pleaded ignorance after a court filing made on his behalf was found to cite cases that never existed, or what the court ultimately called “hallucinations.” Cohen admitted that he had found the citations himself using Google Bard (now Gemini), a publicly available multimodal large language model (LLM). The judge admonished Cohen’s attorney for failing to double-check the case citations supplied by artificial intelligence.
U.S. law currently relies on precedent to address the problem of copyright protection for data used during the GenAI training phase. In general, U.S. regulation is industry-oriented. The principle of “fair use” has been applied to technologies that bear some resemblance to GenAI, such as search engines, plagiarism detection software, and book digitisation. There are presently no official, mandatory regulations in the United States regarding the disclosure of training data. However, the Generative AI Copyright Disclosure Act, a bill introduced by Congressman Adam Schiff on April 9, 2024, would require AI businesses to disclose their use of copyrighted content as training data.
In the recent US case of Authors Guild v. OpenAI, the plaintiffs contended in their complaint that “OpenAI’s LLMs endanger fiction writers’ ability to make a living, because the LLMs allow anyone to generate effortlessly and freely (or very inexpensively), texts that they would otherwise pay writers to create.” They further alleged that OpenAI’s LLMs are capable of producing derivative works that harm the market for the plaintiffs’ works by imitating, summarising, paraphrasing, or otherwise relying on them.
In the US, certain publications such as the ‘Associated Press’ have entered into agreements with OpenAI since 2023, thereby allowing OpenAI to use their content to train the AI model for better responses in return for a sum of money.
EU:
The Regulation of the European Parliament and of the Council laying down harmonised rules on Artificial Intelligence (the AI Act) governs the rights and liabilities arising out of the use of AI models. EU law tends to be more rights-oriented than U.S. law. With very few exceptions or limitations, the EU generally forbids AI model providers from using copyrighted material as training data indiscriminately and for free. The Act requires AI model providers to prepare and publish sufficiently detailed summaries of the data used to train their models, using templates supplied by the AI Office. Because EU law permits copyright holders to expressly reserve their rights and so prevent their works from being used for GenAI training, this rights-oriented approach may sit uneasily with the enormous data demands of such models.
When the DSM Copyright Directive was implemented in France in 2019, the Autorité de la concurrence (the French competition authority) started looking into Google’s dealings with news organisations and publishers. The DSM Copyright Directive recognises that press publishers should have the right to authorise the online use of their publications, including reproduction and making available to the public. Amid the ongoing controversy over whether the use of press publications in an AI service is covered by this related right, the Autorité clarified that it had not determined whether such use constituted a violation of press publishers’ rights.
Using automated programs known as spiders, businesses crawl web pages for data and information, which they then publish or otherwise use on their own websites. The companies doing the scraping contend that they should be allowed to use information that is already publicly accessible, while the website owners who are the targets of the scraping object to their data being used without their authorisation. In the 2015 case of Ryanair Ltd. v. P.R. Aviation BV, the Court of Justice of the European Union considered website terms of use providing that “the use of automated systems or software to extract data from this website for commercial purposes, (‘screen scraping’), is prohibited unless the third party has directly concluded a written licence agreement,” and held that the Database Directive does not prevent the owner of a database that is protected neither by copyright nor by the sui generis right from imposing such contractual restrictions.
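Alongside contractual terms of the kind at issue in Ryanair, website owners commonly signal their wishes to spiders through a machine-readable robots.txt file. The sketch below shows how a well-behaved crawler might check such a file before scraping. It is offered only as an illustration: robots.txt is not discussed in the authorities cited in this paper, compliance with it is generally voluntary, and the crawler name `ExampleAIBot` and the `example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named AI crawler is barred from the
# whole site, while every other user agent is allowed.
rules = """
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse the rules directly instead of fetching them

url = "https://example.com/news/story.html"
ai_allowed = parser.can_fetch("ExampleAIBot", url)
other_allowed = parser.can_fetch("SomeBrowser", url)
print(ai_allowed, other_allowed)
```

A file of this kind records the site owner's preference in a form that a spider can read automatically, but it does not by itself settle whether scraping in disregard of it is lawful. That remains a question of contract, database, and copyright law in each jurisdiction.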
CHINA:
The general position under the Chinese Copyright Law is that copyright infringement occurs only when a permanent copy is made. The Beijing Internet Court is currently hearing the first case in China to consider whether AI training violates copyright, with a hearing held in June 2024. The lawsuit was filed by four Chinese illustrators against the creators of Trik AI.
One of the main arguments was that, in accordance with Article 24 of the Chinese Copyright Law, using the plaintiffs’ works for AI training should be regarded as “fair use.” In a separate judgment, a Chinese court has held that an AI company had infringed another party’s copyright in the process of providing generative artificial intelligence (“AI”) services and should bear the relevant civil liabilities.
Regulators in China have not yet directly addressed these copyright concerns. Rulings from the nation’s specialised internet courts have fuelled a heated legal debate on related topics without settling the legal status of training data. Related issues concerning AI and copyright have nevertheless come to the forefront. In November 2023, the Beijing Internet Court decided that AI-generated images can be protected by Chinese copyright law, provided that the degree of human involvement satisfies the requirements of “originality” and “intellectual achievement.”
INDIA:
The regulatory regime in India is still evolving with respect to AI and its impact on copyright. The National Programme on Artificial Intelligence (NPAI) is a government initiative that underlines India’s determination to use AI for the benefit of its people.
The Indian position is likely to be shaped by two recent cases filed before the Delhi High Court. In ANI Media Pvt. Ltd. v. Open AI Inc. & Anr., 2024, the prominent Indian news agency has sued OpenAI, the developer of ChatGPT, alleging that its copyrighted news content was used for training and profit without authorisation. Kanchan Nagar & Ors. v. Union of India & Ors., 2024, is a PIL asking the court to step in and regulate AI systems that make use of copyrighted content without consent, specifically with respect to photos and images of professional models and artists.
Copyright law safeguards the author’s original works and grants the owner, among other rights, the exclusive right of reproduction. Use of someone else’s copyrighted work without the required permission or licence is considered copyright infringement under Section 51 of the Indian Copyright Act, 1957.
POTENTIAL IMPLICATIONS ON INDIA:
On examining the above approaches and adaptations across the globe by world leaders in the field of AI, particularly with respect to AI training and protection of Copyrights, the following probable implications can be anticipated:
Enhanced Legal Examination: Startups may be subject to more intense legal scrutiny of their data usage policies. As businesses navigate the regulatory environment, this may result in a more guarded approach to AI development, which could impede innovation.
Barriers to Market Entry: Well-known AI firms, such as OpenAI, have a competitive advantage because they have already trained their AI algorithms on big datasets. This volume of training data will be difficult for new competitors to replicate, which could impede innovation and competition.
Cost of Compliance: For AI businesses, the requirement to obtain licensing agreements or put opt-out procedures in place may raise operating expenses. As a result, smaller companies may be discouraged from entering the market or compelled to look for less data-intensive methods of AI development.
Licensing Contracts: By offering content producers a new source of income and AI firms legally compliant data sources, the outcome of the pending litigation may promote the growth of a licensing ecosystem for AI training data.
Accountability and Transparency: Laws may require AI corporations to reveal the sources of their data and offer opt-out procedures to content creators in order to ensure transparency in the use of AI training data.
CONCLUSION:
The Berne Convention for the Protection of Literary and Artistic Works (the Berne Convention) provides the member countries a great deal of legislative freedom in defining particular restrictions and exceptions to copyright protection. But there are also significant drawbacks to this flexibility, most notably the absence of international uniformity and consensus regarding the laws governing copyright restrictions and exceptions. As a result, there are significant variations in the ways that these problems are handled, which results in different legal frameworks for copyright protection and GenAI model training across the globe.
With the wide reach of AI and increasing access to data across borders, it is the need of the hour to formulate uniform legislation on AI training and its potential implications for copyright law, so as to protect creators’ rights effectively while allowing for the development of technology.
Many scholars have compared the recent growth of AI to the advent of the personal computer, which goes to show the crucial and undeniable part that AI models will play in the day-to-day lives of the public. This necessitates proper legislation to ensure their optimal use without encroaching on anybody’s existing rights.
St. Joseph’s College of Law

