In 2024, IBM conducted a study which found that 59% of enterprise-scale organizations actively use AI in their business. While this is a promising statistic, two of the top five barriers hindering the successful adoption of AI are ethical concerns and excessive data complexity. Two subsets of these issues are perhaps (a) the effective adoption of a mechanism for machine learning, and (b) the recognition of a work of skill relating to or developed using AI, and the grant of some form of protection to such a work.
The first piece of the puzzle is identifying the exact mechanism adopted for machine learning. An AI algorithm must be trained on data, which can be real or synthetic, i.e., data generated from actual human activity or data generated by mimicking human activity. So, where does one find this elusive data? One common method of obtaining large amounts of data is web scraping.
Solely from an intellectual property perspective, web scraping can result in claims under copyright legislation. Data is considered to be a literary work, and therefore the exclusive rights to, among others, communicate it to the public and reproduce it in any material form, including storing it in any medium by electronic means, rest with the owner of the copyright.
In practice, data that is freely accessible on the internet is scraped to train the AI model. However, freely accessible does not mean free to copy. The challenges with this model include potential infringement of third-party copyright and violation of the terms of use of websites that specifically restrict visitors from scraping their data. The existing legal framework in India does not provide an exception to infringement that applies to web scraping. This poses a legal challenge to the entity developing the AI model, as it limits its ability to scrape data lawfully.
The advent of AI poses some unique challenges, one of which is how to treat the use of copyrighted data for training AI models. Would the copyright owner have a right to monetize the data, or would training AI models on copyrighted material be an exception to infringement because the use is transformative, i.e., it serves a purpose entirely different from that of the original copyrighted material? There are legitimate arguments to be made on both sides. However, resolving the question would require legislative intervention so that the competing rights and interests are adequately balanced. The use of AI models and generative AI also raises concerns about the benchmark to be adopted in affording statutory protection to the resulting work product. Since the existing intellectual property framework is structured so that authorship is associated exclusively with a natural person, the related ownership issues are a gap that is yet to be addressed. There have been some judicial observations on this issue.
For example, the Beijing Internet Court in LI v. LIU conferred copyright protection on an image created by AI. However, the Copyright Office in India refused to recognize the AI system RAGHAV as a co-author of an artistic work. The Czech courts have observed that AI-generated works are ineligible for protection under the relevant copyright legislation.
The United States Copyright Office, by contrast, attempted a balance: it conferred limited copyright protection on the graphic novel “Zarya of the Dawn,” protecting the textual content and the arrangement of images but refusing protection for the individual images contained within the graphic novel.
Therefore, until legislative intervention addresses these gaps, judicial precedents can at best serve as guidance on what is permissible and what is not.