Training large language models (LLMs) such as ChatGPT requires enormous data sets. These data sets often contain copyrighted works that were included without the knowledge or consent of their authors. The question then arises whether training ChatGPT and other LLMs on copyrighted works constitutes copyright infringement. Generally, the copyright owner is the only person authorized to copy the work, distribute it, publicly display it, and create derivative works from it. The owner can extend these rights to others by granting permission.
According to this Bloomberg Law article, it seems likely that during the training process, copyrighted text has been copied into OpenAI’s internal database, which would mean a violation of the U.S. Copyright Act of 1976. This may arguably also be the case for ChatGPT’s output, as it can be considered a ‘derivative work’ based on the copyrighted training data. The article then provides an analysis of whether the so-called “fair use” exception applies so that no copyright infringement took place. The conclusion is that this determination is difficult to make without concrete examples.
This kind of legal uncertainty can stifle innovation because developers fear copyright lawsuits. Japan is one of the first countries to amend existing legal frameworks, e.g., those governing intellectual property rights, to facilitate AI innovation. This article examines how the Copyright Act as amended in 2017 creates an enabling environment for creativity, research, and the responsible advancement of AI technologies in Japan. With ChatGPT raising copyright concerns all over the world, other countries may look to Japan’s approach as inspiration for possible legislative changes.
Japan’s Approach to AI Regulation
It seems fair to say that Japan’s approach to digital governance leans more strongly toward encouraging innovation than the strategies followed in the West. But Japan does not treat innovation and protection against harms as an either/or choice. The philosophy is rather to promote human dignity, diversity and inclusion, and sustainability through AI. See the 2019 Social Principles of Human-Centric AI.
Japan, in contrast to the EU, Canada, and the US, does not necessarily favour putting in place comprehensive regulation that specifically constrains AI. It instead places its faith in the voluntary efforts of companies that develop AI systems and provides non-binding guidelines to facilitate such efforts. In addition, Japan was quick to remove hurdles posed by existing legislation, providing much-needed certainty about the legal landscape that developers of LLMs must navigate when they wish to use copyrighted works.
Expanding Access for Research and Development
The amended provisions of the Japanese Copyright Act give researchers and developers flexibility by allowing the exploitation of copyrighted works for testing and developing technology, data analysis, and computer data processing. This expanded access promotes research and development in AI, enabling scientists and engineers to experiment with copyrighted content and drive innovation. It encourages the development of cutting-edge technologies, as researchers can leverage existing works to create new knowledge and solutions without having to worry about ambiguous legal restrictions.
Article 30-4 permits the use of a work in any way necessary in the cases listed below, or in any other case in which the purpose is not to personally enjoy, or to make others enjoy, the thoughts or sentiments expressed in that work. The specific cases where exploitation is allowed include:
- Using the work for testing or developing technology related to recording sounds or visuals of a work or other forms of exploitation.
- Using the work for data analysis, which involves extracting, comparing, classifying, or statistically analyzing the language, sounds, images, or other data from a large number of works or a large volume of data.
- Exploiting the work through computer data processing or other means that do not involve perceiving the content of the work by the human senses.
These exceptions allow for use of the work in these specific circumstances, as long as it does not unreasonably harm the copyright owner's interests.
This means that researchers and developers working on LLMs can leverage existing works to test and refine their models. They can analyze and process large volumes of language data from copyrighted works to improve the performance and capabilities of LLMs. This flexibility promotes the advancement of technologies such as ChatGPT and enables the creation of more accurate and effective language models.
Encouraging Smooth and Efficient Utilization
Article 47-4(1) provides that none of the following infringes copyright:
- Storing a copy of a copyrighted work on a computer’s storage medium for the purpose of smooth and efficient data processing.
- Providing an automatic public transmission server for someone else to use for automatic public transmissions, and recording a work made available for such transmission on a storage medium, in order to prevent delays or failures in that person's transmissions or to relay the transmission of the work efficiently.
- Recording a work on a storage medium, or adapting it, when providing data using information or communication technologies, in order to carry out the computerized data processing needed to prepare to provide that data smoothly and efficiently.
These technical provisions likewise do not apply when the exploitation would unreasonably harm the copyright owner's interests, judged by the nature, purpose, and circumstances of the work's exploitation.
The provisions emphasize the importance of smooth and efficient utilization of works in computer-based operations. By permitting works to be recorded on computer storage media for seamless data processing, the amended Act supports the uninterrupted transmission of digital content. This is particularly relevant for AI systems that rely on data streams, such as LLMs, chatbots, and recommendation systems.
Balancing Interests and Mitigating Prejudice
While promoting innovation and AI development, the amended provisions also consider the interests of copyright owners by rendering the exceptions discussed invalid in circumstances where the contemplated use of the copyrighted work would unreasonably harm the copyright holder’s interest. They strike a balance by ensuring that exploitation is only permitted in narrow circumstances that do not lead to the distribution and enjoyment of the copyright owners’ work by an audience that has not obtained permission to do so. The financial incentive to produce original work is thus not diminished. This safeguard is essential for maintaining the integrity of copyright law and protecting creators' rights.
By addressing potential concerns, particularly with regard to copying works in the process of data analysis and refinement of computer models, and maintaining a fair and balanced approach, the provisions encourage responsible AI development that respects the rights of content creators. Some amount of uncertainty remains, however, as the Act requires a case-by-case determination of whether a copyright holder’s interests would be unreasonably harmed.
The amended provisions in the Japanese Copyright Act play a pivotal role in fostering innovation and AI development in Japan. By expanding access to works for research and development, promoting smooth and efficient utilization, supporting automatic public transmission, and balancing the interests of copyright owners, these provisions create an environment that encourages responsible AI development and stimulates technological advancement. Many of the thorny questions currently debated in the context of copyright laws in the U.S. and elsewhere are preempted in Japan, which provides greater legal certainty and allows companies developing LLMs to focus on optimizing their models.
Whether Japan’s approach is the right avenue for the development of copyright law in the West remains to be seen. Differing approaches to AI regulation prevail, as we have seen, yet cross-border collaboration and dialogue on these global issues are beginning, for example at the G7 Summit in Hiroshima. With ChatGPT’s unprecedented popularity around the globe, flexible governance may mean an economic advantage, but we must not forget that more than money is at stake when rapidly deploying such disruptive technologies.
ABOUT THE AUTHOR
Kathrin Gardhouse is a German- and Ontario-trained lawyer with a deep interest in technology and data privacy. She provides privacy-related thought leadership to Private AI, a Toronto company that is developing world-class privacy-enhancing technology, and is responsible for Haventree Bank's Privacy Risk Management Program as a Manager, Privacy Risk & Records Information Governance.