This second part of our four-part series on using synthetic data to train AI models explores how the use of synthetic data training sets may mitigate copyright infringement risks under EU law.

On 9 December 2023, trilogue negotiations on the EU’s Artificial Intelligence (“AI”) Act reached a key inflection point, when the European Parliament and Council struck a provisional political agreement.  As we wait for the consolidated legislative text to be finalised and formally approved, we set out below the key points businesses need to know about the political deal and what comes next.

This is the first part of a series on using synthetic data to train AI models. See here for Parts 2, 3, and 4.

The recent rapid advancements in Artificial Intelligence (“AI”) have revolutionized creation and learning patterns. Generative AI (“GenAI”) systems have unveiled unprecedented capabilities, pushing the boundaries of what we thought possible. Yet beneath the surface of the transformative potential of AI lies a complex legal web of intellectual property (“IP”) risks, particularly concerning the use of “real-world” training data, which may lead to alleged infringement of third-party IP rights if that data is not appropriately sourced.

On October 30, 2023, the Biden Administration issued a landmark Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (the “Order”), directing the establishment of new standards for artificial intelligence (“AI”) safety and security and laying the foundation to ensure the protection of Americans’ privacy and civil rights, support for American workers, promotion of responsible innovation, competition and collaboration, while advancing America’s role as a world leader with respect to AI.

By Angela Dunning and Lindsay Harris.[1]  Note: Cleary Gottlieb represents Midjourney in this matter.

On October 30, 2023, U.S. District Judge William Orrick of the Northern District of California issued an Order[2] largely dismissing without prejudice the claims brought by artists Sarah Andersen, Kelly McKernan and Karla Ortiz in a proposed class action lawsuit against artificial intelligence (“AI”) companies Stability AI, Inc., Stability AI Ltd. (together, “Stability AI”), DeviantArt, Inc. (“DeviantArt”) and Midjourney, Inc. (“Midjourney”).  Andersen is the first of many cases brought by high-profile artists, programmers and authors (including John Grisham, Sarah Silverman and Michael Chabon) seeking to challenge the legality of using copyrighted material for training AI models.

On October 19, 2023, the U.S. Copyright Office announced in the Federal Register that it will consider a proposed exemption to the Digital Millennium Copyright Act’s (“DMCA”) anti-circumvention provisions, which prohibit the circumvention of any technological measures used to prevent unauthorized access to copyrighted works.  The exemption would allow those researching bias in artificial intelligence (“AI”) to bypass any technological measures that limit the use of copyrighted generative AI models.

On October 30, 2023, the G7 Leaders published a Statement on the Hiroshima Artificial Intelligence (“AI”) Process (the “Statement”).[1] This follows the G7 Summit in May, where the leaders agreed on the need to address the risks arising from rapidly evolving AI technologies. The Statement was accompanied by the Hiroshima Process International Code of Conduct for Organizations Developing Advanced AI Systems (the “Code of Conduct”)[2] and the Hiroshima Process International Guiding Principles for Advanced AI Systems (the “Guiding Principles”).[3]

On September 6, 2023, California Governor Gavin Newsom signed Executive Order N-12-23 (the “Executive Order”) relating to the use of generative artificial intelligence (“GenAI”) by the State, as well as preparation of certain reports assessing the equitable use of GenAI in the public sector.  The Executive Order instructs State agencies to look into the potential risks inherent in the use of GenAI and creates a blueprint for public sector implementation of GenAI tools in the near future. The Executive Order indicates that California anticipates expanding the role that GenAI tools play in helping State agencies achieve their missions, while ensuring that these agencies identify and study any negative effects that the implementation of GenAI tools might have on residents of the State.  The Executive Order covers a number of areas, including:

The U.S. District Court for the District of Columbia recently affirmed a decision by the U.S. Copyright Office (“USCO”) in which the USCO denied an application to register a work authored entirely by an artificial intelligence program.  The case, Thaler v. Perlmutter, challenging U.S. copyright law’s human authorship requirement, is the first of its kind in the United States, but will definitely not be the last, as questions regarding the originality and protectability of generative AI (“GenAI”) created works continue to arise.  The court in Thaler focused on the fact that the work at issue had no human authorship, setting a clear rule for one end of the spectrum.  As the court recognized, the more difficult questions that will need to be addressed include how much human input is required to qualify the user as the creator of a work such that it is eligible for copyright protection.