This second part of our four-part series on using synthetic data to train AI models explores how the use of synthetic data training sets may mitigate copyright infringement risks under EU law.

On 9 December 2023, trilogue negotiations on the EU’s Artificial Intelligence (“AI”) Act reached a key inflection point when the European Parliament and Council struck a provisional political agreement. As we wait for the consolidated legislative text to be finalised and formally approved, below we set out the key points businesses need to know about the political deal and what comes next.

This is the first part of our series on using synthetic data to train AI models. See here for Parts 2, 3, and 4.

The recent rapid advancements of Artificial Intelligence (“AI”) have revolutionized creation and learning patterns. Generative AI (“GenAI”) systems have unveiled unprecedented capabilities, pushing the boundaries of what we thought possible. Yet, beneath the surface of the transformative potential of AI lies a complex legal web of intellectual property (“IP”) risks, particularly concerning the use of “real-world” training data, which may lead to alleged infringement of third-party IP rights if AI training data is not appropriately sourced.

On October 30, 2023, the Biden Administration issued a landmark Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (the “Order”), directing the establishment of new standards for artificial intelligence (“AI”) safety and security and laying the foundation to ensure the protection of Americans’ privacy and civil rights, support for American workers, promotion of responsible innovation, competition and collaboration, while advancing America’s role as a world leader with respect to AI.

On October 19, 2023, the U.S. Copyright Office announced in the Federal Register that it will consider a proposed exemption to the Digital Millennium Copyright Act’s (“DMCA”) anti-circumvention provisions, which prohibit circumventing technological measures used to prevent unauthorized access to copyrighted works. The exemption would allow those researching bias in artificial intelligence (“AI”) to bypass any technological measures that limit the use of copyrighted generative AI models.

On October 30, 2023, the G7 Leaders published a Statement on the Hiroshima Artificial Intelligence (“AI”) Process (the “Statement”).[1] This follows the G7 Summit in May, where the leaders agreed on the need to address the risks arising from rapidly evolving AI technologies. The Statement was accompanied by the Hiroshima Process International Code of Conduct for Organizations Developing Advanced AI Systems (the “Code of Conduct”)[2] and the Hiroshima Process International Guiding Principles for Advanced AI Systems (the “Guiding Principles”).[3]

On September 6, 2023, California Governor Gavin Newsom signed Executive Order N-12-23 (the “Executive Order”) relating to the use of generative artificial intelligence (“GenAI”) by the State, as well as preparation of certain reports assessing the equitable use of GenAI in the public sector.  The Executive Order instructs State agencies to look into the potential risks inherent with the use of GenAI and creates a blueprint for public sector implementation of GenAI tools in the near future. The Executive Order indicates that California is anticipating expanding the role that GenAI tools play in aiding State agencies to achieve their missions, while simultaneously ensuring that these State agencies identify and study any negative effects that the implementation of GenAI tools might have on residents of the State.  The Executive Order covers a number of areas, including:

GitHub, acquired by Microsoft in 2018, is an online repository used by software developers for storing and sharing software projects. In collaboration with OpenAI, GitHub released an artificial intelligence-based offering in 2021 called Copilot, which is powered by OpenAI’s generative AI model, Codex. Together, these tools assist software developers by taking natural language prompts describing a desired functionality and suggesting blocks of code to achieve that functionality. OpenAI states on its website that Codex was trained on “billions of lines of source code from publicly available sources, including code in public GitHub repositories.”

As we continue to see the rapid development of digital technologies, such as artificial intelligence (“AI”) tools, legislators around the world are contemplating how best to regulate these technologies.  In the UK, the Government has adopted a “pro-innovation” agenda, with the aim of making the UK “an attractive destination for R&D projects, manufacturing and investment, and ensuring [the UK] can realise the economic and social benefits of new technologies as quickly as possible.”[1]