Photo: Omar Marques/SOPA Images/LightRocket via Getty Images
Commentary by Laura Caroli
Published November 21, 2024
On November 14, 2024, the EU AI Office published the first draft of the European Union's General-Purpose AI Code of Practice, established by the EU AI Act. According to Article 56 of the AI Act, the code will set out the rules operationalizing the requirements the regulation lays down for general-purpose AI (GPAI) models (Article 53) and GPAI models with systemic risk (Article 55). The AI Act is a product safety-style piece of legislation and relies heavily on harmonized standards to support compliance with its requirements. Harmonized standards are sets of operational rules established by the European standardization bodies, namely the European Committee for Standardization (CEN), the European Committee for Electrotechnical Standardization (CENELEC), and the European Telecommunications Standards Institute (ETSI), in which industry experts, alongside civil society and trade unions in smaller proportion, translate the requirements set out by EU sectoral legislation upon a specific mandate from the European Commission.
The commission assesses the proposed standards and, if they fulfill the mandate, adopts them as so-called harmonized standards. Harmonized standards provide a presumption of conformity to any company using them to comply with the respective legislation. In the field of high-risk AI systems, which account for the bulk of the requirements set out by the AI Act, such standards do not yet exist. CEN and CENELEC are currently working to have them ready by April 2025, in time for the requirements for high-risk AI systems to start applying in two years.
Regarding GPAI models, however, the field of AI safety is nascent. Furthermore, only a handful of models meet what the AI Act calls "GPAI models with systemic risk," namely models that reach the 10²⁵ FLOPs threshold set out by the act. FLOPs stands for "floating-point operations" and is a widely accepted metric for measuring computing power, indicating the number of floating-point operations required to perform a particular computation; in this context, it refers to the cumulative compute used to train a model. Few model providers have the expertise needed to manage such models, and responsible practices such as a risk management system are not yet commonplace. This, coupled with the need for these requirements to apply within 12 months of the act's entry into force, makes it impossible to rely on the ordinary standardization procedure in time, as such a procedure takes much longer. This is why the co-legislators decided to adopt a different approach to operationalizing the requirements for GPAI models. The approach is set out in Article 56 and consists of a co-regulatory effort. It resembles standardization, but under less codified conditions: there is no specific mandate, and the work is not carried out by the established standardization bodies. The process takes the form of a multistakeholder platform where industry (notably model providers, but also downstream providers), independent experts, and civil society discuss together how to establish practices, measures, and benchmarks.
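To give a rough sense of what the 10²⁵ FLOPs threshold means in practice, the short Python sketch below applies the widely used "6 × parameters × training tokens" rule of thumb for estimating total training compute; this approximation comes from the scaling-law literature and is not a method prescribed by the AI Act, and the model size and token count used here are purely illustrative assumptions.

# Rough training-compute estimate using the common 6 * N * D rule of thumb
# (total FLOPs ≈ 6 x number of parameters x number of training tokens).
# The model size and token count are illustrative assumptions, not figures
# taken from the AI Act or from any specific provider.

AI_ACT_THRESHOLD_FLOPS = 1e25  # cumulative training-compute threshold in the AI Act


def training_flops(n_parameters: float, n_tokens: float) -> float:
    """Approximate total training compute in floating-point operations."""
    return 6 * n_parameters * n_tokens


# Hypothetical model: 100 billion parameters trained on 15 trillion tokens.
estimate = training_flops(100e9, 15e12)  # roughly 9.0e24 FLOPs

print(f"Estimated training compute: {estimate:.1e} FLOPs")
print("Crosses the AI Act's 10^25 FLOPs threshold:", estimate >= AI_ACT_THRESHOLD_FLOPS)

Under these assumptions, such a model would land just below the threshold, which illustrates why only a handful of today's largest models are expected to cross it.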
The code will likely be the most comprehensive public document detailing responsible practices for powerful AI models in the world. Therefore, every iteration of its drafting is highly anticipated and closely observed internationally. Direct participation in this process is considered strategic by all major tech companies, which are represented in the drafting platform in order to shape the rules directly. Because a large number and wide variety of stakeholders are actively involved, the code will likely form a solid basis for responsible governance of frontier AI models and AI safety far beyond the European Union. Article 56 states that the AI Office and the AI Board (composed of the member states' authorities responsible for enforcement at the national level) will regularly assess the adequacy of the code and, once the AI Office deems it stable enough, it "may, by way of an implementing act, approve a code of practice and give it a general validity within the Union." This means that the Code of Practice will have effects similar to those of a presumption of conformity for companies, much like a harmonized standard, although it cannot legally be framed that way, as a presumption of conformity can only be provided by harmonized standards.
It is important to note that the drafting of the code does not operate in a vacuum but in the context of intense international cooperation in the field. Indeed, AI Safety Institutes (AISIs) are also working to establish common practices for the responsible governance of powerful AI models; the AI Act itself is inspired by the Code of Conduct of the G7 Hiroshima Process; and international standardization bodies, the OECD, as well as academia, are working to advance research in this regard. Article 56(1) of the AI Act explicitly states that the drafting of the Code of Practice will occur "taking into account international approaches," and this principle is recalled several times in the draft. It is therefore evident that the different AI governance efforts in place will inform one another, although the Code of Practice will likely hold greater significance, as the drafting platform is the most inclusive on the table in terms of participation and the code itself will hold stronger legal value once finalized, while the other practices will remain largely voluntary.
Another important caveat is that the Code of Practice will likely not govern the operational practices of GPAI models for years to come, but only temporarily, until standards emerge and are finalized in this nascent field. This does not diminish its significance, however.
Drafting of the Code of Practice is taking place under strong pressure to have it ready no later than May 2, 2025, and in effect by August 2025. These time constraints, as well as the sensitivity of the issue, have challenged the AI Office's approach to the drafting of the code. Initially oriented toward a closed-door process, the AI Office decided, after substantial public and political pressure, to opt for a more inclusive drafting exercise, going so far as to appoint several independent experts as chairs and vice chairs of the working groups around which the drafting is structured, including internationally renowned experts working in the United States, Canada, and the United Kingdom. In parallel, at the end of July 2024 the AI Office launched a public call for expressions of interest to participate, together with a call for input open until September 18, which received over 400 contributions that formed the basis of the first draft of the code itself.
A first plenary took place on September 30, attended by around 1,000 stakeholders, with individual experts and academia representing more than two-thirds of the participants, model providers representing 4 percent, and industry 21 percent of the total. International actors and bodies, such as the other AI Safety Institutes, the OECD, the United Nations Educational, Scientific and Cultural Organization, and the International Telecommunication Union, were invited to the platform as observers. During the selection process, prospective participants were asked to select one or more of the four proposed working groups:

1. Transparency and copyright-related rules
2. Risk identification and assessment, including evaluations
3. Technical risk mitigation
4. Internal risk management and governance of GPAI providers
Working groups two, three, and four will all write the rules applying to the top tier of the models, that is, to GPAI models with systemic risk (Article 55).
Interestingly, a fifth subgroup has been established: the providers' workshop group, where model providers will have the opportunity to engage directly with the AI Office in assessing the concrete feasibility of the proposed measures. Looking at the composition as it stands, and at how small the share of actual model providers is compared to other stakeholders, one might expect the true, unfiltered interactions between the AI Office and the providers to occur in this fifth subgroup, potentially rendering the rest less relevant.
There will be four different iterations of the drafting over the next five months, each including time for discussions within the groups and a deadline to submit comments on the draft.
The document published on November 14, 2024, is the first draft of the first iteration. The deadline for comments is November 28, while the week of November 18-22 is dedicated to the discussions. This clear timeline should help organizations manage their resources in order to meaningfully participate.
The code details the requirements of Article 53 for all GPAI model providers and those of Article 55 for GPAI models posing a systemic risk. Article 53 is mainly about transparency, detailing the information about the model to be provided to both the AI Office and downstream providers, as well as elements enabling the enforcement of existing copyright rules. Article 55 requires providers of models with systemic risk to perform model evaluations to identify and mitigate systemic risks; to keep track of and report serious incidents; and to ensure an adequate level of cybersecurity protection for the model.
The draft code only begins to detail measures and sub-measures for each of the requirements; key performance indicators will be included in further iterations of the drafting process. Another notable element is the presence of various open questions to participants, meant to direct future discussions. Regarding the requirements for all model providers (Article 53), the section on transparency rarely provides much more detail than the elements already set out in the letter of the law, notably in Annexes XI and XII. The section on copyright contains more sub-measures, covering both upstream and downstream compliance and, for example, prohibiting the crawling of piracy websites. Interestingly, this section is also the only one that does not contain any open questions for participants to reflect on.
As could be expected, however, the sections containing the most valuable additions to the provisions of the AI Act are those on GPAI models with systemic risk. First, the draft opens with a taxonomy of systemic risk, an essential definitional exercise: frontier AI requires rethinking this taxonomy compared to the usual sectoral ones, as AI can cause harms well beyond the health and safety effects other products are normally considered capable of. Detailing the rules for future meaningful risk assessment and mitigation practices must therefore start with this exercise.
The taxonomy section is brief but provides a valuable first dissection of the various elements composing the risk (types, nature, and sources), although they are already hinted at in recital 110 of the act. Interestingly, one of the open questions in this section is related to the issue of child sexual abuse material and nonconsensual intimate imagery, two extremely sensitive topics that are exacerbated by the diffusion of generative AI and are currently prominent in public debate.
The draft then introduces two broad requirements for providers of GPAI models posing systemic risk: they should establish a Safety and Security Framework (SSF) detailing the risk management policies, as well as create a detailed Safety and Security Report (SSR) compiling relevant information on the risk assessment and mitigation measures.
Regarding risk identification, one notable element is the hint toward possible further tiers of what is considered a systemic risk. This is mentioned in the explanatory box introducing Section IV as well as in the conclusions of the draft: "The current draft is written with the assumption that there will only be a small number of both general-purpose AI models with systemic risks and providers thereof. Should that assumption prove wrong, future drafts may need to be changed significantly, for instance, by introducing a more detailed tiered system of measures aiming to focus primarily on those models that provide the largest systemic risks." At the same time, one of the sub-measures of the risk analysis provision introduces tiers of severity, "including at minimum a tier at which the level of risk would be considered intolerable absent appropriate safeguards." Among the open questions is the definition of such tiers. Such a definition is as fundamental as defining the aforementioned risk taxonomy. Indeed, unlike other sectors, such as aviation, energy, and telecoms, where what constitutes an intolerable level of risk is known, measurable, and widely accepted, the field of frontier AI is still too recent and is being studied as it evolves, so setting these tiers will likely constitute a key benchmark that will influence future standardization work.
The suggested type of model evaluation informing the SSF provides more detail about the expected degree of rigor, level of detail, and proposed methods. It should encompass the whole lifecycle of the model, from pretraining to post-deployment. Interestingly, the draft acknowledges the need to reflect on specific methods that would allow providers of open-weight models to monitor them after release, given the specificities of this mode of distribution, and includes an open question on this point: an issue that was already known to the co-legislators during the negotiations of the AI Act and that still awaits a clear answer.
As for risk mitigation, the draft distinguishes between technical and governance measures. While the former include both safety and security mitigations linked to each systemic risk indicator, including the possibility of deciding not to proceed with the model's deployment under specific conditions, the latter require ownership and responsibility at the company's highest level, including establishing a risk committee, setting up whistleblower protections, and maintaining a direct communication link with the AI Office. The governance chapter contains further elements on the involvement of independent experts in model evaluation (both pre- and post-deployment), who can be the AI Office itself or other evaluators, with open questions on what constitutes an appropriate third-party evaluator. A related issue in this nascent field is indeed the scarcity of such independent experts. With expertise currently accumulating around the various AI Safety Institutes, it is possible to envisage a future role for them in the process, provided a sufficient level of mutual recognition is agreed upon within the AISI Network, as has been suggested.
Apart from international collaboration, the key principles underpinning this first effort to detail the operational requirements for GPAI models are proportionality, both to the level of risk and to the size of the company, and a strong encouragement of transparency, with the draft even suggesting that model providers should publish both their SSF and SSR in order to contribute to the further development of standards and research in this still-immature sector.
Overall, the draft is far from providing a detailed framework, and it seems to leave space to incorporate the discussions and contributions of the different stakeholders. It tries to balance a strong level of responsibility by model providers in managing systemic risks with an emphasis on proportionality. This is particularly clear in the preamble to Section IV, stating that "there is less need for more comprehensive measures where there is good reason to believe that a new general-purpose AI model will exhibit the same high-impact capabilities as exhibited by general-purpose AI models with systemic risk that have already been safely deployed."
The text of the EU AI Act imposes only very high-level requirements on GPAI models. The still-immature nature of the field of AI safety led EU policymakers to opt for a co-regulatory effort to further detail the rules, through the drafting of a Code of Practice. The platform launched by the European AI Office for stakeholders to contribute to the drafting has seen a high rate of interest and participation. Overall, the first draft looks balanced between the need to manage and mitigate risks and the need to enable innovation: the suggested measures are commensurate with the nature of the systemic risks they intend to address and mitigate, while the emphasis on proportionality and the attention to possible simplified means of compliance for SMEs make the draft more innovation friendly. Given the (still incomplete) concrete measures and recommendations included in this first draft, it is likely that civil society organizations and independent experts will push for stricter measures and less proportionality while strongly supporting the call for public transparency, whereas providers will argue that many of the proposed measures are too burdensome and unfeasible and that transparency would undermine trade secrets. It is also probable, however, that the most relevant interactions will occur at the provider-workshop level and that the providers' feedback will be taken very seriously, as they will be the ones required to implement the rules.
While it will be essential to follow its further iterations and understand the points of view of the different stakeholders involved in the process, the draft undoubtedly constitutes a valuable document. Overall, the collaborative nature of the multistakeholder drafting platform, which includes the world's most prominent experts in the field, will likely return Europe to the heart of debates around AI safety, influencing emerging accepted responsible practices on frontier AI.
Laura Caroli is the senior fellow of the Wadhwani AI Center at the Center for Strategic and International Studies in Washington, D.C.
Commentary is produced by the Center for Strategic and International Studies (CSIS), a private, tax-exempt institution focusing on international public policy issues. Its research is nonpartisan and nonproprietary. CSIS does not take specific policy positions. Accordingly, all views, positions, and conclusions expressed in this publication should be understood to be solely those of the author(s).
© 2024 by the Center for Strategic and International Studies. All rights reserved.