From Risk Framework to Public Claim

How AI Safety Documents Organise Responsibility

Saman Samadi, PhD (Cantab)

Article / PDF | 10 June 2026

Focus: Responsible AI governance; risk frameworks; preparedness policies; public accountability.

Abstract

Frontier-AI governance frameworks now give risk a recognisable documentary grammar. A concern about capability moves through thresholds, evaluations, safeguard design, internal review, and selective disclosure before it appears in a system card, model card, risk report, safety report, or governance disclosure. This article examines that movement from risk framework to public claim. It compares Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework and Frontier Governance Framework, NIST’s AI Risk Management Framework and Generative AI Profile, Google DeepMind’s Frontier Safety Framework, the Seoul Frontier AI Safety Commitments, and safety-case work associated with the UK AI Security Institute. The argument is that these frameworks operate as documentary infrastructures of responsibility. Their work lies in arranging the conditions under which judgement becomes reviewable. Evidence is assigned to specific actors; residual risk is placed before authorised decision makers; safeguards become conditions of deployment; disclosure rules determine how uncertainty enters public language. The public form of AI safety is therefore one of the institutional sites where evidence, capability, discretion, and accountability are made durable.

Keywords: AI safety documentation; frontier AI governance; responsible scaling; preparedness frameworks; system cards; model cards; residual risk; safety cases; public accountability.

1. Introduction: Governance after the System Card

The recent proliferation of frontier-AI governance frameworks has changed the documentary setting in which public safety claims are made. A system card is no longer only a release document that summarises model capabilities and limitations. It now sits within a larger administrative order in which risk frameworks, preparedness policies, capability thresholds, safeguard reports, safety cases, external reviews, and disclosure commitments all exert pressure on the language that reaches the public. The system card remains a visible artefact, but the claim it makes increasingly derives its force from a wider chain of prior documents.

That chain has acquired a public template. The Frontier AI Safety Commitments announced at the AI Seoul Summit in May 2024 asked signatories to assess severe risks across the AI lifecycle, define thresholds for risks judged intolerable, specify what would happen if thresholds were approached or crossed, maintain accountability and governance structures, and publish frameworks or reports explaining implementation. The commitments did not settle what each threshold would mean in practice, yet they gave frontier-lab governance a recognisable shape: evaluation would feed into risk assessment; risk assessment would trigger safeguards or non-deployment; accountability would require identified organisational responsibility; and public transparency would turn internal safety work into documents that outside readers could examine.[1]

The most consequential development lies in that last movement. Frontier laboratories now publish documents that present safety as a relation among evidence, threshold, mitigation, decision, and disclosure. Anthropic’s current Responsible Scaling Policy is organised around Frontier Safety Roadmaps, recurring Risk Reports, executive approval, external review, and a public update page that records policy changes across versions.[2] OpenAI’s Preparedness Framework places severe-harm capability categories inside a reporting and safeguard-review process, while its Frontier Governance Framework translates preparedness work into a more legal and systemic-risk vocabulary.[3] Google DeepMind’s Frontier Safety Framework describes a risk-management process that begins with early-warning evaluations and moves through inherent-risk assessment, mitigation, residual-risk assessment, risk acceptance, and safety cases at Critical Capability Levels.[4] NIST’s AI RMF and Generative AI Profile, operating at a more general level, give this emerging field a broader administrative language of governance, mapping, measurement, management, content provenance, pre-deployment testing, and incident disclosure.[5]

The documents differ in architecture. Anthropic’s framework is especially public-facing in its treatment of Risk Reports and external-review commentary. OpenAI’s framework is more threshold-and-safeguard oriented, with the Safety Advisory Group occupying a prominent advisory role before leadership accepts residual risk. Google DeepMind’s framework is more formalised as risk-management procedure, with residual-risk acceptance and safety-case supplementation becoming the hinge between evaluation and deployment. NIST supplies a broader risk-management grammar, and its importance lies in the way it standardises the vocabulary through which organisations can document, monitor, communicate, and revise AI risk beyond any single company-specific release gate. The differences matter because each framework gives a different documentary location to judgement. Some judgement is placed in thresholds, some in risk reports, some in governance committees, some in executive acceptance, some in public disclosure rules.

The central question follows from this dispersion. How does a safety concern become an institutional procedure? How does an evaluation result become a deployment decision? How does a deployment decision become a public claim? These questions move the analysis away from a narrow assessment of whether a particular framework is strict or lenient, and toward the more consequential issue of how responsibility is organised through documents. A risk framework is more than background policy: it determines the evidence to be gathered, the uncertainty to be carried, the actors authorised to interpret results, the safeguards treated as relevant, and the parts of the reasoning later exposed to public judgement.

The argument developed here therefore treats frontier-AI governance frameworks as documentary infrastructures. Their authority rests partly on technical evaluation, but evaluation alone cannot decide whether a model may be trained further, deployed externally, restricted to limited access, delayed, monitored, or accompanied by new safeguards. The framework gives evaluation a route into action. It defines the record in which capability becomes risk, risk becomes review, review becomes acceptance or refusal, and acceptance becomes the language of public safety. This route is rarely complete. The gaps are as important as the rules. Public accountability depends on the way those gaps are made visible.

2. The Framework as a Documentary Form

The idea of a responsible scaling policy predates some of the most recent company frameworks, but it already contained the structure that now organises much frontier-AI safety documentation. METR described responsible scaling policies as commitments that connect increasingly powerful models to increasingly demanding protective measures, with evaluations used to decide whether a model has reached dangerous capability thresholds and with defined responses attached to those thresholds.[6] The significance of this genre is that it places safety work inside a conditional documentary sequence. The laboratory must specify the evidentiary and institutional conditions under which continuing to scale or deploy would remain permissible.

A related pattern appears in the policy literature on frontier AI safety. A 2023 UK government report on emerging frontier-AI safety processes described responsible capability scaling as a developing practice that connects model evaluations, risk thresholds, mitigation measures, and organisational governance across the model lifecycle.[7] The later synthesis of emerging practices in frontier-AI safety frameworks grouped the field around risk identification and assessment, risk mitigation, and governance, a triadic structure that now appears in different forms across Anthropic, OpenAI, Google DeepMind, and broader policy frameworks.[8] The convergence is not accidental. Once a laboratory treats increasingly capable models as possible sources of severe misuse, loss-of-control, or systemic harm, a purely narrative safety statement becomes insufficient. The claim must acquire procedure.

The framework form therefore has several recurring features. First, it names the risk domain. In OpenAI’s Preparedness Framework, the operational tracked categories are biological and chemical capability, cybersecurity, and AI self-improvement, while a separate set of research categories marks risks that remain important but less mature as deployment triggers.[9] Google DeepMind distinguishes misuse risks, machine-learning research-and-development risks, and misalignment risks, while locating each within Tracked Capability Levels or Critical Capability Levels.[10] Anthropic’s RSP has evolved through several versions, moving from earlier ASL-linked capability thresholds toward a more report-driven and roadmap-based structure, while retaining the connection between capability, safeguards, and catastrophic-risk reasoning.[11] NIST’s AI RMF, by contrast, does not name frontier-risk categories in this way. It asks organisations to map risks in context and then define the tolerance, measurement, management, and governance structures through which those risks will be treated.[12]

Second, the framework defines a threshold or decision point. The Seoul commitments gave this feature public prominence by asking companies to specify thresholds at which risks would be considered intolerable and to define actions that would follow if such thresholds were reached.[13] Thresholds are not simply numbers. In many cases they combine technical evidence, threat modelling, and institutional judgement. OpenAI’s thresholds distinguish High and Critical capabilities in tracked severe-harm domains, with corresponding safeguard requirements and reporting processes.[14] Google DeepMind’s alert thresholds and Critical Capability Levels trigger proximity assessments, mitigation plans, residual-risk analysis, and safety cases when the capability level has been reached.[15] Anthropic’s current public update page records the revision of its novel chemical/biological weapons threshold and makes visible the fact that thresholds remain subject to policy learning and reinterpretation over time.[16]

Third, the framework attaches evidence to action. Evaluations acquire force inside reports, scorecards, safety cases, or residual-risk assessments, where their results can be carried toward review and decision. The resulting evidence becomes usable because it is made to answer a governance question: whether a model remains below a threshold, whether a safeguard sufficiently reduces risk, whether a deployment should be limited, whether a policy update is required, whether an external reviewer or government authority should be informed. A benchmark result has no public authority until a document defines what kind of decision it can support.

Fourth, the framework describes disclosure, even where disclosure remains partial. Anthropic foregrounds public Risk Reports, public external-review commentary, redaction principles, and a transparency hub.[17] OpenAI’s Preparedness Framework includes public disclosure of testing scope, tracked-category evaluations, deployment reasoning, and safeguards for models beyond High capability, while its Frontier Governance Framework introduces Safety and Security Model Reports under a more legal register.[18] Google DeepMind’s FSF includes public-facing model cards and FSF reports but reserves some disclosure for government authorities or other organisations in cases involving unmitigated material public-safety risk.[19] NIST gives disclosure a wider organisational form by treating provenance, incident processes, communication plans, and documentation of release trade-offs as part of responsible governance.[20]

Through these features, the risk framework becomes a genre of institutional memory. It records how future beliefs are supposed to be formed, tested, challenged, revised, authorised, and communicated, giving the organisation’s present commitments duration beyond a single statement of belief. This temporal dimension is crucial. The framework is a promise about later documentation. A system card written after such a framework is no longer a free-standing explanation. It becomes a public surface through which earlier procedures become claimable.

3. Anthropic: Risk Reports and the Public Case for Safety

Anthropic’s Responsible Scaling Policy is the clearest example of a governance framework that turns documentation itself into a central safety mechanism. Its public update page describes the RSP as proportional, iterative, and exportable, while preserving a public record of current and prior versions. The current page states that version 3.3 became effective on 26 May 2026 and records changes to the novel chemical/biological weapons threshold, off-cycle model-risk updates, and terminology. It also retains links to earlier versions, including version 3.1, whose text provides the most detailed account of the report and governance architecture now associated with the policy.[21]

The shift from earlier ASL-centred policy language to the current version matters for the article’s argument. Anthropic’s update page and policy materials indicate that the RSP has moved toward a regime in which affirmative safety arguments, public Risk Reports, and Frontier Safety Roadmaps play an increasingly prominent role. Earlier versions treated AI Safety Levels as a ladder connecting model capability to predefined safeguards. In the more recent language, ASL refers primarily to groups of technical and operational safeguards, while the organisation emphasises analysis and arguments that make a strong case for safety.[22] The documentary form changes accordingly. The framework’s centre of gravity moves from a labelled level toward the written case by which a model or risk domain is judged.

Version 3.1 describes Risk Reports as recurring documents covering all publicly deployed models and certain internal models, produced every three to six months. These reports are required to discuss threat models, capability and alignment evaluations, mitigations, overall and threat-specific risk assessments, and a risk-benefit determination. They also have a defined approval structure: internal subject-matter experts draft the reports, internal and external feedback is solicited, the CEO and Responsible Scaling Officer approve them, and Board or Long-Term Benefit Trust approval is required in cases where marginal-risk reasoning is central.[23] The public claim of safety is therefore prepared through a chain of named responsibilities before it reaches the rhetoric of launch.
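The approval chain can be restated as a small rule, which makes visible how the set of required sign-offs depends on the kind of reasoning a report relies on. The following Python sketch is a hypothetical illustration and forms no part of Anthropic's process: the class, the function, and the role labels are assumptions introduced here, and the sketch encodes only the approval stage as summarised above from version 3.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskReportDraft:
    """Hypothetical record of where a Risk Report stands before approval."""
    drafted_by_internal_experts: bool
    feedback_solicited: bool  # internal and external feedback gathered
    marginal_risk_reasoning_central: bool


def required_approvers(report: RiskReportDraft) -> list[str]:
    """Return the sign-offs a report needs under the structure summarised above.

    Illustrative only: role labels are strings chosen for this sketch,
    not identifiers taken from the RSP.
    """
    if not (report.drafted_by_internal_experts and report.feedback_solicited):
        raise ValueError("drafting and feedback must precede approval")
    approvers = ["CEO", "Responsible Scaling Officer"]
    if report.marginal_risk_reasoning_central:
        # The additional body is required only when marginal-risk reasoning is central.
        approvers.append("Board or Long-Term Benefit Trust")
    return approvers


print(required_approvers(RiskReportDraft(True, True, False)))
# ['CEO', 'Responsible Scaling Officer']
```

The exercise is modest: the condition that adds a further approver is itself a piece of documented reasoning, so the question of who must sign depends on what the report argues.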

Anthropic’s February 2026 Risk Report shows how this structure travels into public form. The report distinguishes itself from system cards by explaining that system cards accompany individual model releases and analyse some dimensions of model risk, while Risk Reports offer a cross-model account of mitigations and overall risk posture.[24] That distinction gives Anthropic’s documentary system a layered structure. A system card can explain the reasoning behind a release; a Risk Report can locate several releases within a longer view of capability, mitigation, and residual risk; a roadmap can declare the direction of future safety work; and the RSP page can record how the governing policy itself changes. The claim of responsibility is distributed across these documents.

The distribution matters because it changes how the reader can test the public claim. When Claude Sonnet 4.6’s system card states that it outlines the reasoning behind the model’s release under the RSP, the statement refers the reader beyond the release artefact to the governance framework that authorised the release reasoning.[25] The system card’s public language is then anchored in a prior policy order. It is still a public-facing document with all the compression such documents require, but its authority depends on a surrounding archive of reports, thresholds, review duties, and redaction rules.

Anthropic’s disclosure structure is unusually explicit. Its RSP materials describe public Risk Reports, external-review commentary, internal transparency, anonymous noncompliance reporting, and third-party review of procedural compliance.[26] The update page also notes that the Long-Term Benefit Trust gained a stronger role in requesting external review, approving reviewer selection, and receiving regular briefings in version 3.2.[27] These details are not ornamental. They specify the paths through which disagreement, redaction, internal dissent, and review are expected to move. Public accountability here does not arise only from the release of a single final document. It arises from the arrangement of routes by which a document may be challenged before and after publication.

Yet the RSP also exposes the difficulty that all frontier frameworks face. Anthropic acknowledges that confidently ruling out some capability thresholds has become increasingly difficult and can require assessments more subjective than the company would like.[28] This admission is analytically valuable because it prevents the policy from pretending that threshold governance can fully mechanise safety judgement. The firmer point lies elsewhere: subjectivity is drawn into a documented process. The framework requires reports, approvals, external commentary, and transparency mechanisms precisely because judgement remains unstable at the frontier. Anthropic’s most important contribution may therefore be the insistence that uncertainty itself must acquire a public administrative surface.

4. OpenAI: Thresholds, Safeguards, and Residual-Risk Acceptance

OpenAI’s Preparedness Framework is built around a different documentary pressure. It begins from the possibility of severe harm, defined in terms of damage at the scale of thousands of deaths or hundreds of billions of dollars, and identifies tracked capability categories whose emergence would require structured evaluation, safeguard assessment, and governance review.[29] Version 2 of the framework focuses on biological and chemical capability, cybersecurity, and AI self-improvement as tracked categories, while retaining research categories for risks such as long-range autonomy, sandbagging, autonomous replication and adaptation, undermining safeguards, and nuclear or radiological threats.[30] This architecture makes the framework more explicitly thresholded than Anthropic’s recent report-driven form.

The central movement in OpenAI’s framework runs from capability report to safeguards report. A capability report assesses whether a model has reached threshold-relevant capacities in a tracked category. A safeguards report maps severe-harm pathways to controls, identifies safeguard efficacy thresholds, records evidence about those safeguards, and assesses residual risk. The framework distinguishes between safeguards against malicious users and safeguards against a misaligned model, since misuse and model-driven harm require different forms of control. Models reaching High capability require safeguards that sufficiently minimise severe-harm risk before deployment; models reaching Critical capability require safeguards during development as well.[31]

This structure makes evaluation consequential by placing it beside a second evidentiary question. The framework is concerned with what the model can do and with whether the proposed safeguards change the risk enough for the intended deployment. Evaluation of capability and evaluation of mitigation are therefore held in relation. A model may cross a threshold, yet the deployment question remains open until the organisation has assessed whether controls, access restrictions, monitoring, security measures, or architectural safeguards sufficiently reduce the relevant risk. The public claim that a model was deployed with safeguards carries weight only if the reader can reconstruct this relation.

OpenAI’s governance structure combines advisory review with executive authority. The Safety Advisory Group oversees the framework and assesses residual risk net of safeguards for covered launches, but leadership retains final go/no-go authority and may accept residual risk. The Board’s Safety and Security Committee is kept informed so that it can exercise oversight.[32] This arrangement is important because it makes visible a point often hidden in public discourse around AI safety: the framework does not decide by itself. It produces a structured recommendation and a record of residual risk, while organisational leadership retains the authority to accept or reject that residual risk. The framework converts judgement into an accountable decision point, even as discretion remains.
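The relation between capability, safeguards, advice, and final authority can be sketched as a record in which the decision is an input rather than an output. The Python below is a hypothetical simplification and should not be read as OpenAI's tooling: the names `Capability`, `safeguard_stages`, and `LaunchRecord`, and the reduction of the Safety Advisory Group's assessment to a single yes/no view, are assumptions made for illustration. The stage rule follows the summary given earlier in this section, in which High capability requires safeguards before deployment and Critical capability requires them during development as well.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Capability(Enum):
    """Capability level found by a (hypothetical) capability report."""
    BELOW_HIGH = 0
    HIGH = 1
    CRITICAL = 2


def safeguard_stages(level: Capability) -> set[str]:
    """Stages at which safeguards must sufficiently minimise severe-harm risk."""
    if level is Capability.CRITICAL:
        return {"development", "deployment"}  # Critical: during development as well
    if level is Capability.HIGH:
        return {"deployment"}  # High: before deployment
    return set()


@dataclass
class LaunchRecord:
    """Hypothetical record linking the reports, the advice, and the final decision."""
    capability: Capability
    safeguards_sufficient: bool  # conclusion of the safeguards report
    sag_advises_go: bool  # simplified advisory view of residual risk net of safeguards
    leadership_go: Optional[bool] = None  # final go/no-go, supplied by people, not computed

    def record_decision(self, go: bool) -> None:
        self.leadership_go = go

    @property
    def departs_from_advice(self) -> bool:
        """True once leadership's recorded call differs from the advisory view."""
        return self.leadership_go is not None and self.leadership_go != self.sag_advises_go


record = LaunchRecord(Capability.HIGH, safeguards_sufficient=True, sag_advises_go=True)
record.record_decision(go=True)
print(safeguard_stages(record.capability), record.departs_from_advice)
# {'deployment'} False
```

Leaving `leadership_go` unset until a person records it mirrors the point made above: the framework structures the decision and makes any departure from advice traceable, but it does not compute the outcome.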

Public disclosure occupies a parallel position. The Preparedness Framework states that public disclosures should include the scope of testing, evaluations in tracked categories, deployment reasoning, and, for models beyond a High threshold, safeguards applied. It also allows redaction or summarisation to protect intellectual property or safety.[33] The public document therefore sits between exposure and containment. It has to make enough of the reasoning visible to support accountability, while withholding information that could enable harm or compromise security. The balance cannot be struck once and for all. It is one of the places where governance becomes editorial.

OpenAI’s Frontier Governance Framework intensifies this documentary transition by aligning internal frontier-safety practice with legal and systemic-risk obligations. The document situates OpenAI’s governance approach in relation to California’s Transparency in Frontier AI Act and to the EU AI Act and its associated Code of Practice, and it introduces Safety and Security Model Reports, update triggers, external-expert participation, and formal organisational responsibilities.[34] If the Preparedness Framework governs the movement from capability to safeguard to residual-risk acceptance, the Frontier Governance Framework gives that movement a more public legal-administrative form. It makes the release document part of a compliance and accountability ecology.

OpenAI’s system cards show the release-facing result of this structure. In public documentation around GPT-4o and subsequent Deployment Safety Hub materials, preparedness assessments appear alongside model capability, limitation, and safeguard language.[35] The system card compresses the internal file into a publicly legible account of capability level, risk category, mitigation, and deployment reasoning. The risk of this compression is obvious: the reader may see the outcome without seeing the full evidentiary route. The value of the framework lies in making the route nameable. One can ask which capability report, which safeguards report, which residual-risk assessment, and which governance approval sit behind the released claim.

5. NIST and the Standardising Grammar of Governance

NIST’s AI Risk Management Framework plays a decisive role in the documentary background of frontier AI governance, although its function differs from a frontier-lab deployment policy. It gives the field a public-sector grammar through which organisational risk work can be described, assessed, and communicated. AI RMF 1.0 defines a voluntary, rights-preserving, non-sector-specific, and use-case-agnostic structure organised through four functions: Govern, Map, Measure, and Manage.[36] These functions define how risk work should become an organisational process, leaving model-specific release gates to company frameworks or later regulatory instruments.

The difference is substantial. OpenAI, Anthropic, and Google DeepMind ask what to do with models whose capabilities may cross severe-risk thresholds. NIST asks how an organisation should establish risk-management roles, map context and impacts, measure trustworthiness and risks, manage treatment decisions, and maintain communication over time. It leaves risk tolerance to organisational and contextual determination, placing responsibility on organisations to define context-appropriate tolerances and document how risks will be prioritised, treated, monitored, and communicated.[37] That refusal to set a universal threshold reflects the framework’s function as a reference architecture rather than a release gate.

The Generative AI Profile brings NIST closer to the documentary concerns of frontier-model release. It highlights four primary considerations, Governance, Content Provenance, Pre-deployment Testing, and Incident Disclosure, and its suggested actions include documenting trade-offs in release decisions, retaining testing and evaluation records, tracking provenance, and maintaining incident response and disclosure processes.[38] These recommendations make the public surface of AI safety broader than the question of whether a model reached a dangerous capability threshold. They include the record of how outputs are labelled, how test histories are preserved, how incidents are handled, and how release decisions are justified to relevant audiences.

NIST’s importance therefore lies in the way it turns accountability into documentation without narrowing accountability to model release. Governance is not a single committee or a final approval memo. It appears as the durable arrangement of roles, policies, feedback loops, metrics, communication practices, and documented risk treatments. When a frontier lab cites NIST or adopts NIST-like language, it draws its own internal practice into a wider standardising horizon. The claim can then be measured against a vocabulary already circulating across public administration, industry compliance, and risk-management practice.

OpenAI’s Frontier Governance Framework explicitly states that its approach is informed by the NIST AI RMF, which makes the relation between public standard and company framework concrete.[39] The borrowing is not merely terminological. NIST’s emphasis on governance, measurement, management, documentation, and communication allows OpenAI’s model reports and systemic-risk procedures to enter a broader field of recognised risk practice. Similar pressure operates on other frameworks even when the citation is less direct, because the concepts of risk treatment, tolerance, provenance, incident disclosure, and organisational accountability now travel widely across the AI governance field.

NIST also clarifies an important distinction for public readers. A frontier safety framework may require a particular safeguard when a model reaches a specified capability. A risk management framework may require an organisation to define who owns risk, how decisions are documented, how trade-offs are evaluated, and how incidents will be communicated. The two forms overlap, yet they answer different questions. Confusing them produces misdirected public criticism. A reader who expects NIST to function like OpenAI’s Preparedness Framework will find it under-specified. A reader who expects OpenAI’s Preparedness Framework to provide the broad organisational grammar of NIST will miss the narrower severe-harm logic that gives OpenAI’s policy its force.

For the purposes of AI safety documentation, NIST stabilises the background against which public claims can be read. It gives language to the processes that sit around evaluation: governance, mapping, measurement, risk treatment, communication plans, provenance, retention, and disclosure. These terms are not decorative. They are the means by which a claim about a frontier model becomes part of an auditable organisational memory. A public system card without such memory may still be informative; a system card attached to a documented risk-management process is easier to interrogate, compare, and hold open to later correction.

6. Google DeepMind: Residual Risk and Safety-Case Supplementation

Google DeepMind’s Frontier Safety Framework is the most formalised of the company frameworks in risk-management vocabulary. Version 3.1 defines Tracked Capability Levels and Critical Capability Levels, applies them across misuse risk, machine-learning research-and-development risk, and misalignment risk, and organises assessment through a sequence that moves from risk identification to inherent-risk assessment, mitigation, residual-risk assessment, and risk-acceptance determination.[40] The framework therefore begins from a classical risk-management distinction: the system’s inherent risk before mitigation differs from the risk left after safeguards and controls have been applied.

This distinction matters because it makes deployment depend on residual risk rather than raw capability alone. A model’s capability may trigger warning, assessment, or response planning, but the deployment question is framed through the risk that remains after security mitigations, deployment mitigations, and other controls have been applied. When a model reaches an alert threshold for a Critical Capability Level, Google assesses proximity and risk, develops a response plan, and involves internal or external experts where needed. When the model has reached a CCL, residual-risk assessment is supplemented with a safety case; external deployment and high-risk internal deployment require the appropriate governance function to determine that residual risk is acceptable.[41]
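The trigger structure in this paragraph can be sketched as a function that returns the documentary steps a given state of evidence sets in motion. This is an illustrative Python sketch under stated assumptions and does not reproduce Google DeepMind's procedure: the names are hypothetical, the step labels paraphrase the summary above, and the sketch assumes that an alert threshold sits below its Critical Capability Level, so that reaching the CCL also triggers the alert-threshold steps.

```python
from enum import Enum


class Deployment(Enum):
    """Hypothetical deployment categories used only for this sketch."""
    EXTERNAL = "external"
    HIGH_RISK_INTERNAL = "high-risk internal"
    OTHER_INTERNAL = "other internal"


def required_steps(alert_reached: bool, ccl_reached: bool, deployment: Deployment) -> list[str]:
    """List, in order, the steps triggered under the sequence summarised above.

    Assumes an alert threshold sits below its CCL, so reaching the CCL
    also triggers the alert-threshold steps.
    """
    steps: list[str] = []
    if alert_reached or ccl_reached:
        # Expert involvement "where needed" is a judgement and is not modelled here.
        steps += ["assess proximity and risk", "develop response plan"]
    if ccl_reached:
        steps.append("residual-risk assessment supplemented by a safety case")
        if deployment in (Deployment.EXTERNAL, Deployment.HIGH_RISK_INTERNAL):
            # A placeholder for a determination the code cannot make.
            steps.append("governance function determines whether residual risk is acceptable")
    return steps


for step in required_steps(alert_reached=True, ccl_reached=True, deployment=Deployment.EXTERNAL):
    print(step)
```

The final step is deliberately a label rather than a computation: the sketch can say when an acceptability determination is required, but not what it should conclude.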

The term ‘acceptable’ carries the weight of the framework. It cannot be reduced to a technical test result. It names a judgement made after evaluation, mitigation, and review. Google DeepMind’s framework narrows the space of judgement by specifying stages, triggers, evidence types, and governance functions, yet it leaves the final acceptability criterion partly embedded in organisational decision-making. That wording registers a structural condition: at the frontier, where threat models remain uncertain and capabilities may be elicited imperfectly, some judgement will remain institutional. The framework makes that judgement locatable.

Safety cases give this locatability a more formal argumentative structure. The UK AI Security Institute has described safety cases as structured arguments that a system is safe in a specified context, written by a proponent, challenged by red-team review, and used by a decision-maker as part of a larger deployment judgement.[42] Google DeepMind’s use of safety cases at CCLs therefore imports a mode of evidentiary argument into frontier AI governance. The safety case functions as a contestable argument whose authority depends on the relation among claim, evidence, context, and review before the decision proceeds.
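The proponent, red-team, and decision-maker roles can be pictured as operations on a single argumentative record. The Python skeleton below is a hypothetical illustration and is not an AISI template: the class and method names are assumptions, the example strings are invented placeholders, and treating an answered challenge as one closed by added evidence is a deliberate simplification of how safety-case review proceeds.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyCase:
    """Hypothetical skeleton: a proponent's claim about a system in a stated context."""
    claim: str
    context: str
    proponent: str
    evidence: list[str] = field(default_factory=list)
    open_challenges: list[str] = field(default_factory=list)  # raised in red-team review

    def challenge(self, objection: str) -> None:
        self.open_challenges.append(objection)

    def respond(self, objection: str, new_evidence: str) -> None:
        # Simplification: answering an objection means closing it with added evidence.
        self.open_challenges.remove(objection)  # raises ValueError if never raised
        self.evidence.append(new_evidence)

    def ready_for_decision(self) -> bool:
        # Readiness only means nothing is left unanswered; the decision-maker still decides.
        return bool(self.evidence) and not self.open_challenges


case = SafetyCase("residual risk is acceptably low", "limited-access deployment",
                  "model team", ["evaluation summary"])
case.challenge("elicitation may be incomplete")
print(case.ready_for_decision())  # False
case.respond("elicitation may be incomplete", "additional elicitation study")
print(case.ready_for_decision())  # True
```

Even in this reduced form, the record separates the argument from the judgement: the case can be made complete, but accepting it remains a separate act by the decision-maker.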

The public side of Google’s framework is more selective than Anthropic’s. The FSF states that Google aims to share relevant model information, evaluation results, and mitigation plans with appropriate government authorities, and where appropriate other organisations, when a model reaches a CCL that poses an unmitigated and material public-safety risk.[43] This produces a form of accountability partly oriented toward state and institutional recipients, with the general public receiving a more compressed version through model cards and reports. Google’s public model cards and FSF reports provide release-facing summaries, while more sensitive information may remain inside confidential disclosure channels.

The Gemini documentation illustrates this layered disclosure. The Gemini 3 Pro Frontier Safety Framework Report documents evaluations and mitigations under the FSF, while the Gemini 3.1 Pro Model Card explains that the model was assessed following FSF protocols and points readers toward the FSF report for details.[44] The model card’s language is compressed and public-facing; the FSF report offers a more specialised account of the frontier-safety assessment; the framework itself defines the governance procedure. This triadic arrangement is one of the clearest examples of how risk governance migrates into public claim through several registers of documentation.

Google DeepMind’s FSF is therefore valuable for this article because it foregrounds the distinction between evaluation and acceptance. Evaluations provide early warnings, capability estimates, and evidence for risk assessment. Mitigations attempt to reduce the risk. Residual-risk analysis assesses what remains. A safety case supplies a structured argument where the stakes are highest. A governance function accepts, rejects, or conditions the result. Public documentation then compresses these movements into a claim that readers may encounter in a model card or report. The framework’s seriousness lies in that ordered movement. Its vulnerability lies in the fact that the acceptability of residual risk remains only partially visible from outside.

7. From Evaluation Evidence to Deployment Decision

Across these frameworks, evaluation evidence acquires consequence only when placed inside a documentary form that can carry it into decision. A capability evaluation may establish that a model performs at a certain level on a benchmark, completes a task, assists a user in a sensitive domain, or exhibits behaviours relevant to autonomy, cyber operations, biological reasoning, or model self-improvement. The result remains incomplete as governance evidence until it is connected to a threat model, a threshold, a safeguard, and a decision procedure. The same score can carry different institutional force depending on the framework into which it is placed.

OpenAI makes this relation explicit by distinguishing capability reports from safeguards reports. The former assess whether a model has reached a tracked severe-harm threshold; the latter assess whether proposed safeguards sufficiently reduce the risk associated with deployment or development. The Safety Advisory Group then assesses residual risk net of safeguards before leadership makes the final go/no-go decision.[45] The structure is important because it prevents capability evidence from speaking in isolation. A model’s dangerous capability may be unacceptable in one access context and manageable in another if the safeguards, monitoring, or restrictions differ. The framework’s task is to record the reasoning that links the capability to the deployment conditions.

Anthropic’s Risk Reports produce a similar transformation through a different form. Instead of making a single launch document bear the full weight of safety judgement, Anthropic requires recurring reports that gather threat models, evaluation evidence, mitigations, risk assessments, and risk-benefit determinations across deployed and internal models.[46] This temporal structure matters. A deployment decision remains attached to the organisation’s subsequent risk posture. The organisation must return to its models, assess risk under changed conditions, and make the assessment public in periodic form. Evaluation therefore becomes part of a repeated documentary cycle.

Google DeepMind’s framework places evaluation inside the movement from inherent risk to residual risk. Early-warning evaluations and alert thresholds identify proximity to capability levels; risk assessment evaluates severity and likelihood; mitigations alter the risk; residual-risk assessment and, at CCLs, safety cases support the deployment decision.[47] The sequence has an important consequence. The model’s evaluation record becomes meaningful as a public safety claim only after the organisation specifies how that evaluation interacted with mitigation and how the remaining risk was accepted.

NIST widens the evidentiary field. Its Generative AI Profile treats pre-deployment testing, provenance, retention of testing and evaluation histories, incident disclosure, and documentation of release trade-offs as part of responsible AI risk management.[48] Evidence then includes more than model behaviour under benchmark conditions. It includes records of how the model was tested, how decisions were made, how incidents are handled, how content provenance is maintained, and how risk treatment is communicated. This broader view is valuable because public accountability often fails at the edges of the benchmark, where provenance, process, and incident response determine whether a claim remains durable.

The movement from evaluation to decision therefore involves at least four transformations. Capability evidence is first translated into risk relevance through a threat model. Risk relevance is then measured against a threshold or tolerance. The threshold result is placed in relation to safeguards and mitigations. The mitigated case is finally routed through a governance body that accepts, rejects, conditions, or escalates the decision. Each transformation can introduce slippage. The threat model may be incomplete; capability elicitation may understate real-world capability; safeguards may work only under test conditions; residual risk may be accepted without enough external visibility.

The frameworks acknowledge some of these limits. OpenAI’s Frontier Governance Framework treats one-time elicitation as a lower bound on real-world capability.[49] Google DeepMind supplements evaluations with model-independent information and post-market monitoring.[50] Anthropic’s public update page records that confidently ruling out some thresholds has become increasingly difficult and can involve assessments more subjective than the company would prefer.[51] These acknowledgements are not incidental. They mark the frontier condition under which governance documentation now operates. The decision must be made before epistemic closure is available.

This is why safety cases have become conceptually important. AISI’s safety-case work places the argument itself under scrutiny: a writer produces a structured case, a red team critiques it, and a decision-maker relies on it while remaining responsible for the decision.[52] That model does not dissolve uncertainty. It places uncertainty inside an argument that can be challenged. In AI safety documentation, this shift is crucial. A good public artefact should not make uncertainty disappear through confident prose; it should show how uncertainty affected the judgement, what evidence was available, what safeguards were relied upon, and what residual risk was left to institutional acceptance.

8. From Deployment Decision to Public Claim

A deployment decision becomes a public claim when the organisation gives the decision documentary form. That form may be a system card, a model card, a safety report, a risk report, a model report, a transparency update, or a government-facing disclosure. Each genre carries a different promise. A system card explains a particular release. A model card often presents capabilities, limitations, mitigations, and intended use. A risk report can extend beyond a single release and describe an organisation’s risk posture across models. A model report, in the regulatory sense emerging in OpenAI’s Frontier Governance Framework, can connect safety and security information to legal or systemic-risk obligations.[53]

Anthropic’s documentation makes this relation especially explicit. Its RSP explains that Risk Reports share significant content with system cards but add structure and process for overall risk assessment.[54] The February 2026 Risk Report states that system cards are published for each model release and analyse some risk dimensions, while Risk Reports provide cross-model assessment of mitigations and overall risk.[55] Claude Sonnet 4.6’s system card then describes the reasoning behind release under the RSP.[56] These three documents form a chain. The framework defines obligations; the system card explains release reasoning; the risk report re-anchors the release within a broader view of risk.

OpenAI’s public documentation follows a related route. The Preparedness Framework defines tracked categories, thresholds, safeguards, reporting, SAG review, and public disclosure. System cards and deployment safety materials then communicate the result of preparedness assessment in release-facing terms.[57] When a system card or deployment report states that a model reached a particular capability level or that safeguards were applied, the public claim compresses the internal sequence of capability testing, safeguard assessment, residual-risk judgement, and leadership decision. The compression is necessary for public readability. It is also the point at which overclaiming becomes possible.

Google DeepMind’s release documents show a different compression. The FSF defines the governance process; model-specific FSF reports describe how the process was applied; public model cards summarise the outcome for users and developers. The Gemini 3.1 Pro Model Card’s reference to FSF protocols gives the public reader a doorway into the larger governance archive, while the Gemini 3 Pro FSF Report carries more detailed evaluation and mitigation reasoning.[58] The model card functions as the public entrance to a larger document family.

NIST contributes to this public movement through a more architectural route. It makes clear that good public documentation must carry more than release confidence. Provenance, incident response, retention of test and evaluation histories, risk communication, and documented release trade-offs all become part of the accountability record.[59] A model-release document that reports only a final safety judgement may appear clear, yet its clarity can become brittle if the reader cannot understand what evidence was retained, how incidents will be disclosed, or how future monitoring will revise the claim.

The public claim is therefore a translation of the internal decision. Details move through different routes: some are selected for public release, some are summarised or redacted, and some are routed to government authorities or external reviewers rather than to the general public. Anthropic explicitly allows redactions while asking external reviewers to assess whether redactions are material to disagreement.[60] OpenAI allows public disclosures to be redacted or summarised for intellectual-property or safety reasons.[61] Google DeepMind allows certain public-safety-relevant information to be shared with government authorities or other organisations under appropriate constraints.[62] These disclosure rules are part of the governance system. They determine the public shape of accountability.

This selective translation creates a recurring danger. Safety language can become smoother than the evidence it represents. A release document may say that a model was evaluated, that safeguards were applied, that residual risk was judged acceptable, or that a model remained below a threshold. Each phrase is defensible only if it keeps contact with the process behind it. What was tested? Which threat model structured the test? What was the threshold? Which safeguards were considered? What did mitigation change? Who accepted the residual risk? What remains uncertain? A system card that permits these questions to be asked is doing accountability work. A system card that forecloses them becomes merely a polished claim.

The public form of AI safety is therefore editorial in the strongest sense. It is the place where institutional judgement becomes language. The writer of such a document has to preserve relations among evidence, uncertainty, mitigation, governance, and public responsibility under pressure from readability, speed, legal review, security concerns, and organisational interest. For that reason, AI safety documentation deserves analysis as a serious governance practice embedded in model development.

9. Discretion, Ambiguity, and the Politics of Acceptable Risk

The reviewed frameworks make frontier-AI governance more legible while leaving deployment short of mechanical determination. Their common accomplishment is procedural. They define risk domains, thresholds, reporting routines, review roles, safeguard expectations, and disclosure practices. Their common difficulty is that residual-risk acceptance remains a judgement made under uncertainty. The phrase ‘acceptable risk’ marks the point where technical evaluation, organisational responsibility, and public trust converge without fully stabilising.

Google DeepMind uses acceptable residual risk as a condition for external deployment or high-risk internal deployment when a Critical Capability Level has been reached.[63] OpenAI places residual-risk assessment before the Safety Advisory Group and then gives leadership final authority to accept residual risk.[64] Anthropic requires Risk Reports with threat-specific and overall risk assessments, as well as risk-benefit determinations approved through named governance channels.[65] NIST’s AI RMF makes the contextual nature of risk tolerance explicit by declining to prescribe a universal tolerance and requiring organisations to determine their own levels in context.[66] In every case, the framework narrows judgement without replacing it.

This residual discretion is not simply a defect. Some discretion is unavoidable because frontier risks are partially hypothetical, capability elicitation is imperfect, threat models evolve, and deployment conditions differ. The more important question is whether discretion is documented. A framework can make discretion more accountable by specifying who exercises it, what evidence they must consider, what safeguards they must assess, which external parties may challenge the reasoning, and how the resulting decision becomes public. The absence of mechanical determination then becomes a reason for stronger documentary practice, not a reason to abandon framework governance.

Ambiguity also appears in the boundary between internal and public records. Anthropic’s public posture is comparatively expansive, yet redaction remains part of its system. OpenAI’s framework promises public disclosure of deployment reasoning and safeguards while allowing summary or redaction. Google DeepMind’s framework gives some disclosure obligations a state-facing form. NIST treats incident disclosure, provenance, and communication plans as organisational processes whose public visibility will vary across context.[67] The public reader encounters the visible surface of a larger file. Accountability depends on whether the hidden portions are governed by credible review, record-keeping, and challenge mechanisms.

A further ambiguity lies in the relation between policy commitment and operational safeguard. A framework may state that a model crossing a threshold will require stronger safeguards, limited deployment, or non-deployment until risk is reduced. The strength of that commitment depends on how thresholds are interpreted, how safeguards are tested, and how exceptions or updates are handled. Anthropic’s version history demonstrates that thresholds and policy language can change as the organisation learns.[68] OpenAI’s distinction between tracked categories and research categories shows that some domains are considered operationally mature while others remain under development.[69] Google DeepMind’s annual review and update commitments acknowledge that the framework itself must respond to new evidence.[70]

This evolution is valuable, but it also complicates public accountability. A living framework can improve; it can also move the boundary of responsibility. The public document therefore has to preserve version history, state changes clearly, and explain why a revised threshold or safeguard better captures the risk. Anthropic’s update page is useful precisely because it records changes in policy rather than treating each version as self-evident. A future governance practice in which frameworks update without public change logs would weaken the memory on which accountability depends.

External review helps, yet it cannot carry the whole burden. Anthropic’s external-review commentary, OpenAI’s third-party model evaluation and safeguard stress-testing provisions, and Google DeepMind’s involvement of external experts where needed all create channels of contestability.[71] Still, the independence, scope, timing, and public visibility of such review vary. External review can challenge a safety case; it can also become a narrow procedural check if it lacks access, authority, or publication rights. The public document should therefore specify what kind of review occurred and how any disagreement was handled. A vague statement that experts were consulted gives less accountability than a note explaining the scope of review, the evidence reviewed, the redactions applied, and the relation between reviewer disagreement and final decision.

The politics of acceptable risk finally returns to language. Public documentation can make residual risk sound small by placing it behind polished phrases such as ‘sufficiently mitigated’, ‘below threshold’, ‘appropriate safeguards’, or ‘acceptable residual risk’. These phrases may be accurate. They become weak when they detach from the documentary route that produced them. A stronger public claim would specify the category of risk, the evaluation performed, the safeguard relation, the residual uncertainty, the governance actor who accepted the risk, and the disclosure limits that shape what the reader can see. The aim is a public form that can protect sensitive details without asking trust to do the work of evidence.

10. Conclusion: Organised Responsibility and the Public Form of AI Safety

Frontier-AI governance frameworks have become central to the public life of AI safety because they organise the movement from risk to claim. They give public form to the classification of danger, the evaluation of models, the interpretation of thresholds, the attachment of safeguards, the assessment of residual risk, the approval of deployment, and the selective disclosure of reasoning. Their importance lies in the way they give institutional form to judgement before a system card, model card, or risk report presents that judgement to readers.

Anthropic, OpenAI, Google DeepMind, and NIST each arrange this movement differently. Anthropic gives the strongest public documentary architecture through Risk Reports, Roadmaps, version histories, external-review commentary, and explicit approval structures. OpenAI gives a sharper threshold-and-safeguard mechanism, with Capabilities Reports, Safeguards Reports, SAG review, leadership acceptance of residual risk, and a newer Frontier Governance Framework that aligns those practices with emerging legal obligations. Google DeepMind gives a formal risk-management sequence in which early-warning evaluations, residual-risk assessments, safety cases, and governance functions carry the weight of deployment review. NIST supplies the wider grammar through which governance, mapping, measurement, management, provenance, testing, risk treatment, and disclosure become organisational obligations.

The frameworks converge around a basic proposition: evaluations require documentation before they can become accountable decisions. A benchmark or red-team finding cannot speak for itself in public safety language. It has to be placed in relation to a threat model, a threshold, a safeguard, a residual-risk assessment, and a decision-maker. Once that relation is written down, the public claim begins to acquire form. It can be checked, challenged, compared, revised, and remembered. Where the relation is missing, the claim may remain fluent while its evidentiary base becomes inaccessible.

The unresolved difficulty is discretion. The frameworks make judgement more legible while leaving judgement in place. Organisations decide which risks to prioritise, which evidence to collect, which thresholds to revise, which safeguards count as sufficient, which residual risks to accept, and which parts of the reasoning to publish. This discretion can be abused, but it can also be governed. The most serious frontier-AI documentation will therefore be judged by how it handles discretion: whether it hides judgement behind technical language, or whether it gives judgement a traceable form through roles, reports, review, version history, and disclosure.

A public safety document should allow the reader to see how the claim has been made durable. It should show how evaluation evidence was translated into risk reasoning, how mitigation altered the case, how residual uncertainty survived the process, and how institutional authority was exercised. Such documentation will never remove the need for trust. It can, however, change the terms of trust. It can make trust answerable to evidence, procedure, and public memory. At the frontier of AI deployment, that may be one of the most practical forms accountability can take.

Notes

[1] HM Government, ‘Frontier AI Safety Commitments, AI Seoul Summit 2024’ (GOV.UK, 21 May 2024, updated 7 February 2025), https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024.

[2] Anthropic, ‘Anthropic’s Responsible Scaling Policy’ (last updated 26 May 2026), https://www.anthropic.com/responsible-scaling-policy; Anthropic, Responsible Scaling Policy, version 3.1 (2 April 2026), pp. 2, 10-14, https://www-cdn.anthropic.com/files/4zrzovbb/website/bf04581e4f329735fd90634f6a1962c13c0bd351.pdf.

[3] OpenAI, Preparedness Framework, version 2 (15 April 2025), pp. 1, 4-15, https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf; OpenAI, OpenAI Frontier Governance Framework (May 2026), pp. 1-2, 16-18, https://cdn.openai.com/pdf/e37d949b-8c9f-4d76-b99e-4272f4631a7e/openai-frontier-governance-framework.pdf.

[4] Google DeepMind, Frontier Safety Framework 3.1 (17 April 2026), pp. 1, 5-8, 16-17, https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3-1.pdf.

[5] National Institute of Standards and Technology, Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1 (January 2023), pp. 20-32, https://doi.org/10.6028/NIST.AI.100-1, https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf; National Institute of Standards and Technology, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1 (July 2024), pp. 1-2, 12-46, https://doi.org/10.6028/NIST.AI.600-1, https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf.

[6] METR, ‘Responsible Scaling Policies’ (26 September 2023), https://metr.org/blog/2023-09-26-rsp/.

[7] UK Government, Emerging Processes for Frontier AI Safety (Department for Science, Innovation and Technology, 2023), pp. 12-15, https://assets.publishing.service.gov.uk/media/653aabbd80884d000df71bdc/emerging-processes-frontier-ai-safety.pdf.

[8] Marie Davidsen Buhl, Ben Bucknall, and Tammy Masterson, ‘Emerging Practices in Frontier AI Safety Frameworks’, arXiv:2503.04746 (2025), pp. 1-5, https://doi.org/10.48550/arXiv.2503.04746, https://arxiv.org/abs/2503.04746.

[9] OpenAI, Preparedness Framework, v2, pp. 4-8.

[10] Google DeepMind, Frontier Safety Framework 3.1, pp. 1, 11-15.

[11] Anthropic, Responsible Scaling Policy, v3.1, pp. 2-8; Anthropic, ‘Responsible Scaling Policy’.

[12] NIST, AI RMF 1.0, pp. 20-32.

[13] HM Government, ‘Frontier AI Safety Commitments’.

[14] OpenAI, Preparedness Framework, v2, pp. 4-6, 10-12.

[15] Google DeepMind, Frontier Safety Framework 3.1, pp. 5-8.

[16] Anthropic, ‘Responsible Scaling Policy’.

[17] Anthropic, Responsible Scaling Policy, v3.1, pp. 10-14; Anthropic, ‘Responsible Scaling Policy’.

[18] OpenAI, Preparedness Framework, v2, pp. 12-13; OpenAI, Frontier Governance Framework, pp. 16-18.

[19] Google DeepMind, Frontier Safety Framework 3.1, pp. 16-17.

[20] NIST, Generative AI Profile, pp. 1-2, 12-46.

[21] Anthropic, ‘Responsible Scaling Policy’; Anthropic, Responsible Scaling Policy, v3.1, pp. 2, 10-14.

[22] Anthropic, Responsible Scaling Policy, v3.1, pp. 3-4; Anthropic, ‘Responsible Scaling Policy’.

[23] Anthropic, Responsible Scaling Policy, v3.1, pp. 10-12.

[24] Anthropic, Risk Report: February 2026 (February 2026), https://www-cdn.anthropic.com/08eca2757081e850ed2ad490e5253e940240ca4f.pdf.

[25] Anthropic, Claude Sonnet 4.6 System Card (17 February 2026), https://www.anthropic.com/claude-sonnet-4-6-system-card.

[26] Anthropic, Responsible Scaling Policy, v3.1, pp. 12-14.

[27] Anthropic, ‘Responsible Scaling Policy’.

[28] Anthropic, ‘Responsible Scaling Policy’.

[29] OpenAI, Preparedness Framework, v2, p. 1.

[30] OpenAI, Preparedness Framework, v2, pp. 4-8.

[31] OpenAI, Preparedness Framework, v2, pp. 10-12, 16-20.

[32] OpenAI, Preparedness Framework, v2, pp. 12, 15.

[33] OpenAI, Preparedness Framework, v2, pp. 12-13.

[34] OpenAI, Frontier Governance Framework, pp. 1-2, 16-18.

[35] OpenAI, GPT-4o System Card (2024), https://openai.com/index/gpt-4o-system-card/; OpenAI, GPT-5.5 System Card (OpenAI Deployment Safety, 2026), https://deploymentsafety.openai.com/gpt-5-5/gpt-5-5.pdf.

[36] NIST, AI RMF 1.0, pp. 20-32.

[37] NIST, AI RMF 1.0, pp. 7-8, 20-32.

[38] NIST, Generative AI Profile, pp. 1-2, 12-46.

[39] OpenAI, Frontier Governance Framework, pp. 1-2.

[40] Google DeepMind, Frontier Safety Framework 3.1, pp. 1, 4-8.

[41] Google DeepMind, Frontier Safety Framework 3.1, pp. 6-8.

[42] Geoffrey Irving, ‘Safety Cases at AISI’ (AI Security Institute, 23 August 2024), https://www.aisi.gov.uk/blog/safety-cases-at-aisi; Benjamin Hilton, Marie Davidsen Buhl, Tomek Korbak, and Geoffrey Irving, ‘Safety Cases: A Scalable Approach to Frontier AI Safety’, arXiv:2503.04744 (2025), pp. 1-4, https://doi.org/10.48550/arXiv.2503.04744, https://arxiv.org/abs/2503.04744.

[43] Google DeepMind, Frontier Safety Framework 3.1, pp. 16-17.

[44] Google DeepMind, Gemini 3 Pro Frontier Safety Framework Report (2025), https://storage.googleapis.com/deepmind-media/gemini/gemini_3_pro_fsf_report.pdf; Google DeepMind, Gemini 3.1 Pro Model Card (2026), https://deepmind.google/models/model-cards/gemini-3-1-pro/.

[45] OpenAI, Preparedness Framework, v2, pp. 10-12, 15.

[46] Anthropic, Responsible Scaling Policy, v3.1, pp. 10-12.

[47] Google DeepMind, Frontier Safety Framework 3.1, pp. 5-8.

[48] NIST, Generative AI Profile, pp. 12-46.

[49] OpenAI, Preparedness Framework, v2, pp. 8-9; OpenAI, Frontier Governance Framework, pp. 5-6.

[50] Google DeepMind, Frontier Safety Framework 3.1, pp. 6-8, 17.

[51] Anthropic, ‘Responsible Scaling Policy’.

[52] Irving, ‘Safety Cases at AISI’; Hilton and others, ‘Safety Cases’, pp. 1-4.

[53] OpenAI, Frontier Governance Framework, pp. 16-18; Anthropic, Risk Report: February 2026; Google DeepMind, Gemini 3.1 Pro Model Card.

[54] Anthropic, Responsible Scaling Policy, v3.1, pp. 10-11.

[55] Anthropic, Risk Report: February 2026.

[56] Anthropic, Claude Sonnet 4.6 System Card.

[57] OpenAI, Preparedness Framework, v2, pp. 12-13; OpenAI, GPT-4o System Card; OpenAI, GPT-5.5 System Card.

[58] Google DeepMind, Gemini 3 Pro Frontier Safety Framework Report; Google DeepMind, Gemini 3.1 Pro Model Card.

[59] NIST, Generative AI Profile, pp. 12-46.

[60] Anthropic, Responsible Scaling Policy, v3.1, pp. 13-14.

[61] OpenAI, Preparedness Framework, v2, pp. 12-13.

[62] Google DeepMind, Frontier Safety Framework 3.1, pp. 16-17.

[63] Google DeepMind, Frontier Safety Framework 3.1, pp. 6-8.

[64] OpenAI, Preparedness Framework, v2, p. 15.

[65] Anthropic, Responsible Scaling Policy, v3.1, pp. 10-12.

[66] NIST, AI RMF 1.0, pp. 7-8, 20-32.

[67] Anthropic, Responsible Scaling Policy, v3.1, pp. 13-14; OpenAI, Preparedness Framework, v2, pp. 12-13; Google DeepMind, Frontier Safety Framework 3.1, pp. 16-17; NIST, Generative AI Profile, pp. 12-46.

[68] Anthropic, ‘Responsible Scaling Policy’.

[69] OpenAI, Preparedness Framework, v2, pp. 6-8.

[70] Google DeepMind, Frontier Safety Framework 3.1, pp. 17-18.

[71] Anthropic, Responsible Scaling Policy, v3.1, pp. 13-14; OpenAI, Frontier Governance Framework, pp. 17-18; Google DeepMind, Frontier Safety Framework 3.1, pp. 16-17.

Bibliography

Anthropic, ‘Anthropic’s Responsible Scaling Policy’, Anthropic, last updated 26 May 2026, https://www.anthropic.com/responsible-scaling-policy.

Anthropic, Claude Sonnet 4.6 System Card, Anthropic, 17 February 2026, https://www.anthropic.com/claude-sonnet-4-6-system-card.

Anthropic, Responsible Scaling Policy, version 3.1, Anthropic, 2 April 2026, https://www-cdn.anthropic.com/files/4zrzovbb/website/bf04581e4f329735fd90634f6a1962c13c0bd351.pdf.

Anthropic, Risk Report: February 2026, Anthropic, February 2026, https://www-cdn.anthropic.com/08eca2757081e850ed2ad490e5253e940240ca4f.pdf.

Buhl, Marie Davidsen, Ben Bucknall, and Tammy Masterson, ‘Emerging Practices in Frontier AI Safety Frameworks’, arXiv:2503.04746 (2025), https://doi.org/10.48550/arXiv.2503.04746, https://arxiv.org/abs/2503.04746.

Clymer, Joshua, Jonah Weinbaum, Robert Kirk, Kimberly Mai, Selena Zhang, and Xander Davies, An Example Safety Case for Safeguards against Misuse, AI Security Institute, 5 June 2025, https://www.aisi.gov.uk/research/an-example-safety-case-for-safeguards-against-misuse.

Coggins, Sam, Alex Saeri, Katherine A. Daniell, Lorenn P. Ruster, Jessie Liu, and Jenny L. Davis, ‘The 2025 OpenAI Preparedness Framework Does Not Guarantee Any AI Risk Mitigation Practices: A Proof-of-Concept for Affordance Analyses of AI Safety Policies’, arXiv:2509.24394 (2025), https://arxiv.org/abs/2509.24394.

Google DeepMind, Frontier Safety Framework 3.1, Google DeepMind, 17 April 2026, https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/strengthening-our-frontier-safety-framework/frontier-safety-framework_3-1.pdf.

Google DeepMind, Gemini 3 Pro Frontier Safety Framework Report, Google DeepMind, 2025, https://storage.googleapis.com/deepmind-media/gemini/gemini_3_pro_fsf_report.pdf.

Google DeepMind, Gemini 3 Pro Model Card, Google DeepMind, 2025, https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf.

Google DeepMind, Gemini 3.1 Pro Model Card, Google DeepMind, 2026, https://deepmind.google/models/model-cards/gemini-3-1-pro/.

Hilton, Benjamin, Marie Davidsen Buhl, Tomek Korbak, and Geoffrey Irving, ‘Safety Cases: A Scalable Approach to Frontier AI Safety’, arXiv:2503.04744 (2025), https://doi.org/10.48550/arXiv.2503.04744, https://arxiv.org/abs/2503.04744.

HM Government, ‘Frontier AI Safety Commitments, AI Seoul Summit 2024’, GOV.UK, 21 May 2024, updated 7 February 2025, https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024.

Irving, Geoffrey, ‘Safety Cases at AISI’, AI Security Institute, 23 August 2024, https://www.aisi.gov.uk/blog/safety-cases-at-aisi.

METR, ‘Responsible Scaling Policies’, METR Blog, 26 September 2023, https://metr.org/blog/2023-09-26-rsp/.

National Institute of Standards and Technology, Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1 (Gaithersburg, MD: National Institute of Standards and Technology, January 2023), https://doi.org/10.6028/NIST.AI.100-1, https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf.

National Institute of Standards and Technology, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1 (Gaithersburg, MD: National Institute of Standards and Technology, July 2024), https://doi.org/10.6028/NIST.AI.600-1, https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf.

OpenAI, GPT-4o System Card, OpenAI, 2024, https://openai.com/index/gpt-4o-system-card/.

OpenAI, GPT-5.5 System Card, OpenAI Deployment Safety, 2026, https://deploymentsafety.openai.com/gpt-5-5/gpt-5-5.pdf.

OpenAI, OpenAI Frontier Governance Framework, OpenAI, May 2026, https://cdn.openai.com/pdf/e37d949b-8c9f-4d76-b99e-4272f4631a7e/openai-frontier-governance-framework.pdf.

OpenAI, Preparedness Framework, version 2, OpenAI, 15 April 2025, https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf.

UK Government, Emerging Processes for Frontier AI Safety, London: Department for Science, Innovation and Technology, 2023, https://assets.publishing.service.gov.uk/media/653aabbd80884d000df71bdc/emerging-processes-frontier-ai-safety.pdf.