Systemic Risk Classification
How GPAI models are classified as presenting systemic risk.
Systemic Risk Classification (Article 51)
Learning Objectives
By the end of this chapter, you will be able to:
- Define "systemic risk" under the AI Act framework
- Apply the computational threshold (10^25 FLOPS) to classify GPAI models
- Identify factors that may trigger Commission designation
- Execute notification obligations to the AI Office
- Monitor models for approaching systemic risk thresholds
- Understand the consequences of systemic risk classification
Article 51 introduces a critical distinction within the GPAI framework: models presenting "systemic risk" face significantly enhanced obligations. This tiered approach reflects the EU's recognition that the most capable AI models may pose risks at a societal scale.
Understanding Systemic Risk
Article 51(1) Classification Criteria
A GPAI model is classified as presenting systemic risk if it meets either of two conditions:
- Article 51(1)(a): It has high-impact capabilities evaluated on the basis of appropriate technical tools and methodologies, including indicators and benchmarks
- Article 51(1)(b): The Commission designates it as such, based on criteria equivalent to those set out in point (a), in accordance with Annex XIII, either ex officio or following a qualified alert from the scientific panel pursuant to Article 90(1)(a)
⚠️ Note: Article 51(1) itself does not define "systemic risk" by listing risk categories. The concept of high-impact capabilities and their potential effects is elaborated in Annex XIII and supporting Recitals, not in Article 51(1) directly.
Annex XIII criteria for assessing high-impact capabilities include:
| Criterion | Assessment Factors |
|---|---|
| Number of parameters | Model size and complexity |
| Quality and size of dataset | Breadth, depth, and curation of training data |
| Input/output modalities | Text, image, audio, video, code capabilities |
| Benchmarks and evaluations | Performance on capability evaluations |
| Reach and scale | Number of users, integrations, deployments |
| High-impact capabilities | Assessed dangerous or transformative capabilities |
"High-Impact Capabilities" Interpretation
| Indicator | Assessment Criteria |
|---|---|
| Scale of deployment | Millions of users, broad integration |
| Capability breadth | Wide range of sophisticated tasks |
| Reasoning ability | Complex multi-step reasoning |
| Autonomous action | Ability to act with minimal human direction |
| Knowledge synthesis | Combining information across domains |
| Code generation | Creating functional software, exploits |
| Persuasion | Sophisticated content generation |
| Multimodality | Integration of text, image, audio, video |
The 10^25 FLOPS Threshold (Article 51(2))
Computational Threshold Rule
Article 51(2) establishes: A GPAI model shall be presumed to have high-impact capabilities when the cumulative amount of computation used for its training measured in floating point operations (FLOPs) is greater than 10^25.
Understanding the Threshold
| Scale Reference | FLOPS | Status |
|---|---|---|
| Small models (7B parameters) | ~10^22 | Well below threshold |
| Medium models (70B parameters) | ~10^23-10^24 | Below threshold |
| Large models (2024 frontier) | ~10^24-10^25 | Near threshold |
| Systemic risk threshold | 10^25 | Presumption applies above this value |
| Future frontier models | >10^25 | Clearly above threshold |
What Counts as "Training Compute"?
| Included | Not Included |
|---|---|
| Initial pre-training | Inference at deployment |
| Fine-tuning (if substantial) | Minor adaptation/prompting |
| RLHF training | Downstream provider fine-tuning |
| Multi-stage training | Evaluation and testing |
| Distillation (from scratch) | User interactions |
Calculating Training Compute
Standard Estimation Formula:
Training FLOPS ≈ 6 × Number of Parameters × Number of Training Tokens
Example Calculation:
- 70B parameter model trained on 2T tokens
- 6 × 70×10^9 × 2×10^12 = 8.4×10^23 FLOPS
- Below 10^25 threshold
💡 Expert Note: The 6× multiplier accounts for forward and backward passes plus optimizer operations. Actual compute may vary based on architecture and training approach.
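The estimation formula and worked example above can be sketched in code. This is a minimal illustration, not an official calculation method; the function names and the 10^24 "approaching" alert level are assumptions for this sketch (the latter mirrors the internal monitoring level suggested later in this chapter).

```python
# Sketch of the standard training-compute estimate: FLOPS ~ 6 * N * D.
# Threshold from Article 51(2); the "approaching" alert level is an assumption.

SYSTEMIC_RISK_THRESHOLD = 1e25  # Article 51(2): presumption above 10^25 FLOPS
APPROACH_ALERT = 1e24           # illustrative internal early-warning level

def estimate_training_flops(parameters: float, training_tokens: float) -> float:
    """Approximate cumulative training compute: ~6 FLOPS per parameter per
    token (roughly 2 for the forward pass, 4 for backward pass and update)."""
    return 6 * parameters * training_tokens

def classify(flops: float) -> str:
    """Map an estimated compute figure to a rough compliance status."""
    if flops > SYSTEMIC_RISK_THRESHOLD:
        return "presumed systemic risk (notify under Article 52(1))"
    if flops >= APPROACH_ALERT:
        return "approaching threshold (begin Article 55 preparation)"
    return "below threshold"

# Worked example from the text: 70B parameters trained on 2T tokens
flops = estimate_training_flops(70e9, 2e12)
print(f"{flops:.2e}")   # 8.40e+23
print(classify(flops))  # below threshold
```

As the Expert Note cautions, the 6× multiplier is only an approximation; a provider's actual accounting should follow its real training logs rather than this parameter-token heuristic.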
Commission Designation (Article 51(1)(b))
Alternative Classification Path
Even if a model does not exceed 10^25 FLOPS, the Commission may designate it as presenting systemic risk based on:
| Criterion | Assessment Factors |
|---|---|
| Number of parameters | Model size and complexity |
| Quality of dataset | Breadth, depth, and curation of training data |
| Size of dataset | Volume of training data |
| Input/output modalities | Text, image, audio, video, code capabilities |
| Benchmarks | Performance on capability evaluations |
| Reach | Number of users, integrations, deployments |
| Number of registered users | Business scale and market penetration |
| High-impact capabilities | Assessed dangerous or transformative capabilities |
Designation Process
| Step | Actor | Action |
|---|---|---|
| 1 | Scientific Panel | Issues qualified alert identifying potential systemic risk (Article 90(1)(a)), OR Commission acts ex officio |
| 2 | Commission | Initiates investigation and evidence gathering based on Annex XIII criteria |
| 3 | Commission | Issues designation decision |
| 4 | Provider | May challenge designation (Article 52(5)) |
Challenging a Designation
Article 52(2) provides that providers may, at the time of notification, present sufficiently substantiated arguments that their model does not present systemic risk and should not be classified as such. (Note: Article 52(5) addresses a separate process for reassessment of Commission designations.) Evidence may include:
- Independent capability evaluations
- Safety testing results
- Limitation demonstrations
- Use case restrictions
- Technical safeguards implemented
Notification Requirements (Article 52)
Mandatory Notification
Article 52(1) requires GPAI providers to notify the Commission:
| Trigger | Timeline |
|---|---|
| Model meets 10^25 FLOPS threshold | Within 2 weeks of meeting threshold |
| Reasonable grounds to believe threshold will be met | Before training completion |
| Commission designation received | Immediate acknowledgment required |
Notification Content
Article 52(1) requires that the notification include the "information necessary to demonstrate that the relevant requirement has been met." The following table distinguishes between the statutory minimum and recommended best-practice content:
| Element | Content Required | Status |
|---|---|---|
| Information demonstrating the requirement is met | Evidence that the 10^25 FLOPS threshold has been reached or will be reached | Statutory minimum (Article 52(1)) |
| Provider identification | Legal entity, contact details, authorised representative | Recommended best practice |
| Model identification | Name, version, release date | Recommended best practice |
| Training compute | Cumulative FLOPS calculation and methodology | Recommended best practice |
| Capability assessment | Known capabilities and limitations | Recommended best practice |
| Intended distribution | Market placement plans | Recommended best practice |
| Risk assessment | Initial systemic risk assessment | Recommended best practice |
Notification Template
```
GPAI SYSTEMIC RISK NOTIFICATION
(Article 52, Regulation (EU) 2024/1689)

1. PROVIDER INFORMATION
   [Legal name, address, contact, authorised representative]

2. MODEL IDENTIFICATION
   [Model name, version, planned release date]

3. TRAINING COMPUTE
   Total FLOPS: [X.XX × 10^YY]
   Calculation methodology: [Description]
   Training completion date: [Date]

4. CAPABILITY ASSESSMENT
   [Summary of model capabilities]
   [Known limitations]

5. RISK ASSESSMENT
   [Identified systemic risks]
   [Planned mitigations]

6. DISTRIBUTION PLANS
   [Intended market placement approach]
   [Timeline]

Submitted by: [Name, title]
Date: [Date]
```
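A provider preparing this notification might keep the statutory minimum and the best-practice extras separate in its internal tooling. The sketch below is purely illustrative: the Act prescribes the content of the notification, not any schema, and every field name here is a hypothetical choice.

```python
# Hypothetical internal record for an Article 52(1) notification.
# Field names are illustrative assumptions, not statutory requirements.
from dataclasses import dataclass, field

@dataclass
class SystemicRiskNotification:
    # Statutory minimum (Article 52(1)): evidence the threshold is met
    total_flops: float
    calculation_methodology: str
    # Recommended best-practice content (see table above)
    provider: str = ""
    model_name: str = ""
    model_version: str = ""
    known_capabilities: list = field(default_factory=list)
    identified_risks: list = field(default_factory=list)

    def meets_threshold(self) -> bool:
        """Article 51(2) presumption: training compute greater than 10^25."""
        return self.total_flops > 1e25

notification = SystemicRiskNotification(
    total_flops=1.2e25,
    calculation_methodology="6 x parameters x training tokens (estimate)",
    provider="Example AI Ltd",
    model_name="example-model",
)
print(notification.meets_threshold())  # True
```

Separating the statutory core from the optional fields makes it easy to confirm the legally required evidence is present before adding supporting material.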
Consequences of Systemic Risk Classification
Enhanced Obligations
| Baseline GPAI | + Systemic Risk Additions |
|---|---|
| Technical documentation | + Model evaluation and adversarial testing |
| Downstream information | + Systemic risk assessment at Union level |
| Copyright policy | + Incident tracking and reporting |
| Training data summary | + Cybersecurity protection |
Regulatory Scrutiny
| Aspect | Standard GPAI | Systemic Risk GPAI |
|---|---|---|
| AI Office oversight | General | Enhanced monitoring |
| Evaluation requests | Ad hoc | Regular requirements possible |
| Incident reporting | Via downstream providers | Direct to AI Office |
| Enforcement focus | Documentation | Active risk management |
Monitoring Approaching Threshold
Internal Monitoring Framework
| Metric | Monitoring Frequency | Threshold Alert |
|---|---|---|
| Training FLOPS accumulated | Daily during training | 10^24 (approaching) |
| Parameter count | Per training iteration | Planning stage |
| Dataset size growth | Weekly | Major expansions |
| Capability evaluations | Per checkpoint | Significant capability gains |
| User/deployment scale | Monthly | Rapid growth |
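The daily FLOPS-accumulation check from the table above can be sketched as a simple alert function. This is an assumed internal-compliance pattern, not anything the Act mandates; the alert strings and the 10^24 level come from the monitoring table, while the planned-run check reflects the Article 52(1) duty to notify before training completes when the threshold will foreseeably be met.

```python
# Minimal sketch of a daily training-compute compliance check.
# Alert levels mirror the monitoring table above; wording is illustrative.

APPROACH_ALERT = 1e24  # "approaching" alert from the monitoring table
THRESHOLD = 1e25       # Article 51(2) presumption threshold

def daily_compute_check(cumulative_flops: float,
                        planned_total_flops: float) -> list:
    """Return the alerts a compliance team might raise for one training day."""
    alerts = []
    if cumulative_flops > THRESHOLD:
        alerts.append("THRESHOLD MET: notify Commission within two weeks "
                      "(Article 52(1))")
    elif planned_total_flops > THRESHOLD:
        alerts.append("PLANNED RUN EXCEEDS THRESHOLD: notify before "
                      "training completion")
    elif cumulative_flops >= APPROACH_ALERT:
        alerts.append("APPROACHING: begin Article 55 compliance preparation")
    return alerts

# A run at 3x10^24 FLOPS so far, planned to finish at 9x10^24
print(daily_compute_check(3e24, 9e24))
```

Note that the planned-total check fires even before the cumulative figure crosses the threshold, matching the "reasonable grounds to believe the threshold will be met" trigger in the notification table earlier in this chapter.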
Pre-Threshold Preparation
For models approaching 10^25 FLOPS:
- Begin Article 55 compliance preparation
- Establish adversarial testing programme
- Develop systemic risk assessment methodology
- Implement enhanced incident tracking
- Review cybersecurity measures
- Prepare AI Office notification
- Engage Scientific Panel proactively
What You Learned
Key concepts from this chapter
- Systemic risk classification applies to GPAI models with "high-impact capabilities" affecting the EU market
- 10^25 FLOPS creates a **presumption** of systemic risk—below-threshold models may still be designated
- Commission can designate models based on multiple factors beyond compute
- Providers must notify the Commission within 2 weeks of meeting the threshold
- Systemic risk classification triggers Article 55 enhanced obligations