Learning Objectives
By the end of this chapter, you will be able to:
Apply GPAI classification criteria to real-world model scenarios
Walk through each compliance obligation step by step with practical implementation
Identify documentation requirements and prepare compliant materials
Navigate downstream provider relationships and information requirements
Develop a transition plan for potential systemic risk classification
Scenario Overview
This comprehensive case study follows a hypothetical but realistic GPAI provider through the entire compliance journey. The scenario is designed to illustrate common challenges and practical solutions.
Company Profile: EuroFoundation AI
| Attribute | Details |
| --- | --- |
| Company Name | EuroFoundation AI GmbH |
| Headquarters | Munich, Germany |
| Founded | 2021 |
| Employees | 450 (including 180 ML researchers) |
| Business Model | B2B foundation model licensing |
| Annual Revenue | €180 million |
Model Profile: EFAI-Alpha
| Specification | Value |
| --- | --- |
| Model Name | EFAI-Alpha |
| Architecture | Transformer-based autoregressive LLM |
| Parameters | 70 billion |
| Training Compute | 8.2 × 10^24 FLOPs |
| Training Data | 4.1 TB filtered web text, books, code |
| Training Duration | 89 days on 1,024 NVIDIA A100 GPUs |
| Languages | 15 European languages + English |
| Context Window | 32,768 tokens |
| Release Date | March 2025 |
Customer Base
| Customer Segment | Use Cases | Number of Customers |
| --- | --- | --- |
| Enterprise Software | Customer support chatbots, document analysis | 45 |
| Legal Services | Contract review, legal research | 12 |
| Healthcare | Clinical documentation, medical Q&A | 8 |
| Financial Services | Report generation, compliance checking | 15 |
| Education | Tutoring systems, content generation | 20 |
Step 1: GPAI Classification Analysis
Is EFAI-Alpha a GPAI Model?
Apply the Article 3(63) definition systematically:
| Criterion | Analysis | Conclusion |
| --- | --- | --- |
| "AI model" | Neural network trained to predict next token | Yes |
| "Displays significant generality" | Performs across languages, domains, task types | Yes |
| "Capable of competently performing wide range of distinct tasks" | Succeeds at translation, QA, summarisation, coding, analysis | Yes |
| "Can be integrated into variety of downstream systems or applications" | Licensed for diverse applications via API | Yes |
Classification Result: EFAI-Alpha is a GPAI model under Article 3(63).
Systemic Risk Assessment
| Factor | Analysis | Result |
| --- | --- | --- |
| Training Compute | 8.2 × 10^24 FLOPs | Below 10^25 threshold |
| Commission Designation | Not designated | N/A |
| High-Impact Capabilities | No exceptional capabilities beyond comparable models | Standard capabilities |
Systemic Risk Result: EFAI-Alpha is not a systemic risk model. Standard GPAI obligations under Article 53 apply.
Expert Note: At 8.2 × 10^24 FLOPs, EFAI-Alpha has used 82% of the 10^25 compute threshold for systemic risk. EuroFoundation AI should prepare for potential classification if it plans to scale further. Transition planning is addressed in Step 6 of this case study.
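The two-part test in the table above (training compute or Commission designation) can be sketched as a small check. This is a minimal illustration, not a legal determination: the `ModelProfile` class and both function names are hypothetical, and the threshold constant is the 10^25 figure used in this case study.

```python
from dataclasses import dataclass

# Threshold used in this case study: 10^25 floating point operations.
SYSTEMIC_COMPUTE_THRESHOLD = 1e25


@dataclass
class ModelProfile:
    """Hypothetical record of the facts needed for the assessment."""
    name: str
    training_flops: float        # cumulative training compute
    commission_designated: bool  # True if designated by the Commission


def is_presumed_systemic(model: ModelProfile) -> bool:
    """True if compute exceeds the threshold or the model is designated."""
    return (
        model.training_flops > SYSTEMIC_COMPUTE_THRESHOLD
        or model.commission_designated
    )


def threshold_fraction(model: ModelProfile) -> float:
    """Share of the compute threshold already used by the training run."""
    return model.training_flops / SYSTEMIC_COMPUTE_THRESHOLD


efai_alpha = ModelProfile("EFAI-Alpha", 8.2e24, commission_designated=False)
print(is_presumed_systemic(efai_alpha))          # False
print(f"{threshold_fraction(efai_alpha):.0%}")   # 82%
```

The check deliberately leaves out the qualitative "high-impact capabilities" factor, which cannot be reduced to a numeric comparison.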
Step 2: Technical Documentation (Article 53(1)(a))
EuroFoundation AI must prepare documentation per Annex XI requirements.
Model Architecture Documentation
| Documentation Element | Content | Location |
| --- | --- | --- |
| Architecture Type | Decoder-only transformer with 70B parameters | Technical Spec v3.2, Section 2.1 |
| Layer Configuration | 96 layers, 96 attention heads, 12,288 hidden dimension | Technical Spec v3.2, Section 2.2 |
| Attention Mechanism | Grouped Query Attention (GQA) with 8 KV heads | Technical Spec v3.2, Section 2.3 |
| Position Encoding | Rotary Position Embeddings (RoPE) | Technical Spec v3.2, Section 2.4 |
| Vocabulary | 128,000 tokens, BPE tokeniser | Technical Spec v3.2, Section 2.5 |
| Context Length | 32,768 tokens maximum | Technical Spec v3.2, Section 2.6 |
Training Process Documentation
| Element | Documentation |
| --- | --- |
| Training Infrastructure | 1,024 NVIDIA A100 80GB GPUs across 128 nodes |
| Training Framework | Custom distributed training on PyTorch 2.0 |
| Optimisation | AdamW, β1=0.9, β2=0.95, weight decay=0.1 |
| Learning Rate | Cosine schedule, peak 3×10^-4, warmup 2000 steps |
| Batch Size | 4 million tokens per batch |
| Training Duration | 89 days, 1.2 million steps |
| Total Compute | 8.2 × 10^24 FLOPs |
Training Data Documentation
| Dataset Component | Size | Source | Processing |
| --- | --- | --- | --- |
| Web Corpus | 2.8 TB | CommonCrawl (2020-2024) | Language ID, quality filtering, deduplication |
| Books | 0.6 TB | Public domain, licensed | OCR correction, formatting |
| Code | 0.4 TB | GitHub (permissive licences) | Syntax validation, quality scoring |
| Scientific Papers | 0.2 TB | arXiv, PubMed (open access) | PDF extraction, citation processing |
| EU Legal Corpus | 0.1 TB | EUR-Lex, national law databases | Structured extraction, version tracking |
Evaluation Results
| Benchmark | EFAI-Alpha Score | Comparable Models |
| --- | --- | --- |
| MMLU (5-shot) | 74.2% | GPT-3.5: 70.0%, Llama 2 70B: 68.9% |
| HellaSwag | 85.8% | GPT-3.5: 85.5%, Llama 2 70B: 85.3% |
| TruthfulQA | 51.3% | GPT-3.5: 47.0%, Llama 2 70B: 44.9% |
| HumanEval (code) | 42.1% | GPT-3.5: 48.1%, Llama 2 70B: 29.9% |
| Multilingual MMLU | 71.8% (avg across 15 EU languages) | Limited comparison data |
Known Limitations Documentation
| Limitation Category | Specific Limitations | Mitigation Recommendations |
| --- | --- | --- |
| Factual Accuracy | May generate plausible-sounding but incorrect facts | Implement fact-checking for critical applications |
| Temporal Knowledge | Training cutoff January 2024 | Supplement with current data retrieval |
| Mathematical Reasoning | Struggles with multi-step arithmetic | Use calculator tools for precise calculations |
| Bias | Underperforms on non-European cultural contexts | Evaluate performance on target demographics |
| Hallucination | May cite non-existent sources | Verify all citations independently |
| Safety | Standard jailbreak vulnerabilities | Apply additional safety layers in deployment |
Step 3: Copyright Compliance (Article 53(1)(c))
TDM Opt-Out Implementation
EuroFoundation AI implemented a comprehensive opt-out compliance programme:
| Component | Implementation | Evidence |
| --- | --- | --- |
| robots.txt Parsing | Custom parser checking for ai/ml training opt-outs | Parser logs showing 12.3M domains checked |
| TDM Reservation Detection | Pattern matching for "text and data mining reservation" | Detection records in data pipeline |
| HTTP Header Checking | X-Robots-Tag: noai parsing | Server response logs |
| Rightsholder Registry | Integration with emerging TDM reservation databases | API integration logs |
| Excluded Content Record | Log of all excluded domains and content | Exclusion database (3.2M entries) |
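Two of the checks in the table, robots.txt parsing and `X-Robots-Tag: noai` detection, can be illustrated with the Python standard library. This is a minimal sketch rather than EuroFoundation AI's custom parser: the crawler user-agent `EFAIBot` and all function names are assumptions made for illustration, and a production pipeline would also need the TDM reservation and registry checks.

```python
from urllib.robotparser import RobotFileParser

CRAWLER_AGENT = "EFAIBot"  # hypothetical user-agent name for the training crawler


def robots_allows(robots_txt: str, url: str, agent: str = CRAWLER_AGENT) -> bool:
    """Return True if the given robots.txt content permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def header_opts_out(headers: dict[str, str]) -> bool:
    """Return True if an X-Robots-Tag header carries a `noai` directive."""
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":
            directives = {d.strip().lower() for d in value.split(",")}
            if "noai" in directives:
                return True
    return False


def may_use_for_training(robots_txt: str, url: str, headers: dict[str, str]) -> bool:
    """Combine both signals: any opt-out excludes the page."""
    return robots_allows(robots_txt, url) and not header_opts_out(headers)


robots = """\
User-agent: EFAIBot
Disallow: /

User-agent: *
Allow: /
"""
page = "https://example.com/article"
print(may_use_for_training(robots, page, {}))                         # False
print(may_use_for_training("", page, {"X-Robots-Tag": "noai"}))       # False
print(may_use_for_training("", page, {"Content-Type": "text/html"}))  # True
```

Treating any single opt-out signal as decisive keeps the exclusion logic conservative, which also makes the exclusion record easier to audit.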
Training Data Summary (Public)
Published at: https://eurofoundation.ai/efai-alpha/training-data-summary
| Summary Section | Content |
| --- | --- |
| Data Categories | Web text (68%), Books (15%), Code (10%), Scientific (5%), Legal (2%) |
| Languages | English (45%), German (12%), French (10%), Spanish (8%), Other EU (25%) |
| Collection Period | 2020-2024 |
| Exclusions Applied | TDM opt-out (3.2M domains), Adult content, Personal data patterns |
| Methodology | Automated crawling with quality filtering and deduplication |
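The category percentages in the public summary follow from the component sizes listed in the training data documentation (2.8, 0.6, 0.4, 0.2 and 0.1 TB). A short sketch of that derivation, assuming shares are computed by size and rounded to whole percentages (the function name is illustrative):

```python
# Component sizes in TB, taken from the training data documentation table.
component_sizes_tb = {
    "Web text": 2.8,
    "Books": 0.6,
    "Code": 0.4,
    "Scientific": 0.2,
    "Legal": 0.1,
}


def category_shares(sizes_tb: dict[str, float]) -> dict[str, int]:
    """Return each category's share of the total, as a whole percentage."""
    total = sum(sizes_tb.values())
    return {name: round(100 * size / total) for name, size in sizes_tb.items()}


shares = category_shares(component_sizes_tb)
print(shares)
# {'Web text': 68, 'Books': 15, 'Code': 10, 'Scientific': 5, 'Legal': 2}
```

Independent rounding happens to sum to 100 here; in general it may not, so a published summary should either state that figures are rounded or use a rounding scheme that preserves the total.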
Copyright Compliance Documentation
| Document | Purpose | Location |
| --- | --- | --- |
| Data Sourcing Policy | Internal policy on permissible data sources | Legal/Compliance Drive |
| Opt-Out Compliance Procedure | Technical and legal procedure for TDM compliance | Data Team Wiki |
| Exclusion Audit Trail | Verifiable record of excluded content | Data Pipeline Database |
| Rightsholder Query Response | Template for responding to copyright queries | Legal Templates |
Step 4: Downstream Provider Information (Article 53(1)(b))
Model Card (Customer-Facing)
EuroFoundation AI provides each customer with a comprehensive model card:
| Section | Content Summary |
| --- | --- |
| Model Overview | Architecture, capabilities, intended use |
| Performance Benchmarks | Standardised benchmark results with confidence intervals |
| Capabilities by Task | Detailed capability assessment for common use cases |
| Limitations | Known weaknesses, failure modes, edge cases |
| Bias Assessment | Fairness evaluation results, demographic performance variations |
| Safety Evaluation | Red team testing results, known vulnerabilities |
| Recommended Mitigations | Deployment best practices for each use case |
Use Case-Specific Guidance
For each major customer segment, additional guidance:
| Customer Segment | Specific Guidance | Risk Considerations |
| --- | --- | --- |
| Healthcare | Clinical validation requirements, not suitable for diagnosis | High-risk classification likely; conduct conformity assessment |
| Legal | Verify all citations, not a substitute for legal advice | Accuracy critical; human oversight mandatory |
| Financial | Regulatory disclaimers, audit trail requirements | May fall under financial regulation |
| Education | Age-appropriate content filtering, academic integrity | Safeguards for vulnerable users |
| Enterprise | Data retention, confidentiality, access controls | Enterprise security requirements |
Prohibited Uses Policy
| Prohibited Use | Rationale | Enforcement |
| --- | --- | --- |
| Weapons Development | Safety and legal | Terms of service, API monitoring |
| Mass Surveillance | Fundamental rights | Terms of service, customer screening |
| Deceptive Content Generation | Misinformation risk | Terms of service, content policies |
| Child Safety Violations | Legal requirement | Terms of service, technical controls |
| Critical Infrastructure Control | Safety | Customer screening, use case review |
Information Update Process
| Update Type | Timeline | Communication Channel |
| --- | --- | --- |
| Critical Safety Issue | Immediate | Direct notification + Security advisory |
| Capability Changes | Pre-release | Changelog + Email notification |
| Documentation Updates | With release | Documentation portal notification |
| Policy Changes | 30 days notice | Email + Portal announcement |
Step 5: Compliance Infrastructure
Organisational Structure
| Role | Responsibility | Reporting Line |
| --- | --- | --- |
| Head of AI Compliance | Overall GPAI compliance | Reports to General Counsel |
| Technical Documentation Lead | Annex XI compliance | Reports to Head of AI Compliance |
| Data Governance Manager | Training data, copyright | Reports to Head of AI Compliance |
| Customer Compliance Lead | Downstream obligations | Reports to Head of AI Compliance |
| EU Representative | Regulatory liaison | Reports to General Counsel |
Compliance Process
| Process | Frequency | Owner |
| --- | --- | --- |
| Documentation Review | Quarterly | Technical Documentation Lead |
| Copyright Audit | Bi-annual | Data Governance Manager |
| Customer Guidance Review | Annual | Customer Compliance Lead |
| Regulatory Monitoring | Continuous | Head of AI Compliance |
| Incident Response | As needed | Cross-functional team |
Documentation Repository
| Repository | Contents | Access |
| --- | --- | --- |
| Technical Documentation | Architecture, training, evaluation | Internal + Regulator access |
| Copyright Compliance | Data provenance, opt-out records | Internal + Regulator access |
| Customer Materials | Model cards, guidance, terms | Customer portal |
| Compliance Records | Audit trails, certifications | Internal + Regulator access |
Step 6: Systemic Risk Transition Planning
Although EFAI-Alpha is currently below the systemic risk threshold, EuroFoundation AI is planning for potential classification:
Compute Monitoring
| Metric | Current | Alert Threshold | Systemic Threshold |
| --- | --- | --- | --- |
| Training FLOPs | 8.2 × 10^24 | 5 × 10^24 | 10^25 |
| Next Model (EFAI-Beta) | Planned: 2 × 10^25 | N/A | Will exceed |
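For monitoring during a training run, it helps to know when cumulative compute will cross each threshold. The sketch below assumes compute accrues at a constant daily rate, which is a simplification (real runs have restarts and varying utilisation), and uses the EFAI-Alpha figures of 8.2 × 10^24 total over 89 days. The function name is hypothetical.

```python
import math
from typing import Optional

ALERT_THRESHOLD = 5e24      # internal alert level from the monitoring table
SYSTEMIC_THRESHOLD = 1e25   # systemic risk compute threshold


def crossing_day(threshold: float, total_flops: float,
                 duration_days: int) -> Optional[int]:
    """First whole day on which cumulative compute reaches `threshold`.

    Assumes a constant daily compute rate. Returns None if the run
    finishes before the threshold is reached.
    """
    daily_rate = total_flops / duration_days
    day = math.ceil(threshold / daily_rate)
    return day if day <= duration_days else None


# EFAI-Alpha: 8.2e24 total compute over 89 days.
print(crossing_day(ALERT_THRESHOLD, 8.2e24, 89))     # 55
print(crossing_day(SYSTEMIC_THRESHOLD, 8.2e24, 89))  # None
```

Under these assumptions the alert level is reached on day 55 of 89 and the systemic threshold is never reached. Running the same projection with the planned compute and duration of a future model would give the compliance team a date to work back from.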
Pre-Compliance Preparation for EFAI-Beta
| Obligation | Preparation Status | Gap |
| --- | --- | --- |
| Adversarial Testing | Red team capability established | Need external expertise |
| Systemic Risk Assessment | Methodology drafted | Need Union-level scope expansion |
| Incident Reporting | Framework in place | Need AI Office templates |
| Cybersecurity | SOC 2 certified | Need model-specific assessment |
Timeline for EFAI-Beta
| Milestone | Target Date | Dependencies |
| --- | --- | --- |
| Training Start | Q4 2025 | Infrastructure readiness |
| Article 55 Preparation | Q3 2025 (before training) | Compliance programme |
| AI Office Notification | Within 2 weeks of threshold | Training completion |
| Compliance Operational | Training completion | All preparations complete |
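The notification milestone is the only entry in the timeline with a fixed window, so it is worth computing explicitly once the trigger date is known. A minimal sketch, assuming the two-week window runs from the date the threshold is met or is known to be met; the example date is purely hypothetical.

```python
from datetime import date, timedelta

NOTIFICATION_WINDOW = timedelta(weeks=2)  # "within 2 weeks of threshold"


def notification_deadline(threshold_known_on: date) -> date:
    """Latest date for notification, counted from the trigger date."""
    return threshold_known_on + NOTIFICATION_WINDOW


# Hypothetical trigger date, for illustration only.
print(notification_deadline(date(2025, 10, 1)))  # 2025-10-15
```

Because EFAI-Beta is planned to exceed the threshold, the trigger may be the point at which this becomes known rather than the end of training, so the deadline can fall earlier than the timeline's dependency column suggests.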
Cost and Resource Summary
Compliance Investment
| Category | Annual Cost | FTE |
| --- | --- | --- |
| Compliance Staff | €1.2M | 8 |
| Technical Documentation | €400K | 3 |
| Legal/External Counsel | €300K | - |
| Copyright Compliance Tools | €150K | - |
| Training and Certification | €100K | - |
| Total | €2.15M | 11 |
Return on Investment
| Benefit | Value |
| --- | --- |
| Market Access | €180M revenue depends on EU compliance |
| Customer Confidence | Compliance enables enterprise sales |
| Risk Mitigation | Avoids potential penalties of up to €15M or 3% of global turnover per Article 101(1) for GPAI providers |
| Competitive Advantage | Early compliance differentiates |
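The penalty ceiling in the Risk Mitigation row can be worked through for EuroFoundation AI. The sketch assumes the ceiling is the higher of the two amounts and uses the company's €180M annual revenue as a stand-in for worldwide turnover; the function name is illustrative.

```python
FIXED_CEILING_EUR = 15_000_000.0   # €15M
TURNOVER_RATE = 0.03               # 3% of worldwide annual turnover


def max_gpai_fine(worldwide_turnover_eur: float) -> float:
    """Upper bound of the fine: the higher of the fixed and turnover-based amounts."""
    return max(FIXED_CEILING_EUR, TURNOVER_RATE * worldwide_turnover_eur)


# EuroFoundation AI: 3% of €180M is €5.4M, so the €15M figure applies.
print(max_gpai_fine(180_000_000))    # 15000000.0
```

Under this assumption the turnover-based amount only becomes the binding ceiling above €500M in turnover, so for a company of this size the exposure is roughly seven times the €2.15M annual compliance budget.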
Lessons Learned
What Worked Well
| Success Factor | Description |
| --- | --- |
| Early Investment | Started compliance programme before obligations effective |
| Cross-Functional Team | Legal, technical, and business collaboration |
| Documentation First | Built documentation into development process |
| Customer Communication | Proactive engagement on compliance requirements |
Challenges Encountered
| Challenge | Resolution |
| --- | --- |
| Retroactive Data Provenance | Implemented tracking mid-development; some gaps remain |
| Benchmark Selection | Unclear which benchmarks satisfy regulatory expectations |
| Customer High-Risk Concerns | Developed guidance for customers' conformity assessments |
| Regulatory Uncertainty | Engaged with AI Office for clarification |
Recommendations for Other Providers
| Recommendation | Rationale |
| --- | --- |
| Start Early | Compliance takes months to implement properly |
| Build Documentation In | Retroactive documentation is harder and less reliable |
| Engage Customers | Compliance is a value chain responsibility |
| Monitor Regulatory Guidance | Expectations evolve, so stay current |
| Plan for Scaling | Prepare for systemic risk if growth continues |
Compliance Checklist Summary
Technical Documentation (Article 53(1)(a))
Architecture documentation complete
Training process documented
Training data detailed
Evaluation results published
Known limitations documented
Downstream Information (Article 53(1)(b))
Model card prepared
Use case guidance provided
Integration requirements documented
Update process established
Copyright Compliance (Article 53(1)(c))
TDM opt-out detection implemented
Excluded content recorded
Training data summary published
Rightsholder query process established
Training Data Summary (Article 53(1)(d))
Summary prepared
Summary published publicly
Methodology documented
Systemic Risk Preparation (Future)
Compute monitoring in place
Article 55 preparation planned
External red team engaged
AI Office relationship established