aicomply.
Lesson | 15 min | Chapter 2 of 9

GPAI Provider Obligations

Article 53 requirements for all GPAI model providers.

GPAI Provider Obligations (Article 53)

Learning Objectives

By the end of this chapter, you will be able to:

  • Identify all baseline obligations for GPAI model providers under Article 53
  • Prepare compliant technical documentation for GPAI models
  • Develop effective copyright compliance policies
  • Create sufficiently detailed training data summaries
  • Establish information-sharing frameworks with downstream providers
  • Navigate the reduced obligations for open-source GPAI models

Article 53 establishes the baseline obligations applicable to all providers of General-Purpose AI models. These requirements apply regardless of whether the GPAI model presents systemic risk, creating a foundation of transparency and accountability across the GPAI ecosystem.

Overview of Article 53 Obligations

The Four Core Obligations

| Obligation | Article Reference | Purpose |
| --- | --- | --- |
| Technical Documentation | Article 53(1)(a), Annex XI | Enable regulatory oversight and enforcement |
| Information for Downstream Providers | Article 53(1)(b), Annex XII | Enable downstream AI Act compliance |
| Copyright Policy | Article 53(1)(c) | Ensure copyright law compliance |
| Training Data Summary | Article 53(1)(d) | Public transparency on training data |

Technical Documentation (Article 53(1)(a))

Annex XI Requirements

GPAI providers must draw up and keep up-to-date technical documentation containing at minimum:

| Documentation Element | Required Content |
| --- | --- |
| General Description | GPAI model identification, version, release date, intended uses |
| Architecture | Model type, size, architecture description, modalities |
| Training | Training methodologies, data sources, preprocessing, hyperparameters |
| Compute | Computational resources used for training (for example, the number of floating-point operations, FLOPs) |
| Capabilities | Key capabilities, limitations, known weaknesses |
| Evaluation | Testing methodologies, benchmarks, evaluation results |
| Safety | Safety testing, red-teaming results, mitigation measures (Annex XI, Section 2; applies only to GPAI models with systemic risk, not all GPAI models) |
| Lifecycle | Version history, modification records (best practice, not a specific regulatory requirement under Annex XI) |
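When exact compute accounting is not available, the compute entry is sometimes filled in with a rough estimate. The sketch below uses the commonly cited rule of thumb of about 6 floating-point operations per parameter per training token for dense transformer models. That approximation, the function name, and the example figures are assumptions for illustration, not a method prescribed by Annex XI.

```python
def estimate_training_flops(parameters: float, training_tokens: float) -> float:
    """Rough training compute using the 6 * N * D rule of thumb (an assumption)."""
    if parameters <= 0 or training_tokens <= 0:
        raise ValueError("parameters and training_tokens must be positive")
    return 6.0 * parameters * training_tokens


if __name__ == "__main__":
    # Hypothetical model: 7 billion parameters trained on 2 trillion tokens.
    flops = estimate_training_flops(7e9, 2e12)
    print(f"Estimated training compute: {flops:.2e} FLOPs")
```

An estimate like this is best recorded alongside the method used to derive it, so that a reviewer can see it is an approximation rather than a measured value.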

Documentation Standards

Technical Documentation Structure:

1. GENERAL INFORMATION
   1.1 Model Identification
   1.2 Provider Information
   1.3 Version and Release History
   1.4 Intended Purpose and Applications

2. MODEL ARCHITECTURE
   2.1 Model Type and Family
   2.2 Parameter Count and Size
   2.3 Architecture Details
   2.4 Input/Output Modalities

3. TRAINING PROCESS
   3.1 Training Methodology
   3.2 Training Data Description
   3.3 Preprocessing and Filtering
   3.4 Training Infrastructure
   3.5 Computational Resources (FLOPs)

4. CAPABILITIES AND LIMITATIONS
   4.1 Key Capabilities
   4.2 Known Limitations
   4.3 Potential Risks
   4.4 Prohibited Uses

5. EVALUATION AND TESTING
   5.1 Benchmark Results
   5.2 Safety Evaluations
   5.3 Adversarial Testing
   5.4 Bias and Fairness Assessments

6. COMPLIANCE INFORMATION
   6.1 Copyright Compliance
   6.2 Downstream Integration Guidance
   6.3 Incident Reporting Procedures
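One way to keep a draft aligned with the structure above is to check it mechanically for empty subsections. The following is a minimal sketch under stated assumptions: the `OUTLINE` dictionary simply mirrors the headings listed above, and the `missing_subsections` helper and the draft format (subsection title mapped to text) are hypothetical.

```python
from typing import Dict, List

# Mirrors the documentation structure above; subsection titles act as keys.
OUTLINE: Dict[str, List[str]] = {
    "General Information": [
        "Model Identification", "Provider Information",
        "Version and Release History", "Intended Purpose and Applications",
    ],
    "Model Architecture": [
        "Model Type and Family", "Parameter Count and Size",
        "Architecture Details", "Input/Output Modalities",
    ],
    "Training Process": [
        "Training Methodology", "Training Data Description",
        "Preprocessing and Filtering", "Training Infrastructure",
        "Computational Resources",
    ],
    "Capabilities and Limitations": [
        "Key Capabilities", "Known Limitations",
        "Potential Risks", "Prohibited Uses",
    ],
    "Evaluation and Testing": [
        "Benchmark Results", "Safety Evaluations",
        "Adversarial Testing", "Bias and Fairness Assessments",
    ],
    "Compliance Information": [
        "Copyright Compliance", "Downstream Integration Guidance",
        "Incident Reporting Procedures",
    ],
}


def missing_subsections(draft: Dict[str, str]) -> List[str]:
    """Return 'Section / Subsection' labels whose text is absent or blank."""
    missing = []
    for section, subsections in OUTLINE.items():
        for name in subsections:
            if not draft.get(name, "").strip():
                missing.append(f"{section} / {name}")
    return missing


if __name__ == "__main__":
    draft = {"Model Identification": "ExampleLM, version 1.0"}  # hypothetical draft
    for label in missing_subsections(draft):
        print("Missing:", label)
```

A check like this only confirms that each subsection has some text; it says nothing about whether the content is adequate.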

💡 Expert Tip: The AI Office will publish templates for technical documentation. Until then, follow Annex XI requirements comprehensively. Over-documentation is preferable to gaps.

Information for Downstream Providers (Article 53(1)(b))

Annex XII Requirements

GPAI providers must give downstream providers, meaning those who intend to integrate the model into their own AI systems, information and documentation that enables them to:

| Purpose | Required Information |
| --- | --- |
| Understand the model | Capabilities, limitations, intended uses |
| Comply with AI Act | Information needed for their own compliance |
| Integrate safely | Integration guidelines, API documentation |
| Manage risks | Known risks, recommended mitigations |

Information Package Components

Model Card (Essential):

| Section | Content |
| --- | --- |
| Model Details | Name, version, release date, provider |
| Intended Use | Primary intended uses, appropriate downstream applications |
| Out-of-Scope Use | Uses not suitable for the model |
| Limitations | Known failure modes, accuracy limitations |
| Risks | Potential harms, bias concerns |
| Recommendations | Best practices for safe deployment |
| Technical Specifications | Input/output formats, API details |
| Training Data | High-level description of training data |
| Evaluation Results | Benchmark performance, safety evaluations |
| Environmental Impact | Training compute, carbon footprint |
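A model card with these sections can also be kept in a machine-readable form so that downstream providers can consume it automatically. The sketch below is one possible shape, not a mandated format: the `ModelCard` class, its field names, and the JSON layout are hypothetical and simply follow the section list above.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class ModelCard:
    # One free-text field per model card section listed above.
    model_details: str = ""
    intended_use: str = ""
    out_of_scope_use: str = ""
    limitations: str = ""
    risks: str = ""
    recommendations: str = ""
    technical_specifications: str = ""
    training_data: str = ""
    evaluation_results: str = ""
    environmental_impact: str = ""

    def empty_sections(self) -> List[str]:
        """Names of sections that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_json(self) -> str:
        """Serialise the card so it can be published next to the model."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    card = ModelCard(model_details="ExampleLM 1.0 (hypothetical model)")
    print(card.to_json())
    print("Still empty:", card.empty_sections())
```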

Integration Documentation:

  • API specifications and endpoints
  • Rate limits and usage guidelines
  • Authentication and security requirements
  • Error handling procedures
  • Versioning and deprecation policies
  • Support channels and escalation paths

Downstream Provider Communication

| Communication Type | Frequency | Content |
| --- | --- | --- |
| Initial onboarding | At relationship start | Full documentation package |
| Version updates | Each significant release | Change logs, migration guidance |
| Safety notices | As discovered | Newly identified risks, mitigations |
| Compliance updates | Regulatory changes | Updated compliance guidance |
| Incident notifications | As incidents occur | Impact assessment, remediation |

Compliance Note

Article 53(1)(b) creates an ongoing obligation. Information must be updated as the model evolves and new risks or limitations are discovered.

Copyright Policy (Article 53(1)(c))

Copyright Compliance Requirements

GPAI providers must establish and implement a policy to comply with Union copyright law, including:

| Requirement | Implementation |
| --- | --- |
| TDM Opt-Out Identification | Detect and respect robots.txt, TDM opt-outs |
| Rights Holder Communication | Process for rights holder inquiries |
| Content Exclusion | Exclude opted-out content from training |
| Documentation | Record compliance measures |

Text and Data Mining (TDM) Framework

Directive (EU) 2019/790 Context:

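| TDM Right | Application to GPAI |
| --- | --- |
| Article 3 | TDM for scientific research: exception for research organisations and cultural heritage institutions |
| Article 4 | General TDM exception, including commercial TDM: permitted unless the rights holder has reserved their rights (opted out) |
| Opt-Out Mechanisms | Machine-readable reservations must be respected |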

Implementing Copyright Compliance

Copyright Policy Components:

  1. Data Collection Procedures

    • Crawler configuration to detect opt-outs
    • robots.txt interpretation guidelines
    • TDM reservation detection methods

  2. Exclusion Mechanisms

    • Automatic filtering of opted-out content
    • Manual review process for unclear cases
    • Content removal procedures

  3. Rights Holder Communication

    • Inquiry response process
    • Content takedown procedures
    • Dispute resolution mechanism

  4. Documentation and Records

    • Training data provenance tracking
    • Opt-out compliance records
    • Audit trail for compliance verification

💡 Practical Guidance: Implement both technical measures (robots.txt parsing, opt-out detection) and organisational measures (rights holder inquiry process, content removal procedures).
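The technical measures mentioned above can be sketched with Python's standard `urllib.robotparser` module. The crawler name `ExampleTrainingBot` and the helper functions are hypothetical. The check for a `tdm-reservation: 1` response header assumes the site signals its opt-out in the style of the TDM Reservation Protocol; real sites may use other machine-readable signals, so treat this as an illustration rather than a complete opt-out detector.

```python
from typing import Mapping
from urllib.robotparser import RobotFileParser

CRAWLER_USER_AGENT = "ExampleTrainingBot"  # hypothetical crawler name


def robots_allows(robots_txt: str, url: str,
                  user_agent: str = CRAWLER_USER_AGENT) -> bool:
    """Check an already-downloaded robots.txt body for permission to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_rights_reserved(headers: Mapping[str, str]) -> bool:
    """True if the response carries 'tdm-reservation: 1' (assumed opt-out signal)."""
    normalised = {key.lower(): value.strip() for key, value in headers.items()}
    return normalised.get("tdm-reservation") == "1"


def may_use_for_training(robots_txt: str, url: str,
                         headers: Mapping[str, str]) -> bool:
    """Exclude content if either signal indicates an opt-out."""
    return robots_allows(robots_txt, url) and not tdm_rights_reserved(headers)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = "https://example.com/articles/1"
    print(may_use_for_training(robots, page, {}))
    print(may_use_for_training(robots, page, {"TDM-Reservation": "1"}))
```

Treating either signal as sufficient to exclude a page errs on the side of respecting opt-outs. Unclear cases can still be routed to the manual review process, and each decision can be logged for the audit trail.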

Training Data Summary (Article 53(1)(d))

Public Disclosure Requirement

GPAI providers must draw up and make publicly available a sufficiently detailed summary of the content used for training the GPAI model, according to a template provided by the AI Office.

"Sufficiently Detailed" Standard

The summary must enable understanding of:

| Aspect | Required Detail |
| --- | --- |
| Data Sources | Categories of sources (web, books, code repositories) |
| Data Types | Text, images, audio, code, structured data |
| Geographic/Linguistic Scope | Languages covered, regional focus |
| Time Period | Date range of training data |
| Data Volume | Approximate size (tokens, images, hours) |
| Curation Methods | Filtering, cleaning, deduplication approaches |
| Sensitive Categories | Handling of personal data, harmful content |

Training Data Summary Template

TRAINING DATA SUMMARY
[Model Name] - [Version] - [Date]

1. DATA SOURCES
   - Web crawl data: ~X TB from Common Crawl and proprietary crawls
   - Books and publications: ~X million documents
   - Code repositories: ~X billion lines from open source projects
   - [Other categories]

2. DATA COMPOSITION
   - Languages: [List primary languages and percentages]
   - Content types: [Text X%, Code X%, Other X%]
   - Time range: [Start date] to [End date]

3. DATA CURATION
   - Filtering: [Description of quality filters applied]
   - Deduplication: [Approach to removing duplicates]
   - Harmful content removal: [Methods for removing problematic content]

4. COPYRIGHT COMPLIANCE
   - TDM opt-outs respected
   - [Description of copyright compliance measures]

5. PERSONAL DATA
   - [Approach to personal data in training set]
   - [Privacy-preserving measures applied]

[Provider Name]
[Publication Date]
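If the underlying figures are kept in structured form, a summary in this layout can be generated rather than written by hand, which helps keep it in sync with each model version. The sketch below is one possible approach. The `TrainingDataSummary` class and its fields are hypothetical, and the output follows the illustrative template above, not any official AI Office template.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingDataSummary:
    model_name: str
    version: str
    date: str
    sources: List[str]             # e.g. "Web crawl data: ~X TB"
    languages: Dict[str, float]    # language -> share of data in percent
    time_range: str
    curation: Dict[str, str]       # curation step -> short description
    copyright_measures: List[str]
    personal_data_measures: List[str]

    def render(self) -> str:
        """Return the summary as plain text, section by section."""
        def bullets(items: List[str]) -> List[str]:
            return [f"   - {item}" for item in items]

        language_line = ", ".join(
            f"{name} {share:g}%" for name, share in self.languages.items()
        )
        lines = [
            "TRAINING DATA SUMMARY",
            f"{self.model_name} - {self.version} - {self.date}",
            "",
            "1. DATA SOURCES",
            *bullets(self.sources),
            "",
            "2. DATA COMPOSITION",
            f"   - Languages: {language_line}",
            f"   - Time range: {self.time_range}",
            "",
            "3. DATA CURATION",
            *bullets([f"{step}: {text}" for step, text in self.curation.items()]),
            "",
            "4. COPYRIGHT COMPLIANCE",
            *bullets(self.copyright_measures),
            "",
            "5. PERSONAL DATA",
            *bullets(self.personal_data_measures),
        ]
        return "\n".join(lines)
```

Calling `render()` on a populated instance returns text that can be published as-is or adapted once an official template is in use.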

Compliance Note

The AI Office will publish a template for the training data summary. Providers should prepare detailed summaries now and adjust to the official template when published.

Open Source GPAI Provisions (Article 53(2))

Reduced Obligations for Open Source

Under Article 53(2), the obligations in Article 53(1)(a) and (b) do not apply to providers of GPAI models released under a free and open source licence whose parameters are made publicly available. The copyright policy and training data summary obligations continue to apply:

| Obligation | Applies to Open Source? |
| --- | --- |
| Technical documentation (Annex XI) | No (exempt) |
| Downstream provider information (Annex XII) | No (exempt) |
| Copyright policy | Yes |
| Training data summary | Yes |

Conditions for Open Source Exemption

| Criterion | Requirement |
| --- | --- |
| Licence | Free and open source licence |
| Model Parameters | Made publicly available |
| No Systemic Risk | Model does not present systemic risk |

💡 Note: Article 53(2) does not require "commercial independence" as a condition. An open-source GPAI model can be offered alongside commercial services and still qualify for the reduced obligations, provided the licence is genuinely free/open-source and parameters are publicly available.

Important Limitations

Compliance Note

The open source exemption does **NOT** apply if:

  • The GPAI model presents systemic risk (all Article 53 obligations then apply in full, in addition to Article 55)
  • Model parameters are not truly publicly available
  • The licence does not meet free and open source standards
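The conditions and limitations above amount to a simple decision rule, sketched below. The function and constant names are hypothetical. Whether a given licence qualifies as free and open source, and whether parameters are truly public, are legal assessments that the boolean inputs take as given.

```python
from typing import List

ARTICLE_53_OBLIGATIONS = [
    "Technical documentation (Article 53(1)(a))",
    "Downstream provider information (Article 53(1)(b))",
    "Copyright policy (Article 53(1)(c))",
    "Training data summary (Article 53(1)(d))",
]


def applicable_article_53_obligations(foss_licence: bool,
                                      parameters_public: bool,
                                      systemic_risk: bool) -> List[str]:
    """Return the Article 53(1) obligations that apply to a GPAI provider."""
    exempt = foss_licence and parameters_public and not systemic_risk
    if exempt:
        # Points (a) and (b) fall away; (c) and (d) always remain.
        return ARTICLE_53_OBLIGATIONS[2:]
    return list(ARTICLE_53_OBLIGATIONS)


if __name__ == "__main__":
    for item in applicable_article_53_obligations(True, True, False):
        print(item)
```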

Compliance Checklist: Article 53

Technical Documentation:

  • General model description complete
  • Architecture documented
  • Training process detailed
  • Computational resources recorded
  • Evaluation results included
  • Known limitations documented
  • Documentation update process established

Downstream Provider Information:

  • Model card prepared
  • Integration documentation available
  • Capabilities and limitations clearly stated
  • Prohibited uses defined
  • Support channels established
  • Update notification process in place

Copyright Compliance:

  • TDM opt-out detection implemented
  • Rights holder inquiry process established
  • Content exclusion procedures operational
  • Copyright policy documented
  • Compliance records maintained

Training Data Summary:

  • Data sources documented
  • Data composition detailed
  • Curation methods described
  • Copyright compliance stated
  • Summary made publicly available

What You Learned

Key concepts from this chapter

  • All GPAI providers must comply with Article 53 baseline obligations regardless of systemic risk status, subject to the open source exemption in Article 53(2)
  • Technical documentation (Annex XI) must be comprehensive and kept up-to-date
  • Information for downstream providers (Annex XII) enables their AI Act compliance
  • Copyright policy must address TDM opt-outs and rights holder communications
  • Training data summary must be publicly available and "sufficiently detailed"

Chapter Complete: GPAI Compliance, chapter 2 of 9