Data Governance
Article 10 requirements for training, validation, and testing data.
Data Governance (Article 10)
Learning Objectives
By the end of this chapter, you will be able to:
- Implement comprehensive data governance for AI training, validation, and testing
- Apply Article 10 data quality requirements in practice
- Conduct systematic bias examination across protected characteristics
- Navigate the GDPR intersection for processing sensitive personal data
- Document data governance processes for compliance demonstration
Article 10 establishes mandatory data governance practices for high-risk AI systems. Since AI systems are fundamentally shaped by their training data, poor data governance leads to unreliable, biased, or unsafe AI. This article ensures data quality from collection through model deployment.
Scope: What Data is Covered
Article 10 applies to three categories of datasets:
| Dataset Type | Purpose | Governance Requirement |
|---|---|---|
| Training Data | Model learning and development | Full Article 10 requirements |
| Validation Data | Model tuning and hyperparameter selection | Full Article 10 requirements |
| Testing Data | Performance evaluation and verification | Full Article 10 requirements |
Compliance Note
Per Article 10(6), for the development of high-risk AI systems not using techniques involving the training of AI models, paragraphs 2 to 5 apply only to the testing data sets.
The Data Governance Framework
Article 10(2): Mandatory Governance Practices
You must implement governance practices covering:
| Requirement | What It Means | Practical Implementation |
|---|---|---|
| (a) Design choices | Document why specific data was selected | Data selection criteria documentation |
| (b) Collection processes | Record how data was gathered and its origin | Data provenance tracking |
| (c) Preparation operations | Document annotation, labelling, cleaning | Data pipeline documentation |
| (d) Assumptions | State what the data is meant to measure | Data dictionary and metadata |
| (e) Availability/suitability | Assess if data is sufficient for purpose | Data adequacy assessment |
| (f) Bias examination | Check for discriminatory patterns | Bias audit processes |
| (g) Bias mitigation | Appropriate measures to detect, prevent and mitigate possible biases identified according to point (f) | Bias mitigation implementation |
| (h) Gaps/shortcomings | Identify what's missing or problematic | Data gap analysis |
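The eight practices above lend themselves to a structured record per dataset. A minimal sketch follows; the field names and example values are illustrative, not prescribed by the Act:

```python
from dataclasses import dataclass, field

@dataclass
class DataGovernanceRecord:
    """Illustrative record covering the Article 10(2)(a)-(h) practices."""
    dataset_name: str
    design_choices: str            # (a) why this data was selected
    collection_process: str        # (b) how and where the data was gathered
    preparation_operations: list[str] = field(default_factory=list)  # (c)
    assumptions: list[str] = field(default_factory=list)             # (d)
    adequacy_assessment: str = ""                                    # (e)
    bias_findings: list[str] = field(default_factory=list)           # (f)
    bias_mitigations: list[str] = field(default_factory=list)        # (g)
    known_gaps: list[str] = field(default_factory=list)              # (h)

record = DataGovernanceRecord(
    dataset_name="loan-applications-2020-2023",
    design_choices="Historical applications matching the deployment population",
    collection_process="Exported from core banking system; provenance logged",
)
record.known_gaps.append("Under-representation of applicants aged 18-25")
```

Keeping one such record per training, validation, and testing dataset makes the later Annex IV documentation largely a matter of export.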
Data Quality Requirements
Article 10(3): Quality Criteria
Datasets must meet these quality standards:
Relevant
- Data directly relates to the intended purpose
- Features are predictive of target outcomes
- Domain-appropriate data sources
Sufficiently Representative
- Covers the deployment population
- Includes edge cases and boundary conditions
- Geographic and demographic coverage
Free of Errors (to the best extent possible)
- Accurate labelling and annotation
- Correct data values
- Minimal measurement errors
Complete (in view of intended purpose)
- No critical missing data
- Sufficient sample sizes
- Temporal coverage as needed
Data Quality Checklist
| Quality Dimension | Assessment Questions |
|---|---|
| Accuracy | Are labels correct? Are measurements precise? |
| Completeness | Is required data present? Are there gaps? |
| Consistency | Is data formatted uniformly? Are definitions stable? |
| Timeliness | Is data current? Does it reflect deployment conditions? |
| Representativeness | Does data reflect the deployment population? |
| Relevance | Does data relate to the intended purpose? |
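Several of these dimensions can be checked automatically. A sketch for completeness and consistency, assuming records arrive as dictionaries (the sample data and thresholds are illustrative):

```python
def completeness(records, required_fields):
    """Fraction of records with all required fields present and non-empty."""
    ok = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return ok / len(records)

def duplicate_rate(records, key_fields):
    """Fraction of records sharing a key with an earlier record."""
    seen, dupes = set(), 0
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes / len(records)

records = [
    {"id": 1, "label": "approve", "income": 42000},
    {"id": 2, "label": "", "income": 51000},      # missing label
    {"id": 2, "label": "deny", "income": 51000},  # duplicate id
    {"id": 3, "label": "deny", "income": None},   # missing income
]
print(completeness(records, ["label", "income"]))  # 0.5
print(duplicate_rate(records, ["id"]))             # 0.25
```

Running such checks at ingestion, and again before each retraining, turns the checklist into evidence you can attach to the data quality assessment.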
Bias Examination Requirements
Article 10(2)(f): Mandatory Bias Assessment
You must examine datasets for possible biases that are "likely to affect the health and safety of persons, have a negative impact on fundamental rights or lead to discrimination prohibited under Union law."
Types of Bias to Examine
| Bias Type | Description | Example |
|---|---|---|
| Selection Bias | Non-representative sampling | Recruiting AI trained only on tech workers |
| Measurement Bias | Inconsistent data collection | Different interview standards for groups |
| Label Bias | Discriminatory labelling patterns | Historical bias in performance ratings |
| Representation Bias | Under/over-representation of groups | Medical AI trained mostly on one gender |
| Aggregation Bias | Grouping hides disparities | One model for diverse populations |
| Historical Bias | Data reflects past discrimination | Credit data reflecting redlining |
Protected Characteristics to Assess
Under EU non-discrimination law, examine bias across:
- Sex/Gender
- Racial or ethnic origin
- Religion or belief
- Disability
- Age
- Sexual orientation
- Nationality
Bias Examination Process
Step 1: Demographic Analysis
- Analyse representation of protected groups
- Identify under/over-represented populations
- Document representation gaps
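Step 1 can be sketched as a comparison of group shares against a reference population; the attribute name, reference shares, and tolerance below are illustrative assumptions:

```python
from collections import Counter

def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Flag groups whose share in the dataset deviates from the
    reference population share by more than `tolerance`."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, ref in reference_shares.items():
        share = counts.get(group, 0) / total
        if abs(share - ref) > tolerance:
            gaps[group] = round(share - ref, 3)
    return gaps

# Illustrative dataset: 30% female, 70% male against a 50/50 reference.
records = [{"sex": "F"}] * 30 + [{"sex": "M"}] * 70
print(representation_gaps(records, "sex", {"F": 0.5, "M": 0.5}))
# {'F': -0.2, 'M': 0.2}
```

The returned deviations are exactly the "representation gaps" to document; the appropriate tolerance depends on the intended purpose and deployment context.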
Step 2: Label Distribution Analysis
- Examine outcome labels across groups
- Identify historical discrimination patterns
- Assess label consistency across groups
Step 3: Feature Analysis
- Identify features correlated with protected characteristics
- Assess proxy discrimination risks
- Document feature selection rationale
Step 4: Subgroup Performance
- Test model performance across groups
- Identify disparate accuracy or error rates
- Document performance gaps
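Step 4 amounts to slicing an evaluation metric by group and reporting the worst-case gap. A sketch using accuracy (the labels and groups are illustrative; in practice you would also examine error types, not just accuracy):

```python
def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy per protected group plus the worst-case gap between groups."""
    per_group = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        correct = sum(1 for i in idx if y_true[i] == y_pred[i])
        per_group[g] = correct / len(idx)
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group, gap = subgroup_accuracy(y_true, y_pred, groups)
print(per_group, gap)  # {'A': 1.0, 'B': 0.5} with a 0.5 accuracy gap
```

A documented gap of this size would normally trigger the mitigation measures required under Article 10(2)(g).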
Processing Sensitive Personal Data
Article 10(5): Special Category Data Exception
Processing special category data (Article 9 GDPR) is exceptionally permitted where strictly necessary for bias detection and correction, and only when all six conditions set out in Article 10(5)(a)-(f) are met:
Conditions (all must be met):
- (a) The bias detection and correction cannot be effectively fulfilled by processing other data, including synthetic or anonymised data
- (b) The special categories of personal data are subject to technical limitations on re-use and state-of-the-art security and privacy-preserving measures, including pseudonymisation
- (c) The special categories of personal data are subject to measures to ensure that the personal data processed are secured, protected, subject to suitable safeguards, including strict controls and documentation of the access
- (d) The special categories of personal data are not to be transmitted, transferred or otherwise accessed by other parties
- (e) The special categories of personal data are deleted once the bias has been corrected or the personal data has reached the end of its retention period, whichever comes first
- (f) The records of processing activities pursuant to Regulation (EU) 2016/679 (GDPR) include the reasons why the processing of special categories of personal data was strictly necessary to detect and correct biases, and why that objective could not be achieved by processing other data
Required Safeguards:
- Technical measures (pseudonymisation, access controls)
- Organisational measures (policies, training)
- Prohibition of processing for any other purpose
- Deletion after bias monitoring complete
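Condition (b) expressly names pseudonymisation. One common technique is a keyed hash: the same identifier always maps to the same token, so records remain linkable for subgroup analysis without exposing identities. A sketch (the key value and record layout are illustrative; the key must be stored separately from the dataset and rotated per your security policy):

```python
import hmac
import hashlib

# Illustrative secret; in production this lives in a key vault,
# never alongside the pseudonymised dataset.
SECRET_KEY = b"example-key-stored-separately"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed, non-reversible token."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Example", "ethnicity": "X", "outcome": "approved"}
record["name"] = pseudonymise(record["name"])
```

Note that pseudonymised data is still personal data under the GDPR, so the remaining Article 10(5) conditions, including deletion under point (e), continue to apply.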
| GDPR Article 9 Category | AI Act Treatment |
|---|---|
| Racial/ethnic origin | May process for bias monitoring with safeguards |
| Political opinions | May process for bias monitoring with safeguards |
| Religious beliefs | May process for bias monitoring with safeguards |
| Health data | May process for bias monitoring with safeguards |
| Sex life/orientation | May process for bias monitoring with safeguards |
| Biometric data | May process for bias monitoring with safeguards |
Expert Insight
The AI Act creates a specific legal basis for processing sensitive data to prevent AI discrimination. This is a significant departure from GDPR's otherwise restrictive approach to special category data. Document your justification carefully.
Data Provenance and Lineage
Tracking Data Origins
For each dataset, document:
| Element | Required Information |
|---|---|
| Source | Where data originated |
| Collection method | How data was gathered |
| Collection date | When data was collected |
| Legal basis | Lawful basis for collection |
| Transformations | How data was processed |
| Chain of custody | Who handled the data |
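The lineage elements above map naturally onto a per-dataset record. A minimal sketch mirroring the table (field names and example values are illustrative):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProvenanceRecord:
    """Illustrative lineage entry for one dataset."""
    source: str
    collection_method: str
    collection_date: date
    legal_basis: str
    transformations: list[str] = field(default_factory=list)
    chain_of_custody: list[str] = field(default_factory=list)

rec = ProvenanceRecord(
    source="CRM export, EU region",
    collection_method="Batch export via internal API",
    collection_date=date(2024, 3, 1),
    legal_basis="GDPR Art. 6(1)(f) legitimate interest",
)
rec.transformations.append("Deduplicated on customer_id")
rec.chain_of_custody.append("2024-03-01 data-engineering: ingest and validation")
```

Appending to `transformations` and `chain_of_custody` at each pipeline stage yields an audit trail without separate bookkeeping.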
Third-Party Data Considerations
When using external data:
- Verify provider's data governance practices
- Obtain representations about data quality
- Conduct independent quality assessment
- Document due diligence process
Documentation Requirements
Data Governance Documentation
Your technical documentation (Annex IV) must include:
| Document | Content |
|---|---|
| Data Documentation | Description of datasets, collection, preparation |
| Bias Assessment Report | Methods and findings of bias examination |
| Data Quality Assessment | Evidence of quality criteria compliance |
| Sensitive Data Justification | If applicable, justification for Article 10(5) |
| Gap Analysis | Identified shortcomings and mitigation |
Integration with Other Requirements
| Requirement | Data Governance Connection |
|---|---|
| Risk Management (Art. 9) | Data risks feed into risk assessment |
| Technical Documentation (Art. 11) | Data governance is mandatory documentation content |
| Accuracy (Art. 15) | Data quality directly affects accuracy |
| Post-Market Monitoring (Art. 72) | Monitor for data drift and degradation |
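Data drift under post-market monitoring can be quantified in several ways; one widely used metric (not mandated by the Act) is the population stability index over a binned feature distribution. A sketch, with illustrative bin shares and the common 0.2 rule-of-thumb threshold:

```python
import math

def population_stability_index(expected, actual):
    """PSI between two binned distributions (shares summing to 1).
    A common rule of thumb treats PSI > 0.2 as significant drift."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, 1e-6), max(a, 1e-6)  # avoid log(0) on empty bins
        psi += (a - e) * math.log(a / e)
    return psi

baseline = [0.25, 0.25, 0.25, 0.25]  # feature distribution at training time
current = [0.10, 0.20, 0.30, 0.40]   # distribution observed in production
print(round(population_stability_index(baseline, current), 4))  # 0.2282
```

A value above the chosen threshold would feed back into the risk management process and may signal that the training data no longer satisfies the representativeness criterion.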
Data Governance Compliance Checklist
- Training, validation, and testing data identified
- Data design choices documented
- Collection processes and origins recorded
- Preparation operations documented
- Data assumptions stated
- Availability and suitability assessed
- Bias examination completed across protected characteristics
- Data gaps and shortcomings identified
- Quality criteria (relevant, representative, error-free, complete) assessed
- Sensitive data processing justified (if applicable)
- Data governance documentation complete
What You Learned
Key concepts from this chapter
Article 10 applies to **training, validation, AND testing data**
Data must be **relevant, sufficiently representative, and (to the best extent possible) free of errors and complete** in view of the intended purpose
**Bias examination is mandatory** across protected characteristics
**Sensitive personal data** may be processed for bias monitoring under strict conditions
**Document everything**—data governance is core to technical documentation