Lesson · 12 min · Chapter 2 of 14

Data Governance

Article 10 requirements for training, validation, and testing data.

Data Governance (Article 10)

Learning Objectives

By the end of this chapter, you will be able to:

  • Implement comprehensive data governance for AI training, validation, and testing
  • Apply Article 10 data quality requirements in practice
  • Conduct systematic bias examination across protected characteristics
  • Navigate the GDPR intersection for processing sensitive personal data
  • Document data governance processes for compliance demonstration

Article 10 establishes mandatory data governance practices for high-risk AI systems. Since AI systems are fundamentally shaped by their training data, poor data governance leads to unreliable, biased, or unsafe AI. This article ensures data quality from collection through model deployment.

Scope: What Data is Covered

Article 10 applies to three categories of datasets:

| Dataset Type | Purpose | Governance Requirement |
|---|---|---|
| Training Data | Model learning and development | Full Article 10 requirements |
| Validation Data | Model tuning and hyperparameter selection | Full Article 10 requirements |
| Testing Data | Performance evaluation and verification | Full Article 10 requirements |

Compliance Note

Per Article 10(6), for the development of high-risk AI systems not using techniques involving the training of AI models, paragraphs 2 to 5 apply only to the testing data sets.


The Data Governance Framework

Article 10(2): Mandatory Governance Practices

You must implement governance practices covering:

| Requirement | What It Means | Practical Implementation |
|---|---|---|
| (a) Design choices | Document why specific data was selected | Data selection criteria documentation |
| (b) Collection processes | Record how data was gathered and its origin | Data provenance tracking |
| (c) Preparation operations | Document annotation, labelling, cleaning | Data pipeline documentation |
| (d) Assumptions | State what the data is meant to measure | Data dictionary and metadata |
| (e) Availability/suitability | Assess whether data is sufficient for the purpose | Data adequacy assessment |
| (f) Bias examination | Check for discriminatory patterns | Bias audit processes |
| (g) Bias mitigation | Take appropriate measures to detect, prevent and mitigate the biases identified under (f) | Bias mitigation implementation |
| (h) Gaps/shortcomings | Identify what is missing or problematic | Data gap analysis |
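The eight points above lend themselves to a structured record kept per dataset. A minimal sketch in Python (the `DatasetGovernanceRecord` class and its field names are illustrative assumptions, not terms prescribed by the Act):

```python
from dataclasses import dataclass, field

@dataclass
class DatasetGovernanceRecord:
    """One record per dataset, covering Article 10(2)(a)-(h).
    Field names are illustrative, not prescribed by the Act."""
    name: str
    design_choices: str          # (a) why this data was selected
    collection_process: str      # (b) how and where it was gathered
    preparation_ops: list = field(default_factory=list)   # (c) cleaning, labelling
    assumptions: str = ""        # (d) what the data is meant to measure
    suitability_assessed: bool = False                    # (e)
    bias_examined: bool = False                           # (f)
    bias_mitigations: list = field(default_factory=list)  # (g)
    known_gaps: list = field(default_factory=list)        # (h)

    def open_items(self) -> list:
        """Return the governance points still outstanding."""
        items = []
        if not self.suitability_assessed:
            items.append("(e) suitability assessment")
        if not self.bias_examined:
            items.append("(f) bias examination")
        return items
```

Keeping one such record per training, validation, and testing set gives you a ready-made index into the Annex IV technical documentation discussed later in this chapter.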

Data Quality Requirements

Article 10(3): Quality Criteria

Datasets must meet these quality standards:

Relevant

  • Data directly relates to the intended purpose
  • Features are predictive of target outcomes
  • Domain-appropriate data sources

Sufficiently Representative

  • Covers the deployment population
  • Includes edge cases and boundary conditions
  • Geographic and demographic coverage

Free of Errors (to the best extent possible)

  • Accurate labelling and annotation
  • Correct data values
  • Minimal measurement errors

Complete (in view of intended purpose)

  • No critical missing data
  • Sufficient sample sizes
  • Temporal coverage as needed

Data Quality Checklist

| Quality Dimension | Assessment Questions |
|---|---|
| Accuracy | Are labels correct? Are measurements precise? |
| Completeness | Is required data present? Are there gaps? |
| Consistency | Is data formatted uniformly? Are definitions stable? |
| Timeliness | Is data current? Does it reflect deployment conditions? |
| Representativeness | Does data reflect the deployment population? |
| Relevance | Does data relate to the intended purpose? |
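Parts of this checklist can be automated. A minimal sketch of completeness and label-validity checks over a dataset held as a list of dicts (the function, field names, and metrics are illustrative assumptions, not thresholds from Article 10):

```python
def quality_report(rows, required_fields, valid_labels):
    """Compute simple completeness and label-validity proxies for a
    dataset given as a list of dicts. Field names and metric choices
    are illustrative, not Article 10 requirements."""
    total = len(rows)
    # Completeness: rows where every required field is populated
    missing = sum(
        1 for r in rows
        if any(r.get(f) in (None, "") for f in required_fields)
    )
    # Accuracy proxy: rows whose label falls outside the allowed set
    bad_labels = sum(1 for r in rows if r.get("label") not in valid_labels)
    return {
        "rows": total,
        "completeness": (total - missing) / total if total else 0.0,
        "label_validity": (total - bad_labels) / total if total else 0.0,
    }
```

Checks like these catch mechanical defects; representativeness and relevance still require human judgment about the deployment population and intended purpose.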

Bias Examination Requirements

Article 10(2)(f): Mandatory Bias Assessment

You must examine datasets for biases "likely to affect health and safety of persons, have a negative impact on fundamental rights, or lead to discrimination."

Types of Bias to Examine

| Bias Type | Description | Example |
|---|---|---|
| Selection Bias | Non-representative sampling | Recruiting AI trained only on tech workers |
| Measurement Bias | Inconsistent data collection | Different interview standards for groups |
| Label Bias | Discriminatory labelling patterns | Historical bias in performance ratings |
| Representation Bias | Under/over-representation of groups | Medical AI trained mostly on one gender |
| Aggregation Bias | Grouping hides disparities | One model for diverse populations |
| Historical Bias | Data reflects past discrimination | Credit data reflecting redlining |

Protected Characteristics to Assess

Under EU non-discrimination law, examine bias across:

  • Sex/Gender
  • Racial or ethnic origin
  • Religion or belief
  • Disability
  • Age
  • Sexual orientation
  • Nationality

Bias Examination Process

Step 1: Demographic Analysis

  • Analyse representation of protected groups
  • Identify under/over-represented populations
  • Document representation gaps
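Step 1 can be sketched as a comparison of group shares in the dataset against a reference population. All names, the `tolerance` threshold, and the reference shares below are illustrative assumptions; appropriate reference figures and thresholds are a case-by-case judgment:

```python
from collections import Counter

def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Compare each group's share in the dataset against a reference
    population share (e.g. census figures) and flag groups whose
    absolute gap exceeds `tolerance`. Inputs are illustrative."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps  # negative value = under-represented
```

For example, a dataset that is 80% one group against a 50/50 reference population would be flagged with gaps of +0.3 and -0.3, which then belong in the documented representation-gap analysis.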

Step 2: Label Distribution Analysis

  • Examine outcome labels across groups
  • Identify historical discrimination patterns
  • Assess label consistency across groups

Step 3: Feature Analysis

  • Identify features correlated with protected characteristics
  • Assess proxy discrimination risks
  • Document feature selection rationale

Step 4: Subgroup Performance

  • Test model performance across groups
  • Identify disparate accuracy or error rates
  • Document performance gaps
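Step 4 reduces to computing a metric per group and comparing. A minimal sketch using accuracy (the function is an illustrative assumption; a real audit would also compare false-positive and false-negative rates per group):

```python
def subgroup_accuracy(y_true, y_pred, groups):
    """Per-group accuracy and the largest pairwise gap.
    Illustrative sketch of Step 4; real audits would also compare
    error-rate metrics such as FPR/FNR across groups."""
    per_group = {}
    for g in set(groups):
        idx = [i for i, gr in enumerate(groups) if gr == g]
        correct = sum(1 for i in idx if y_true[i] == y_pred[i])
        per_group[g] = correct / len(idx)
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap
```

A large gap is the disparate-performance finding that must be documented and fed into the bias mitigation measures under Article 10(2)(g).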

Processing Sensitive Personal Data

Article 10(5): Special Category Data Exception

Processing special category data (Article 9 GDPR) for bias monitoring is permitted only when all six conditions set out in Article 10(5)(a)-(f) are met:

Conditions (all must be met):

  • (a) Bias detection and correction cannot be effectively fulfilled by processing other data, including synthetic or anonymised data
  • (b) The special categories of personal data are subject to technical limitations on re-use and to state-of-the-art security and privacy-preserving measures, including pseudonymisation
  • (c) The special categories of personal data are subject to measures ensuring the data processed are secured and protected, with suitable safeguards including strict controls and documentation of access
  • (d) The special categories of personal data are not transmitted, transferred or otherwise accessed by other parties
  • (e) The special categories of personal data are deleted once the bias has been corrected or the data reaches the end of its retention period, whichever comes first
  • (f) The records of processing activities kept under the GDPR include the reasons why processing special categories of personal data was strictly necessary

Required Safeguards:

  • Technical measures (pseudonymisation, access controls)
  • Organisational measures (policies, training)
  • Prohibition of processing for any other purpose
  • Deletion after bias monitoring complete
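Pseudonymisation, one of the technical safeguards condition (b) names, can be sketched as replacing a direct identifier with a keyed hash. The function below is an illustrative assumption, not a prescribed method; key management and the broader GDPR analysis remain essential:

```python
import hmac, hashlib

def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).
    Keeping `secret_key` stored separately from the data means the
    mapping can be destroyed by deleting the key. Illustrative only;
    pseudonymised data is still personal data under the GDPR."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same value and key always produce the same token, so records can still be joined for bias analysis without exposing the underlying identifier.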

| GDPR Article 9 Category | AI Act Treatment |
|---|---|
| Racial/ethnic origin | May process for bias monitoring with safeguards |
| Political opinions | May process for bias monitoring with safeguards |
| Religious beliefs | May process for bias monitoring with safeguards |
| Health data | May process for bias monitoring with safeguards |
| Sex life/orientation | May process for bias monitoring with safeguards |
| Biometric data | May process for bias monitoring with safeguards |

Expert Insight

The AI Act creates a specific legal basis for processing sensitive data to prevent AI discrimination. This is a significant departure from GDPR's otherwise restrictive approach to special category data. Document your justification carefully.


Data Provenance and Lineage

Tracking Data Origins

For each dataset, document:

| Element | Required Information |
|---|---|
| Source | Where data originated |
| Collection method | How data was gathered |
| Collection date | When data was collected |
| Legal basis | Lawful basis for collection |
| Transformations | How data was processed |
| Chain of custody | Who handled the data |
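These elements can be captured per dataset snapshot, with a content hash tying the record to the actual bytes on file. A minimal sketch (the `provenance_entry` helper and its field set are illustrative assumptions):

```python
import hashlib
from datetime import date

def provenance_entry(source, method, collected_on, legal_basis, payload: bytes):
    """Build a lineage record for one dataset snapshot. The SHA-256
    digest of the raw bytes lets a later audit verify that the data
    on file is the data that was documented. Field set is illustrative."""
    return {
        "source": source,
        "collection_method": method,
        "collection_date": collected_on.isoformat(),
        "legal_basis": legal_basis,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
```

If the stored dataset is later modified, its digest no longer matches the recorded one, which surfaces undocumented transformations in the chain of custody.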

Third-Party Data Considerations

When using external data:

  • Verify provider's data governance practices
  • Obtain representations about data quality
  • Conduct independent quality assessment
  • Document due diligence process

Documentation Requirements

Data Governance Documentation

Your technical documentation (Annex IV) must include:

| Document | Content |
|---|---|
| Data Documentation | Description of datasets, collection, preparation |
| Bias Assessment Report | Methods and findings of bias examination |
| Data Quality Assessment | Evidence of quality criteria compliance |
| Sensitive Data Justification | If applicable, justification under Article 10(5) |
| Gap Analysis | Identified shortcomings and mitigation |

Integration with Other Requirements

| Requirement | Data Governance Connection |
|---|---|
| Risk Management (Art. 9) | Data risks feed into risk assessment |
| Technical Documentation (Art. 11) | Data governance is mandatory documentation content |
| Accuracy (Art. 15) | Data quality directly affects accuracy |
| Post-Market Monitoring (Art. 72) | Monitor for data drift and degradation |

Data Governance Compliance Checklist

  • Training, validation, and testing data identified
  • Data design choices documented
  • Collection processes and origins recorded
  • Preparation operations documented
  • Data assumptions stated
  • Availability and suitability assessed
  • Bias examination completed across protected characteristics
  • Data gaps and shortcomings identified
  • Quality criteria (relevant, representative, error-free, complete) assessed
  • Sensitive data processing justified (if applicable)
  • Data governance documentation complete

What You Learned

Key concepts from this chapter

Article 10 applies to **training, validation, AND testing data**

Data must be **relevant, representative, error-free, and complete** for the intended purpose

**Bias examination is mandatory** across protected characteristics

**Sensitive personal data** may be processed for bias monitoring under strict conditions

**Document everything**—data governance is core to technical documentation

Chapter Complete · High-Risk AI Compliance