Real-World Testing
Article 60 provisions for testing AI in real conditions.
Real-World Testing (Article 60)
Learning Objectives
By the end of this chapter, you will be able to:
- Understand when and how real-world testing is permitted under Article 60
- Design testing plans that meet regulatory requirements
- Implement informed consent procedures compliant with the AI Act
- Establish monitoring and safeguard frameworks for live testing
- Document real-world testing to support conformity assessment
Introduction: Testing in the Real World
Laboratory testing and simulations have limits. At some point, AI systems must be tested in real-world conditions to validate performance. Article 60 provides a framework for such testing—balancing the need for realistic validation against the protection of affected persons.
Expert Insight
Real-world testing is where theory meets practice. The AI Act doesn't prohibit it—it requires that you do it responsibly. A well-designed testing program actually strengthens your conformity assessment by providing real-world performance evidence.
Legal Framework (Article 60)
When Real-World Testing Applies
| Condition | Article 60 Requirement |
|---|---|
| System type | High-risk AI systems under Annex III |
| Lifecycle stage | Before placing on market or putting into service |
| Purpose | Testing performance under real-world conditions |
| Approval pathway | Either within a sandbox OR with approved testing plan |
| Safeguards | Subject to specific protections for affected persons |
Article 60 vs. Sandbox Testing
| Aspect | Regulatory Sandbox (Articles 57-58) | Real-World Testing (Article 60) |
|---|---|---|
| Scope | Broader development and validation | Focused performance testing |
| Duration | Longer (6-24 months typical) | Shorter, specific testing period |
| Regulatory engagement | Continuous supervision | Plan approval + monitoring |
| Best for | Novel systems, compliance uncertainty | Validating known system performance |
| Relationship | Can include real-world testing | Can be standalone or within sandbox |
Prerequisites for Real-World Testing
Mandatory Requirements
| Requirement | Article Reference | What It Means |
|---|---|---|
| Testing plan | Article 60(4)(a)-(b) | Detailed plan submitted to and approved by the market surveillance authority |
| Informed consent | Article 60(4)(i) | Freely given informed consent from test subjects, in accordance with Article 61 |
| Monitoring | Article 60(4)(j) | Effective oversight by suitably qualified persons, with the ability to intervene |
| Reversibility | Article 60(4)(k) | AI predictions, recommendations, and decisions can be reversed and disregarded |
| Vulnerable subjects | Article 60(4)(g) | Subjects belonging to vulnerable groups must be appropriately protected |
| Risk mitigation | Article 60(7) | Serious incidents must be reported and immediately mitigated, or testing suspended or terminated |
| Liability | Article 60(9) | Provider remains liable under applicable Union and national liability law |
Testing Plan Contents
| Plan Element | Description | Authority Review Focus |
|---|---|---|
| System description | Technical details of the AI being tested | Understanding what's being tested |
| Testing objectives | What will be validated, success criteria | Clarity and measurability |
| Testing methodology | How testing will be conducted | Scientific validity |
| Subject selection | Who will participate, how recruited | Representativeness, vulnerability |
| Informed consent process | How consent will be obtained and documented | Adequacy and voluntariness |
| Safeguards | Protections for test subjects | Sufficiency of protections |
| Monitoring procedures | How testing will be supervised | Ability to detect and respond to issues |
| Intervention triggers | When testing will be stopped | Clear thresholds for action |
| Data handling | How data will be collected, used, protected | GDPR compliance |
| Duration and scope | How long, how many subjects, what contexts | Proportionality |
| Incident procedures | How incidents will be handled and reported | Response capability |
Informed Consent Framework
Consent Requirements (Article 60(4)(i), see also Article 61)
| Requirement | Implementation |
|---|---|
| Freely-given | No coercion, pressure, or undue inducement |
| Informed | Subject understands what they're consenting to, with information per Article 61(1) |
| Documented | Written or electronic record of consent |
| Withdrawable | Subject can withdraw at any time without consequence |
Information to Provide
Before consent, test subjects must be informed about:
| Information Element | Example Content |
|---|---|
| Nature of the AI system | "This AI analyses your responses to assess creditworthiness" |
| Purpose of testing | "We are testing the system's accuracy before market launch" |
| How they will be affected | "The AI will evaluate your application alongside our standard process" |
| Risks and safeguards | "There is a risk of incorrect assessment; human review verifies all decisions" |
| Data collection and use | "Your data will be used for testing only and deleted after 12 months" |
| Duration of participation | "Testing lasts 3 months; your involvement is for one application" |
| Right to withdraw | "You can withdraw at any time; your application continues normally" |
| Contact for questions | "Contact our Data Protection Officer at dpo@company.com" |
| Testing identification number | The Union-wide unique single identification number of the testing per Article 61(1)(e) |
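The consent requirements above map naturally onto a simple record structure. The following is a minimal Python sketch, not anything prescribed by the AI Act: every class, field, and identifier name here is an illustrative assumption. The Act prescribes the substance of consent (freely given, informed, documented, withdrawable) but no data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    """Illustrative record of one test subject's documented consent.

    Field names are assumptions for this sketch; Articles 60(4)(i)
    and 61 define what must be captured, not how to store it.
    """
    subject_id: str                  # pseudonymous identifier, not a name
    testing_id: str                  # Union-wide unique testing ID (Article 61(1)(e))
    information_version: str         # version of the information sheet shown to the subject
    given_at: datetime               # when consent was documented
    withdrawn_at: Optional[datetime] = None

    def is_active(self) -> bool:
        """Consent counts only while it has not been withdrawn."""
        return self.withdrawn_at is None

    def withdraw(self) -> None:
        """Withdrawal is effective immediately and without consequence."""
        if self.withdrawn_at is None:
            self.withdrawn_at = datetime.now(timezone.utc)

record = ConsentRecord(
    subject_id="S-0042",
    testing_id="EU-TEST-EXAMPLE-001",
    information_version="v1.2",
    given_at=datetime.now(timezone.utc),
)
assert record.is_active()
record.withdraw()
assert not record.is_active()
```

Keeping the information-sheet version alongside the consent timestamp lets you later demonstrate exactly what each subject was told before consenting.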
Consent Exceptions
| Situation | Exception Basis | Requirements |
|---|---|---|
| Law enforcement/migration | Article 60(4)(i) exception | Testing must not negatively affect subjects; personal data deleted after test |
| Emergency situations | Not explicitly addressed | Likely requires post-hoc consent or exemption |
Compliance Note
The consent exception for law enforcement and migration is narrow and requires enhanced safeguards. Don't assume it applies—seek legal advice before using this exception.
Monitoring and Safeguards
Monitoring Framework
| Monitoring Element | Implementation | Purpose |
|---|---|---|
| Real-time oversight | Dashboard, alerts, human supervisor | Detect issues immediately |
| Performance tracking | Accuracy, fairness, drift metrics | Validate system performance |
| Incident detection | Automated and manual detection | Identify problems early |
| Subject feedback | Channels for concerns/complaints | Capture subjective impacts |
| Documentation | Comprehensive logging | Evidence for conformity assessment |
Intervention Triggers
Define clear thresholds for action:
| Trigger | Response |
|---|---|
| Performance below threshold | Pause testing, investigate, remediate |
| Bias detected | Suspend testing for affected groups, analyse |
| Harm to subject | Stop testing immediately, support subject, report |
| Subject withdrawal | Remove from testing, honour consent revocation |
| Authority request | Immediate compliance with authority instructions |
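The trigger table above lends itself to a simple rule-evaluation loop that maps observed conditions to the responses defined in the testing plan. A hedged sketch follows; the metric names, threshold values, and action labels are all assumptions for illustration, not figures from the AI Act:

```python
# Illustrative intervention-trigger check; thresholds and metric names
# are assumptions for this sketch, not values from the AI Act.
ACCURACY_FLOOR = 0.90          # pause if validated accuracy drops below this
PARITY_GAP_CEILING = 0.05      # suspend affected groups if the fairness gap exceeds this

def evaluate_triggers(metrics: dict) -> list[str]:
    """Map observed metrics to the responses defined in the testing plan."""
    actions = []
    if metrics.get("harm_reported", False):
        actions.append("STOP_TESTING")             # harm to a subject: stop immediately
    if metrics.get("accuracy", 1.0) < ACCURACY_FLOOR:
        actions.append("PAUSE_AND_INVESTIGATE")    # performance below threshold
    if metrics.get("parity_gap", 0.0) > PARITY_GAP_CEILING:
        actions.append("SUSPEND_AFFECTED_GROUPS")  # bias detected
    if metrics.get("authority_request", False):
        actions.append("COMPLY_WITH_AUTHORITY")    # authority instructions take precedence
    return actions

assert evaluate_triggers({"accuracy": 0.85}) == ["PAUSE_AND_INVESTIGATE"]
assert "STOP_TESTING" in evaluate_triggers({"harm_reported": True})
```

The point of encoding triggers this way is that the thresholds are explicit, reviewable, and identical to what was written into the approved testing plan, rather than left to ad hoc judgment during the test.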
Human Oversight During Testing
| Oversight Type | When Required | Implementation |
|---|---|---|
| Pre-decision review | High-stakes decisions (e.g., credit, employment) | Human reviews before action |
| Concurrent monitoring | All testing | Supervisor monitors in real-time |
| Post-decision review | All AI decisions | Human reviews outcomes |
| Override capability | Always | Ability to disregard AI output |
Special Categories of Testing
Vulnerable Populations
Testing involving vulnerable persons requires enhanced safeguards:
| Vulnerable Group | Additional Requirements |
|---|---|
| Children | Parental/guardian consent, age-appropriate information |
| Elderly | Accessible information, capacity verification |
| Employees | No workplace coercion, union consultation if applicable |
| Patients | Clinical ethics oversight, medical safeguards |
| Economically dependent | Ensure no exploitation of financial vulnerability |
Law Enforcement and Migration
Article 60(4)(i) provides limited exceptions:
| Requirement | Implementation |
|---|---|
| No negative effect | Testing and its outcomes must not have any negative effect on subjects |
| Personal data deletion | Personal data shall be deleted after the test is performed |
| Testing plan approval | Market surveillance authority must still approve |
| Enhanced safeguards | Greater protections than standard testing |
| Documentation | Comprehensive records for accountability |
Duration and Scope Limits
Proportionality Requirements
| Factor | Consideration |
|---|---|
| Maximum duration | Six months, extendable by a further six months subject to prior notification to the authority (Article 60(4)(f)) |
| Duration | No longer than necessary to achieve testing objectives |
| Subject numbers | Minimum needed for statistical validity |
| Scope of decisions | Limited to what's necessary to test |
| Geographic scope | Appropriate to testing objectives |
Typical Testing Parameters
| Testing Type | Typical Duration | Typical Scale |
|---|---|---|
| Pilot testing | 1-3 months | 50-500 subjects |
| Extended validation | 3-6 months | 500-5,000 subjects |
| Pre-launch testing | 1-2 months | 1,000-10,000 subjects |
Documentation Requirements
Real-Time Documentation
| Document | Contents | Update Frequency |
|---|---|---|
| Testing log | All testing activities, decisions, events | Continuous |
| Incident register | Issues, near-misses, complaints | As they occur |
| Consent records | All consent documentation | Per subject |
| Performance data | Accuracy, fairness, other metrics | Daily/weekly |
| Monitoring reports | Supervisor observations | Weekly |
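The continuous testing log above is easiest to keep audit-ready as an append-only, machine-readable record. A minimal sketch, assuming a hypothetical JSON-lines file format and event-type labels of our own choosing:

```python
import json
from datetime import datetime, timezone

def log_event(logfile: str, event_type: str, details: dict) -> dict:
    """Append one timestamped entry to an append-only testing log.

    Structured JSON-lines entries keep the continuous testing log
    machine-readable as evidence for the later conformity assessment.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,   # e.g. "decision", "incident", "consent_withdrawal"
        "details": details,
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

entry = log_event("testing_log.jsonl", "incident",
                  {"severity": "minor", "description": "subject complaint received"})
assert entry["event_type"] == "incident"
```

An append-only format (never edit, only add) matters here: it preserves the chronological record of activities and decisions even when an earlier entry turns out to be wrong, which is corrected by a later entry rather than by rewriting history.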
Post-Testing Documentation
| Document | Purpose | Retention |
|---|---|---|
| Testing summary report | Overall findings, conclusions | 10 years minimum |
| Performance validation | Evidence for conformity assessment | 10 years minimum |
| Incident summary | All issues and resolutions | 10 years minimum |
| Subject outcomes | What happened to test subjects | As required by GDPR |
Authority Notification and Approval
Approval Process
1. Draw up a real-world testing plan (Article 60(4)(a))
2. Submit the plan to the market surveillance authority of the Member State(s) where testing will take place
3. Obtain approval; if the authority does not respond within 30 days, approval may be tacit unless national law provides otherwise (Article 60(4)(b))
4. Register the testing with its Union-wide unique single identification number (Article 60(4)(c))
5. Begin testing in accordance with the approved plan
Ongoing Reporting
| Report Type | Timing | Contents |
|---|---|---|
| Progress reports | Monthly during testing | Activities, metrics, issues |
| Incident reports | Immediately upon occurrence | Details, response, remediation |
| Completion report | End of testing | Summary, findings, recommendations |
Liability and Insurance
Liability Framework (Article 60(9))
| Liability Aspect | Provider Responsibility |
|---|---|
| Harm to test subjects | Provider remains fully liable under applicable Union and national liability law |
| Insurance (recommended) | While not explicitly required by the AI Act, adequate insurance is a prudent practice |
| No liability transfer | Cannot contract away liability to subjects |
Insurance Considerations
| Coverage Type | What It Covers |
|---|---|
| Product liability | Harm caused by the AI system |
| Professional indemnity | Errors in testing design or execution |
| Clinical trials (if applicable) | Medical testing-specific coverage |
| Cyber liability | Data breaches during testing |
Real-World Testing Checklist
Pre-Testing
- Develop comprehensive testing plan
- Identify and assess risks to test subjects
- Design safeguards and monitoring procedures
- Create informed consent materials
- Establish intervention triggers and procedures
- Obtain liability insurance/coverage
- Submit plan to market surveillance authority
- Obtain authority approval
During Testing
- Obtain informed consent from all subjects
- Activate monitoring systems
- Document all activities and decisions
- Submit regular progress reports
- Report incidents immediately
- Maintain human oversight
- Respond to any authority requests
Post-Testing
- Complete testing summary report
- Compile performance validation evidence
- Document all incidents and resolutions
- Notify authority of testing completion
- Retain all documentation (10+ years)
- Use findings in conformity assessment
What You Learned
Key concepts from this chapter
- **Real-world testing is permitted** but subject to specific safeguards under Article 60
- **Informed consent is mandatory** (with limited exceptions for law enforcement/migration)
- **Testing plans must be approved** by market surveillance authorities before testing begins
- **Monitoring must be effective** with clear triggers for intervention
- **AI decisions must be reversible** or disregardable during testing