Accuracy, Robustness, and Cybersecurity
Article 15 technical performance requirements.
Learning Objectives
By the end of this chapter, you will be able to:
- Define accuracy metrics appropriate for different AI system types
- Design robustness strategies including redundancy and fail-safe mechanisms
- Identify and mitigate AI-specific cybersecurity threats
- Implement testing frameworks for ongoing performance validation
- Document accuracy levels in compliance with disclosure requirements
Article 15 establishes the technical performance requirements that ensure high-risk AI systems operate reliably, safely, and securely throughout their lifecycle. These requirements form the technical foundation of AI Act compliance.
Understanding Article 15 Requirements
Article 15(1) states: "High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and that they perform consistently in those respects throughout their lifecycle."
This creates a continuous obligation—performance must be maintained throughout the AI system's operational life, not just at the point of market placement.
Accuracy Requirements in Detail
What is "Appropriate Accuracy"?
The AI Act deliberately avoids prescribing specific accuracy thresholds because appropriate levels depend on:
| Factor | Consideration |
|---|---|
| Intended Purpose | A diagnostic AI requires different accuracy than a recommendation system |
| Affected Persons | Higher accuracy for systems affecting fundamental rights |
| Risk Severity | Life-critical applications demand higher performance |
| State of the Art | Comparison with technically achievable standards |
| Reasonably Foreseeable Use | Including potential misuse scenarios |
Required Accuracy Disclosures (Article 15(3))
Providers must declare in instructions for use:
| Disclosure Element | Description | Example |
|---|---|---|
| Accuracy Level | Overall performance metric | "95% precision on test dataset" |
| Accuracy Metrics | Specific measures used | Precision, recall, F1, AUC-ROC |
| Conditions* | Operating conditions for stated accuracy | "Under standard lighting conditions" |
| Target Groups* | Any performance variations by demographic | "Accuracy may vary across age groups" |
| Known Limitations* | Scenarios where accuracy degrades | "Reduced accuracy in low-light environments" |
*Note: Conditions, Target Groups, and Known Limitations are not explicitly required by Article 15(3), which mandates only accuracy levels and metrics. These additional elements derive from Article 13(3)(b)(ii) (transparency/instructions for use) and are recommended best practice.
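As a rough illustration of how a provider might structure this declaration internally, the sketch below separates the elements Article 15(3) mandates from the recommended additions. The class and field names are my own, not drawn from the Act or any standard.

```python
# Sketch: a structured record for the accuracy declaration that goes into the
# instructions for use. Field names are illustrative, not mandated by the Act.

from dataclasses import dataclass, field

@dataclass
class AccuracyDisclosure:
    accuracy_level: str                        # required by Art. 15(3)
    metrics: list                              # required by Art. 15(3)
    conditions: str = "not specified"          # recommended (Art. 13(3)(b)(ii))
    target_group_notes: str = "not specified"  # recommended
    known_limitations: list = field(default_factory=list)  # recommended

disclosure = AccuracyDisclosure(
    accuracy_level="95% precision on held-out test dataset",
    metrics=["precision", "recall", "F1"],
    conditions="standard lighting conditions",
    known_limitations=["reduced accuracy in low-light environments"],
)
print(disclosure.accuracy_level)
```

Keeping the mandatory and recommended elements in one record makes it harder for the recommended disclosures to be silently dropped when instructions for use are generated.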
Compliance Note
Many AI systems show significant accuracy disparities across demographic groups. Disclosure of known disparities and their implications for affected persons is best practice for Article 15 compliance and may be required under Articles 10 (data governance, bias examination) and 13 (transparency obligations).
Accuracy Metrics by AI System Type
| AI Application | Primary Metrics | Secondary Metrics |
|---|---|---|
| Classification (e.g., hiring screening) | Precision, Recall, F1-Score | False Positive Rate, False Negative Rate |
| Biometric Identification | True Accept Rate, False Accept Rate | False Reject Rate at threshold |
| Prediction (e.g., credit scoring) | AUC-ROC, Calibration | Brier Score, Log Loss |
| Detection | Mean Average Precision | Intersection over Union |
| Regression | Mean Absolute Error, RMSE | R-squared, Prediction Intervals |
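Several of the metrics in the table can be computed directly from prediction records. The sketch below shows precision, recall, F1, and the Brier score for a binary classifier; the sample data is illustrative, and in practice a library such as scikit-learn would typically be used.

```python
# Sketch: computing common Article 15 accuracy metrics from scratch.
# Sample data is illustrative; the Act does not prescribe these values.

def classification_metrics(y_true, y_pred):
    """Precision, recall, and F1 for a binary classifier (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and outcomes (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
p, r, f1 = classification_metrics(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # all 0.75 on this sample
```

Note that the same predictions yield different compliance pictures depending on the metric chosen, which is why Article 15(3) requires declaring the metrics alongside the levels.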
Robustness Requirements (Article 15(4))
The Robustness Mandate
Article 15(4): AI systems must be resilient to errors, faults, or inconsistencies that may occur within the system or its operating environment, particularly due to interaction with natural persons or other systems. Technical redundancy solutions—including backup or fail-safe plans—may be appropriate.
Types of Robustness
| Robustness Type | Description | Implementation Examples |
|---|---|---|
| Input Robustness | Handles unexpected or noisy inputs | Input validation, data preprocessing, anomaly detection |
| Distributional Robustness | Performs on data differing from training distribution | Domain adaptation, robust optimisation |
| Adversarial Robustness | Resists deliberately manipulated inputs | Adversarial training, certified defences |
| Operational Robustness | Functions despite system changes | Version control, model versioning, A/B testing |
| Environmental Robustness | Adapts to operational environment changes | Sensor fusion, environmental monitoring |
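Input robustness, the first row of the table, is often the cheapest to implement. The sketch below gates inputs against per-feature statistics learned from training data; the 4-sigma bound is an illustrative choice, not a prescribed threshold.

```python
# Sketch: an input-robustness gate that rejects out-of-range feature values
# before they reach the model. The sigma_limit of 4 is an illustrative choice.

import statistics

class InputValidator:
    def __init__(self, training_samples, sigma_limit=4.0):
        # Per-feature mean and standard deviation from representative training data.
        columns = list(zip(*training_samples))
        self.stats = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
        self.sigma_limit = sigma_limit

    def is_valid(self, x):
        """True if every feature lies within sigma_limit standard deviations."""
        return all(
            abs(v - mu) <= self.sigma_limit * sd
            for v, (mu, sd) in zip(x, self.stats)
            if sd > 0
        )

train = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0], [1.1, 10.5]]
validator = InputValidator(train)
print(validator.is_valid([1.0, 10.2]))   # in range: True
print(validator.is_valid([50.0, 10.2]))  # anomalous first feature: False
```

A rejected input would then feed into the fail-safe mechanisms discussed next, rather than producing a silently unreliable prediction.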
Designing Fail-Safe Mechanisms
Fail-Safe Hierarchy for High-Risk AI:
1. Graceful Degradation — Reduced functionality rather than complete failure
2. Safe State Fallback — Return to known-safe operating state
3. Human Handover — Transfer control to human operator
4. Emergency Shutdown — Cease operation entirely
5. Alert Generation — Notify relevant parties of failure
Fail-Safe Implementation Checklist:
- Define safe states for all operating modes
- Implement continuous self-monitoring
- Create automatic fallback triggers
- Design manual override capabilities
- Test fail-safe activation under realistic conditions
- Document fail-safe procedures in instructions for use
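One simple way to realise the hierarchy and checklist above is a decision wrapper that maps model confidence and system health to an action. The thresholds, action names, and error-count trigger below are illustrative design assumptions, not Article 15 text.

```python
# Sketch: a fail-safe wrapper following the hierarchy above.
# Thresholds and action names are illustrative assumptions.

def failsafe_decide(model_output, confidence, error_count,
                    conf_threshold=0.8, max_errors=3):
    """Map a model output to an action following the fail-safe hierarchy."""
    if error_count >= max_errors:
        return ("emergency_shutdown", None)        # cease operation entirely
    if confidence < 0.5:
        return ("human_handover", None)            # transfer to a human operator
    if confidence < conf_threshold:
        return ("safe_state_fallback", "default")  # known-safe conservative output
    return ("automated", model_output)             # normal operation

print(failsafe_decide("approve", 0.95, 0))  # ('automated', 'approve')
print(failsafe_decide("approve", 0.60, 0))  # ('safe_state_fallback', 'default')
print(failsafe_decide("approve", 0.30, 0))  # ('human_handover', None)
print(failsafe_decide("approve", 0.95, 5))  # ('emergency_shutdown', None)
```

Each branch would additionally generate an alert, and the chosen thresholds belong in the technical documentation so that fail-safe behaviour is verifiable during conformity assessment.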
Expert Insight
The best fail-safe systems assume failure is inevitable and design for graceful degradation. Article 15(4) explicitly contemplates "backup or fail-safe plans" as appropriate technical solutions.
Cybersecurity Requirements (Article 15(5))
AI-Specific Attack Vectors
Article 15(5) specifically identifies threats unique to AI systems:
| Attack Type | Description | Mitigation Strategies |
|---|---|---|
| Data Poisoning | Corrupting training data to embed malicious behaviour | Data provenance tracking, anomaly detection, clean data validation |
| Model Poisoning | Directly manipulating model parameters | Secure model storage, integrity verification, access controls |
| Adversarial Examples | Crafted inputs causing misclassification | Adversarial training, input preprocessing, ensemble defences |
| Model Evasion | Inputs designed to bypass detection | Robust feature extraction, multiple detection methods |
| Confidentiality Attacks | Extracting training data or model details | Differential privacy, membership inference defences |
| Model Inversion | Reconstructing sensitive training data | Output perturbation, query limiting |
| Model Extraction | Stealing model architecture/weights | Query monitoring, watermarking, rate limiting |
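Of the mitigations listed, query rate limiting against model extraction is straightforward to sketch. The sliding-window limiter below caps per-client queries; the quota and window size are illustrative values a provider would tune to the system's legitimate usage patterns.

```python
# Sketch: per-client query rate limiting as one model-extraction mitigation
# from the table above. Quota and window size are illustrative.

from collections import defaultdict, deque

class QueryRateLimiter:
    def __init__(self, max_queries=100, window_seconds=60):
        self.max_queries = max_queries
        self.window = window_seconds
        self.history = defaultdict(deque)  # client_id -> query timestamps

    def allow(self, client_id, now):
        """True if the client is under quota for the sliding window ending at now."""
        q = self.history[client_id]
        while q and now - q[0] >= self.window:
            q.popleft()                    # drop queries outside the window
        if len(q) >= self.max_queries:
            return False                   # over quota: deny and flag for review
        q.append(now)
        return True

limiter = QueryRateLimiter(max_queries=3, window_seconds=60)
print([limiter.allow("c1", t) for t in (0, 1, 2, 3)])  # fourth query denied
print(limiter.allow("c1", 61))                         # window slid: allowed again
```

Denied requests are also a useful monitoring signal: a client repeatedly hitting the quota is exactly the query pattern that extraction and membership-inference attacks produce.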
Cybersecurity Assessment Framework
Pre-Deployment Security Testing:
| Test Type | Purpose | Frequency |
|---|---|---|
| Vulnerability Assessment | Identify known vulnerabilities | Before deployment, after major updates |
| Penetration Testing | Simulate real-world attacks | Quarterly or after significant changes |
| Adversarial Testing | Test AI-specific attack resistance | Before deployment, continuous monitoring |
| Red Team Exercises | Comprehensive attack simulation | Annually for critical systems |
| Security Audit | Review security controls and processes | Annually, or as required |
Integration with Cyber Resilience Act
The AI Act's cybersecurity requirements align with the Cyber Resilience Act (CRA). For AI systems with digital elements:
- CRA establishes baseline cybersecurity requirements for products
- AI Act adds AI-specific threat coverage
- Harmonised standards will provide implementation details
- Conformity with CRA cybersecurity requirements contributes to AI Act compliance
Compliance Note
Article 15(1) requires appropriate levels of accuracy, robustness, and cybersecurity "throughout their lifecycle"—as applied to the cybersecurity requirements of Article 15(5), security is not a one-time assessment but an ongoing obligation.
Performance Testing and Validation
Continuous Performance Monitoring Framework
| Monitoring Aspect | Metrics to Track | Alert Thresholds |
|---|---|---|
| Accuracy Drift | Deviation from baseline performance | >5% degradation triggers review |
| Data Drift | Input distribution changes | Statistical significance tests |
| Concept Drift | Relationship changes between inputs and outputs | Performance on holdout sets |
| Prediction Stability | Consistency across similar inputs | Variance monitoring |
| Response Time | Latency and throughput | Service level agreement breaches |
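The accuracy-drift row of the table can be implemented as a rolling comparison against the declared baseline. The sketch below uses the >5% degradation trigger from the table; the baseline value and window size are illustrative.

```python
# Sketch: accuracy-drift monitor implementing the >5% degradation trigger
# from the table above. Baseline and window size are illustrative.

class AccuracyDriftMonitor:
    def __init__(self, baseline_accuracy, degradation_limit=0.05, window=100):
        self.baseline = baseline_accuracy
        self.limit = degradation_limit
        self.window = window
        self.outcomes = []  # 1 = correct prediction, 0 = incorrect

    def record(self, correct):
        self.outcomes.append(1 if correct else 0)
        self.outcomes = self.outcomes[-self.window:]  # keep a rolling window

    def needs_review(self):
        """True once rolling accuracy falls more than `limit` below baseline."""
        if not self.outcomes:
            return False
        rolling = sum(self.outcomes) / len(self.outcomes)
        return (self.baseline - rolling) > self.limit

monitor = AccuracyDriftMonitor(baseline_accuracy=0.95, window=20)
for correct in [True] * 17 + [False] * 3:  # rolling accuracy = 0.85
    monitor.record(correct)
print(monitor.needs_review())  # 0.95 - 0.85 = 0.10 > 0.05, so True
```

In deployment, ground-truth labels often arrive with delay, so the monitor would typically run on a lagged stream of verified outcomes alongside distributional drift tests on the inputs themselves.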
Testing Protocol
Recommended Testing Cadence:
| Test | Frequency | Documentation Required |
|---|---|---|
| Performance validation | Monthly minimum | Test results, datasets used |
| Robustness testing | Quarterly | Test scenarios, failure modes |
| Security assessment | Quarterly | Vulnerability scan results |
| Full adversarial evaluation | Semi-annually | Attack scenarios, defences tested |
| Third-party audit | Annually | Audit report, remediation plan |
Documentation Requirements for Article 15
Technical documentation must include:
| Document Element | Content | Reference |
|---|---|---|
| Accuracy Specification | Metrics, levels, validation methodology | Annex IV, Section 2(g) |
| Robustness Analysis | Failure modes, mitigations, test results | Annex IV, Section 2(h) |
| Cybersecurity Assessment | Threat analysis, controls, test results | Annex IV, Section 2(i) |
| Validation Records | Test data, procedures, results | Annex IV, Section 3 |
| Performance Monitoring | Ongoing monitoring methodology | Annex IV, Section 4 |
Compliance Checklist: Article 15
Accuracy:
- Accuracy metrics defined and measured
- Performance validated on representative data
- Accuracy levels declared in instructions for use
- Known limitations documented
- Demographic performance variations analysed
Robustness:
- Input validation implemented
- Fail-safe mechanisms designed
- Graceful degradation pathways defined
- Robustness testing completed
- Redundancy measures where appropriate
Cybersecurity:
- AI-specific threats identified and assessed
- Data poisoning defences implemented
- Adversarial robustness tested
- Security monitoring in place
- Incident response procedures defined
What You Learned
Key concepts from this chapter:
- Accuracy must be appropriate to the intended purpose and declared in instructions for use
- Robustness includes technical redundancy, fail-safe mechanisms, and resilience to errors
- Cybersecurity requirements address AI-specific attack vectors including data poisoning and adversarial examples
- Performance monitoring is a continuous lifecycle obligation, not a one-time assessment
- Documentation must demonstrate ongoing compliance with all Article 15 requirements