Quick Answer
What does the EU AI Act require from data teams before August 2, 2026?
EU AI Act data processing obligations in 2026 apply to organizations that develop or deploy high-risk AI systems, including systems that use CSV data for hiring decisions, credit scoring, healthcare triage, or education assessment. Full compliance with the obligations for Annex III systems is required by August 2, 2026 for newly deployed systems. Key obligations for data teams include training data quality requirements under Article 10, data governance documentation, GDPR alignment including a DPIA for high-risk processing, and Privacy by Design integration from the data pipeline stage.
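Grandfathering note: Under Article 111(2), high-risk AI systems already placed on the market or put into service before August 2, 2026 fall under the Act only if they undergo significant design changes from that date onward. Systems intended for use by public authorities must comply by August 2, 2030 in any event. New systems deployed on or after August 2, 2026 must be compliant from first deployment.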
TL;DR: The EU AI Act's high-risk AI system obligations take effect August 2, 2026. For data teams, the practical obligations concentrate on training data quality (Article 10), GDPR interaction for personal data processing, data minimization as required by AI Act Recital 69, and documentation. If your CSV data feeds AI systems in employment, credit, healthcare, education, or public safety contexts, you are almost certainly in scope.
Your ML team is building a candidate screening system that ingests CSV exports from your ATS — names, experience years, education history, assessment scores. The system ranks candidates automatically. It has been in development for eight months and goes live next quarter.
Under the EU AI Act, AI systems used in employment and workers management that use personal data to make or substantially influence decisions affecting individuals are classified as high-risk under Annex III, Point 4. Your candidate screening system likely qualifies. Under Article 10, your training data must meet specific quality requirements. Under the interaction with GDPR Article 35, a Data Protection Impact Assessment may be mandatory. Under Article 13, the system must be transparent about how it uses data.
Systems you place on the market or put into service on or after August 2, 2026 must comply from day one. That is 139 days from the date of this guide. Existing systems are pulled into scope under Article 111(2) once they undergo a significant design change, and building compliant data pipelines for them takes time that starts now.
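The 139-day figure is a plain date subtraction, and a team can recompute it for its own planning date. A minimal sketch, assuming a publication date of March 16, 2026 (inferred from the 139-day figure; the guide does not state the date):

```python
from datetime import date

# Application date for Annex III high-risk obligations, as given in this guide.
HIGH_RISK_APPLICATION_DATE = date(2026, 8, 2)


def days_until_deadline(today: date) -> int:
    """Return the number of calendar days from `today` to the application date."""
    return (HIGH_RISK_APPLICATION_DATE - today).days


# Assumed publication date, inferred from the guide's own figure.
print(days_until_deadline(date(2026, 3, 16)))  # 139
```

A negative result means the application date has already passed.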
EU AI Act obligations referenced in this guide were verified against Regulation (EU) 2024/1689 of the European Parliament and of the Council, European Commission published guidance, and EDPB guidance on automated decision-making, March 2026.
Table of Contents
- High-Risk AI Systems: Are You in Scope?
- The August 2, 2026 Deadline: What It Covers
- Article 10: Training Data Quality Requirements
- How the EU AI Act and GDPR Interact for CSV Data
- Data Minimization: AI Act Recital 69
- Practical Data Pipeline Obligations for High-Risk AI
- Additional Resources
- FAQ
This guide is for: Data Protection Officers, AI engineers, data engineers, and compliance teams whose organizations develop or deploy AI systems that process personal data for consequential decisions.
High-Risk AI Systems: Are You in Scope?
EU AI Act Annex III identifies eight categories of high-risk AI systems. For data teams working with CSV files, four categories are most practically relevant.
| Annex III Category | Description | CSV Data Scenarios In Scope |
|---|---|---|
| Point 2: Critical infrastructure | AI that manages or operates critical infrastructure components | Power grid management systems fed by operational CSV exports |
| Point 3: Education and vocational training | AI that determines access, assesses performance, or monitors behavior of students | Student assessment systems ingesting grade CSVs; adaptive learning platforms |
| Point 4: Employment and workers management | AI used for recruitment, selection, promotion, performance monitoring, or task allocation | Candidate screening from ATS CSV exports; performance ranking from HR CSV data |
| Point 5: Access to essential services | AI that evaluates creditworthiness, prices life or health insurance, or makes other essential service decisions | Credit scoring models trained on financial transaction CSV data; insurance pricing systems |
Beyond these four, Point 1 (biometrics) and Point 6 (law enforcement) apply in specific contexts. The key question for any AI system is: does it make or substantially influence a decision that affects individuals' rights, opportunities, or access to services?
What "substantially influence" means: The AI Act covers systems that are used for profiling individuals for the purposes of consequential decision-making, even where a human makes the final call. A system that ranks candidates and presents the top 10 for human review is substantially influencing the hiring decision — the candidates ranked below 10 receive no human review at all.
The August 2, 2026 Deadline: What It Covers
The EU AI Act (Regulation (EU) 2024/1689) entered into force August 1, 2024. Obligations have been phasing in since then:
| Date | Obligations Taking Effect |
|---|---|
| February 2, 2025 | Prohibited AI practices banned (Article 5) |
| August 2, 2025 | General-purpose AI model obligations; governance bodies active |
| August 2, 2026 | High-risk AI system obligations (Annex III) — full compliance required for new deployments |
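| August 2, 2027 | High-risk AI systems that are products, or safety components of products, covered by Annex I product safety legislation (Article 6(1)) |
| August 2, 2030 | High-risk AI systems intended for use by public authorities that were placed on the market before August 2, 2026 (Article 111(2)) |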
For systems deployed on or after August 2, 2026: full compliance with all Annex III obligations is required from first deployment. There is no grace period for new systems.
Note on proposed regulatory changes: The European Commission's proposed Digital Omnibus regulation (February 2026) includes provisions that could simplify some SME compliance obligations and adjust certain GDPR interaction points. As of March 2026, the proposal has not been adopted and the AI Act timeline remains unchanged. Monitor the Commission's legislative tracker for updates before finalizing your compliance roadmap — if adopted, some obligations may shift.
Penalties for non-compliance are structured in three tiers, each capped at the higher of a fixed amount and a share of worldwide annual turnover. Violations of prohibited AI practices carry up to €35 million or 7%. Violations of high-risk system obligations carry up to €15 million or 3%. Supplying incorrect, incomplete, or misleading information to authorities carries up to €7.5 million or 1%. The top tier exceeds the GDPR maximum of €20 million or 4%; the high-risk tier sits below it.
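To see how the caps scale with company size, the first two tiers can be worked through numerically. This is a rough sketch, assuming the cap is the higher of the fixed amount and the turnover share (the rule Article 99 applies to undertakings other than SMEs, for which the lower figure applies). The tier labels and function name are illustrative, not terms from the Act.

```python
from fractions import Fraction

# (fixed cap in euros, share of worldwide annual turnover) for two tiers.
# The dictionary keys are hypothetical labels chosen for this example.
PENALTY_TIERS = {
    "prohibited_practice": (35_000_000, Fraction(7, 100)),
    "high_risk_obligation": (15_000_000, Fraction(3, 100)),
}


def max_fine_eur(tier: str, worldwide_turnover_eur: int) -> int:
    """Return the maximum fine, assuming the higher of the two caps applies."""
    fixed_cap, turnover_share = PENALTY_TIERS[tier]
    return max(fixed_cap, int(worldwide_turnover_eur * turnover_share))


# A company with EUR 1 billion turnover: the percentage cap dominates.
print(max_fine_eur("high_risk_obligation", 1_000_000_000))  # 30000000
# A company with EUR 100 million turnover: the fixed cap dominates.
print(max_fine_eur("high_risk_obligation", 100_000_000))  # 15000000
```

These are ceilings, not predicted fines; actual amounts depend on the circumstances of the infringement.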
Article 10: Training Data Quality Requirements
Article 10 of the EU AI Act establishes specific requirements for the data used to train, validate, and test high-risk AI systems. For data teams, this is where the most direct operational obligations arise.
Article 10(2) to 10(4) require that training, validation, and testing datasets:
- Be subject to appropriate data governance and management practices: documented processes for data collection, labeling, cleaning, and validation
- Be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose
- Have the appropriate statistical properties for the persons or groups of persons on whom the AI system will be used
- Comply with any applicable legal frameworks, including GDPR where personal data is involved
- Be examined for possible biases that could affect the system's output
Article 10(5) adds that providers may exceptionally process special categories of personal data where strictly necessary for bias detection and correction, but only with appropriate safeguards.
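The bias examination in the list above can start with something as simple as comparing outcome rates across groups in the labeled data. The Act does not prescribe a metric, so the sketch below is only one possible screening heuristic. The column names (`group`, `selected`) and the `"1"` label encoding are assumptions made for illustration.

```python
from collections import defaultdict


def selection_rates(rows, group_key, outcome_key):
    """Return the share of positive outcomes ("1") per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key] == "1":
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def min_max_ratio(rates):
    """Ratio of the lowest to the highest selection rate; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical labeled rows, shaped like the output of csv.DictReader.
rows = [
    {"group": "A", "selected": "1"}, {"group": "A", "selected": "1"},
    {"group": "A", "selected": "0"}, {"group": "A", "selected": "0"},
    {"group": "B", "selected": "1"}, {"group": "B", "selected": "0"},
    {"group": "B", "selected": "0"}, {"group": "B", "selected": "0"},
]
rates = selection_rates(rows, "group", "selected")
print(rates, min_max_ratio(rates))
```

A low ratio does not prove unlawful bias, and a ratio near 1.0 does not rule it out. The value of a check like this is that the result, and the decision taken on it, can be recorded in the data governance documentation.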
For CSV training data specifically: If your model is trained on customer CSV exports, employee CSV records, or patient CSV data, Article 10 requires documented processes for how that data was collected, cleaned, labeled, and validated. The documentation must demonstrate that the data is representative of the population the model will affect and that known sources of bias have been identified and addressed.
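One way to back the representativeness claim with evidence is to compare group shares in the training CSV against reference shares for the population the model will affect. The following is a minimal sketch using only the standard library. The column name `age_band` and the reference shares are hypothetical, and where the reference figures come from is itself something to document.

```python
import csv
import io
from collections import Counter


def representation_gaps(csv_text, column, reference_shares):
    """Return observed share minus reference share for each reference group."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("dataset has no rows")
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical training extract and hypothetical reference population shares.
training_csv = "age_band,score\n18-34,71\n18-34,64\n18-34,80\n35-54,77\n"
reference = {"18-34": 0.5, "35-54": 0.25, "55+": 0.25}

gaps = representation_gaps(training_csv, "age_band", reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A positive gap means the group is over-represented relative to the reference, a negative gap means it is under-represented, and a group missing from the CSV shows up as a gap equal to minus its reference share.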
How the EU AI Act and GDPR Interact for CSV Data
The EU AI Act does not replace GDPR. Both regulations apply simultaneously to AI systems that process personal data. The interaction creates overlapping obligations that data teams must navigate together.
DPIA requirement: GDPR Article 35 requires a Data Protection Impact Assessment for processing likely to result in high risk to individuals. EU AI Act high-risk systems processing personal data almost automatically trigger the DPIA requirement under Article 35(1) — the scale and nature of the decision-making involved typically constitutes "high risk" under the GDPR criteria. If you are deploying a high-risk AI system under the AI Act, assume a GDPR DPIA is also required.
Legal basis for training data: GDPR Article 6 requires a lawful basis for all personal data processing. Using customer CSV data or employee CSV data to train an AI model is processing for a new purpose. If your original legal basis was contract performance (processing customer data to fulfill an order), training a model on that data is likely a new purpose. It needs either a compatibility assessment under Article 6(4) or a new legal basis, typically consent or legitimate interests after a balancing test.
Data subject rights in AI systems: GDPR Article 22 restricts purely automated decision-making that produces legal or similarly significant effects. High-risk AI systems under the AI Act that make or substantially influence such decisions must provide a mechanism for human review, as required by both Article 22 GDPR and Article 14 of the AI Act. This right must be technically implemented in the system's data pipeline.
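In pipeline terms, "technically implemented" can mean that a model output is stored as a pending recommendation and cannot take effect until a named person has confirmed or overridden it. The sketch below illustrates that pattern. The class, field, and status names are hypothetical and are not drawn from either regulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    subject_id: str
    model_score: float
    status: str = "pending_human_review"
    reviewer: Optional[str] = None
    final_outcome: Optional[str] = None


def record_human_review(decision: Decision, reviewer: str, outcome: str) -> Decision:
    """Finalize a decision once a named reviewer has confirmed or overridden it."""
    if not reviewer:
        raise ValueError("a named human reviewer is required")
    decision.reviewer = reviewer
    decision.final_outcome = outcome
    decision.status = "finalized"
    return decision


def can_take_effect(decision: Decision) -> bool:
    """Only reviewed decisions may be acted on by downstream systems."""
    return decision.status == "finalized" and decision.reviewer is not None


decision = Decision(subject_id="c-1001", model_score=0.31)
print(can_take_effect(decision))  # False
record_human_review(decision, reviewer="reviewer-7", outcome="advance")
print(can_take_effect(decision))  # True
```

The reviewer here overrides a low model score, which is the capability the oversight requirement is meant to preserve. A gate like this only helps if every candidate reaches a reviewer, not just the top-ranked ones.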
Privacy by Design (GDPR Article 25): Building a high-risk AI system without integrating Privacy by Design from the data pipeline stage is inconsistent with both Article 25 GDPR and the spirit of the EU AI Act. Data minimization, purpose limitation, and technical measures to protect personal data must be designed into the training data pipeline, not retrofitted after deployment.
Data Minimization: AI Act Recital 69
EU AI Act Recital 69 explicitly references GDPR data minimization as a principle applicable to AI training data:
"The right to privacy and to protection of personal data must be guaranteed throughout the entire lifecycle of the AI system. In this regard, the principles of data minimisation and data protection by design and by default, as set out in Union data protection law, are applicable when personal data is processed."
This means: the GDPR Article 5(1)(c) data minimization principle applies to AI training data. You should not use more personal data than is strictly necessary for the intended training purpose. A model that only needs behavioral patterns does not require training data that includes names, addresses, and demographic details. Strip those columns before training.
For CSV training data workflows, this has a direct operational implication: before feeding any customer or employee CSV into a training pipeline, apply data minimization. Remove all columns not required for the specific model task. Pseudonymize direct identifiers where possible. This is both a GDPR obligation and an AI Act obligation — and processing that data locally before ingestion keeps it under your control rather than exposing it to additional processors.
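The minimization step can be sketched with the Python standard library alone: keep only an allowlist of columns and replace the direct identifier with a keyed hash. The column names, the sample rows, and the key handling are assumptions for illustration. In practice the key would come from a secrets manager and be stored separately from the data.

```python
import csv
import hashlib
import hmac
import io


def minimize_csv(csv_text, keep_columns, id_column, secret_key):
    """Drop all columns except `keep_columns` and pseudonymize `id_column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[id_column] + list(keep_columns), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Keyed hash (HMAC-SHA256): stable per identifier, not reversible without the key.
        token = hmac.new(
            secret_key, row[id_column].encode("utf-8"), hashlib.sha256
        ).hexdigest()
        minimized = {column: row[column] for column in keep_columns}
        minimized[id_column] = token
        writer.writerow(minimized)
    return out.getvalue()


# Hypothetical ATS export; names and emails are not needed for the model task.
raw_csv = (
    "candidate_id,name,email,experience_years,assessment_score\n"
    "1001,Alice Example,alice@example.com,7,88\n"
    "1002,Bob Example,bob@example.com,3,74\n"
)
minimized = minimize_csv(
    raw_csv,
    keep_columns=["experience_years", "assessment_score"],
    id_column="candidate_id",
    secret_key=b"replace-with-a-managed-secret",
)
print(minimized)
```

An allowlist is used rather than a blocklist so that a new column added to the export is dropped by default. Pseudonymized data is still personal data under GDPR, so this reduces exposure but does not take the dataset out of scope.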
Practical Data Pipeline Obligations for High-Risk AI
If your AI system falls under Annex III, these are the data pipeline actions that need to be in place before August 2, 2026 for new deployments.
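Step 1: Classify your AI systems. Review every AI system (including internal tools, third-party models you deploy, and fine-tuned models) against Annex III. Document the classification and the reasoning. A system that falls under neither Annex III nor the Annex I product safety route in Article 6(1) carries no high-risk obligations under the AI Act, though the prohibitions in Article 5 and the transparency duties in Article 50 can still apply.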
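The classification and its reasoning are easier to audit when each system has a structured record rather than a line in a spreadsheet. A minimal sketch follows, with hypothetical field names. Treating "an Annex III area applies" as equivalent to "high-risk" is a simplification that ignores the Act's exceptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SystemClassification:
    system_name: str
    annex_iii_area: Optional[str]  # None when no Annex III area applies
    reasoning: str
    assessed_on: date
    assessed_by: str

    def __post_init__(self):
        # A classification without documented reasoning is not auditable.
        if not self.reasoning.strip():
            raise ValueError("classification reasoning must be documented")

    @property
    def is_high_risk(self) -> bool:
        # Simplified rule for this sketch only.
        return self.annex_iii_area is not None


inventory = [
    SystemClassification(
        system_name="candidate-screening",
        annex_iii_area="Point 4: Employment and workers management",
        reasoning="Ranks applicants from ATS exports; shortlist drives interviews.",
        assessed_on=date(2026, 3, 16),
        assessed_by="compliance-team",
    ),
    SystemClassification(
        system_name="log-anomaly-detector",
        annex_iii_area=None,
        reasoning="Monitors server logs only; no decisions about individuals.",
        assessed_on=date(2026, 3, 16),
        assessed_by="compliance-team",
    ),
]
print([item.system_name for item in inventory if item.is_high_risk])
```

Recording the systems classified as out of scope matters as much as recording the high-risk ones, because that reasoning is what gets questioned later.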
Step 2: Conduct a GDPR DPIA for personal data processing. If the system processes personal data, a DPIA is almost certainly required. The DPIA must assess the risks of the AI processing, the mitigations in place, and the residual risk to data subjects.
Step 3: Document your training data governance. Article 10 requires documented data governance practices. Produce a data sheet for each training dataset: source, collection method, labeling process, bias analysis, representativeness assessment, and any GDPR legal basis relied upon.
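The data sheet fields named in Step 3 map naturally onto a small record that can be versioned alongside the dataset and checked for gaps. This is a sketch under assumed field names; the Act does not mandate this format or any particular serialization.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetSheet:
    dataset_name: str
    source: str = ""
    collection_method: str = ""
    labeling_process: str = ""
    bias_analysis: str = ""
    representativeness_assessment: str = ""
    gdpr_legal_basis: str = ""

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value.strip()]

    def to_json(self) -> str:
        """Serialize the sheet so it can be stored next to the dataset."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical, partially completed sheet for an ATS export.
sheet = DatasetSheet(
    dataset_name="ats_export_2025q4",
    source="Applicant tracking system CSV export",
    collection_method="Submitted by candidates through the application form",
    gdpr_legal_basis="Legitimate interests, balancing test on file",
)
print(sheet.missing_fields())
print(sheet.to_json())
```

A pipeline could refuse to start training while `missing_fields()` is non-empty, which turns the documentation step into a gate rather than an afterthought.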
Step 4: Apply data minimization before training. Strip columns not required for the training task. Pseudonymize direct identifiers. Process the data locally where possible to minimize processor exposure. Document the minimization steps as part of your Article 10 data governance documentation.
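Step 5: Implement Article 13/14 transparency and oversight mechanisms. Article 13 requires providers to give deployers clear information about the system's capabilities, limitations, and input data, and Article 14 requires that the system can be effectively overseen by a human. Affected individuals must be told that a high-risk system is being used on them (Article 26(11)) and can request an explanation of decisions (Article 86), alongside the GDPR right to human review. Each of these requires technical implementation in the data pipeline.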
Step 6: Register the system where required. Certain categories of high-risk AI systems must be registered in the EU database (Article 71) before deployment. Check whether your system category requires registration under Article 49.
AI Act Obligations Beyond Article 10: What Data Teams Need to Know
This post focuses on data governance (Article 10) because that is the data team's primary domain. But high-risk AI systems carry a broader set of obligations. Data teams should understand where their work intersects with obligations owned primarily by legal, engineering, or product.
| Obligation | Primary Article | Who Owns It | Data Team Involvement |
|---|---|---|---|
| Risk management system | Art. 9 | Engineering + Legal | Medium — risk analysis informs data collection and labeling decisions |
| Training data governance | Art. 10 | Data Team | High — primary data team obligation |
| Technical documentation | Art. 11 | Engineering + Legal | Medium — data sheets and lineage documentation are data team outputs |
| Record-keeping / audit logs | Art. 12 | Engineering | Low — infrastructure; data team provides data lineage records |
| Transparency to affected persons | Art. 13 | Product + Legal | Medium — data team documents what data was used and how |
| Human oversight mechanism | Art. 14 | Engineering + Product | Low-Medium — data pipeline must support override and correction capability |
| Accuracy, robustness, cybersecurity | Art. 15 | Engineering | Low — performance metrics; data team ensures training data quality |
| Conformity assessment / CE marking | Art. 43 | Legal + External auditor | Low — data governance documentation is an input to the assessment |
| EU database registration | Art. 49 | Legal | Low — data team provides system description inputs |
| GDPR DPIA (personal data) | GDPR Art. 35 | DPO + Data Team | High — data processing assessment is a data team responsibility |
The data team's core AI Act scope: Article 10 (training data) is the obligation data teams own outright. Articles 12 (record-keeping) and 15 (accuracy) are owned by engineering but depend directly on data team outputs, namely lineage records and training data quality. Articles 9, 11, 13, 14, and 43 also require data team inputs but are primarily owned by other functions. If your organization has a compliance or legal function, share this table to clarify ownership before August 2, 2026.
Conformity assessment note: For Annex III systems in points 2 to 8, the conformity assessment (Article 43) follows the internal control procedure, which is a self-assessment rather than a third-party audit. The exception is biometric systems under point 1, which may require assessment by a notified body, for example where harmonised standards have not been applied in full. Data governance documentation produced under Article 10 is a required input to the conformity assessment regardless of which pathway applies.
For training data that is prepared from CSV exports, SplitForge's Data Masking tool applies data minimization and pseudonymization locally — the training data is de-identified before it ever leaves your environment, supporting both your Article 10 data governance requirements and your GDPR Article 25 Privacy by Design obligations.
Additional Resources
Official EU AI Act Text:
- Regulation (EU) 2024/1689 — EU Artificial Intelligence Act — Full official regulation text including Annex III high-risk categories and Article 10 training data requirements
- EU AI Act Annex III — High-risk AI systems list — Enumeration of high-risk AI system categories
GDPR Interaction:
- GDPR Article 35 — Data Protection Impact Assessment — When a DPIA is required; applies to high-risk AI systems processing personal data
- GDPR Article 25 — Privacy by Design and by Default — Technical measure obligations that apply throughout the AI system lifecycle
Supervisory Authority Guidance:
- EDPB Guidelines on Automated Decision-making and Profiling — Article 22 requirements for automated decisions; directly relevant to high-risk AI