
EU AI Act and CSV Data: What Data Teams Must Do Before August 2, 2026

March 16, 2026
By SplitForge Team

Quick Answer

What does the EU AI Act require from data teams before August 2, 2026?

EU AI Act data processing obligations in 2026 apply to organizations that develop or deploy high-risk AI systems, including systems that use CSV data for hiring decisions, credit scoring, healthcare triage, or education assessment. For systems deployed on or after August 2, 2026, full compliance with the Annex III obligations is required from first deployment. Key obligations for data teams include training data quality requirements under Article 10, data governance documentation, GDPR alignment including a DPIA for high-risk processing, and Privacy by Design integration from the data pipeline stage.

Grandfathering note: Under Article 111(2), high-risk AI systems already placed on the market or put into service before August 2, 2026 are covered only if their design changes significantly after that date, and those intended for use by public authorities must comply by August 2, 2030 regardless. New systems deployed on or after August 2, 2026 must be compliant from first deployment.


TL;DR: The EU AI Act's high-risk AI system obligations take effect August 2, 2026. For data teams, the practical obligations concentrate on training data quality (Article 10), GDPR interaction for personal data processing, data minimization (a GDPR principle that AI Act Recital 69 confirms applies to AI systems), and documentation. If your CSV data feeds AI systems in employment, credit, healthcare, education, or public safety contexts, you are almost certainly in scope.


Your ML team is building a candidate screening system that ingests CSV exports from your ATS — names, experience years, education history, assessment scores. The system ranks candidates automatically. It has been in development for eight months and goes live next quarter.

Under the EU AI Act, AI systems intended for recruitment and selection, including analysing and filtering applications and evaluating candidates, are classified as high-risk under Annex III, Point 4. Your candidate screening system likely qualifies. Under Article 10, your training data must meet specific quality requirements. Under GDPR Article 35, a Data Protection Impact Assessment is very likely mandatory. Under AI Act Article 13, the provider must give deployers clear information about the system's capabilities, limitations, and the data it relies on.

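For systems you deploy on or after August 2, 2026, compliance is required from the day you deploy. August 2, 2026 is 139 days from the date of this guide. Systems already in service before that date are covered only if their design changes significantly afterwards (Article 111(2)), but retraining or redesigning a model may count as such a change, and building compliant data pipelines takes time that starts now.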
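The 139-day figure is plain date arithmetic, and teams planning backwards from the deadline may want to recompute it as the calendar moves. A minimal Python sketch; the constant and function names are illustrative, and the only inputs are the two dates stated in this guide:

```python
from datetime import date

# Dates stated in this guide: its publication date and the date the
# Annex III high-risk obligations apply to newly deployed systems.
GUIDE_DATE = date(2026, 3, 16)
ANNEX_III_APPLICATION_DATE = date(2026, 8, 2)


def days_until(deadline: date, today: date) -> int:
    """Whole calendar days from `today` to `deadline` (negative once it has passed)."""
    return (deadline - today).days


if __name__ == "__main__":
    print(days_until(ANNEX_III_APPLICATION_DATE, GUIDE_DATE))  # 139
    # For a live countdown, pass date.today() instead of GUIDE_DATE.
```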

EU AI Act obligations referenced in this guide were verified against Regulation (EU) 2024/1689 of the European Parliament and of the Council, European Commission published guidance, and EDPB guidance on automated decision-making, March 2026.




This guide is for: Data Protection Officers, AI engineers, data engineers, and compliance teams whose organizations develop or deploy AI systems that process personal data for consequential decisions.


High-Risk AI Systems: Are You in Scope?

EU AI Act Annex III identifies eight categories of high-risk AI systems. For data teams working with CSV files, four categories are most practically relevant.

Annex III Category | Description | CSV Data Scenarios In Scope
Point 2: Critical infrastructure | AI used as a safety component in the management and operation of critical digital infrastructure, road traffic, or the supply of water, gas, heating, or electricity | Power grid management systems fed by operational CSV exports
Point 3: Education and vocational training | AI that determines access or admission, evaluates learning outcomes, or monitors students' behavior during tests | Student assessment systems ingesting grade CSVs; adaptive learning platforms
Point 4: Employment and workers management | AI used for recruitment, selection, promotion, termination, task allocation, or performance monitoring | Candidate screening from ATS CSV exports; performance ranking from HR CSV data
Point 5: Access to essential services | AI that evaluates creditworthiness, assesses risk and sets prices for life and health insurance, or makes other essential service decisions | Credit scoring models trained on financial transaction CSV data; insurance pricing systems

Beyond these four, Point 1 (biometrics), Point 6 (law enforcement), Point 7 (migration, asylum, and border control), and Point 8 (administration of justice and democratic processes) apply in specific contexts. The key question for any AI system is: does it make or substantially influence a decision that affects individuals' rights, opportunities, or access to services?

What "substantially influence" means: Under Article 6(3), an Annex III system escapes high-risk classification only if it poses no significant risk of harm, including by not materially influencing the outcome of decision-making, and an Annex III system that profiles natural persons is always high-risk, even where a human makes the final call. A system that ranks candidates and presents the top 10 for human review is substantially influencing the hiring decision: the candidates ranked below 10 receive no human review at all.


The August 2, 2026 Deadline: What It Covers

The EU AI Act (Regulation (EU) 2024/1689) entered into force August 1, 2024. Obligations have been phasing in since then:

Date | Obligations Taking Effect
February 2, 2025 | Prohibited AI practices banned (Article 5)
August 2, 2025 | General-purpose AI model obligations; governance bodies active
August 2, 2026 | High-risk AI system obligations (Annex III): full compliance required for new deployments
August 2, 2027 | High-risk AI systems covered by Annex I (product safety legislation)
August 2, 2030 | High-risk AI systems placed on the market before August 2, 2026 and intended for use by public authorities must comply; other pre-existing systems are covered only once significantly changed (Article 111(2))

For systems deployed on or after August 2, 2026: full compliance with all Annex III obligations is required from first deployment. There is no grace period for new systems.

Note on proposed regulatory changes: The European Commission's proposed Digital Omnibus regulation (February 2026) includes provisions that could simplify some SME compliance obligations and adjust certain GDPR interaction points. As of March 2026, the proposal has not been adopted and the AI Act timeline remains unchanged. Monitor the Commission's legislative tracker for updates before finalizing your compliance roadmap — if adopted, some obligations may shift.

Penalties for non-compliance under Article 99 are structured in three tiers. Violations of prohibited AI practices carry up to €35 million or 7% of global annual turnover. Violations of high-risk system obligations carry up to €15 million or 3% of global annual turnover. Supplying incorrect, incomplete, or misleading information to authorities carries up to €7.5 million or 1% of global annual turnover. The top tier exceeds the GDPR maximum of €20 million or 4% of global annual turnover.
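Each figure is a ceiling, not a predicted fine, and Article 99 sets it as the fixed amount or the turnover share, whichever is higher (for SMEs and start-ups, Article 99(6) applies whichever is lower). A minimal Python sketch of that arithmetic for the two tiers most relevant here; the tier keys and function name are illustrative, not an official calculator:

```python
from decimal import Decimal

# (fixed cap in EUR, share of total worldwide annual turnover) per Article 99.
# Only the prohibited-practice and high-risk tiers are modeled; names are illustrative.
PENALTY_TIERS = {
    "prohibited_practice": (Decimal("35000000"), Decimal("0.07")),
    "high_risk_obligation": (Decimal("15000000"), Decimal("0.03")),
}


def max_fine_eur(tier: str, worldwide_turnover_eur: Decimal, is_sme: bool = False) -> Decimal:
    """Upper bound of the administrative fine for one tier.

    Undertakings face whichever of the two caps is higher; SMEs and
    start-ups face whichever is lower. Actual fines depend on the
    factors authorities weigh case by case.
    """
    fixed_cap, turnover_share = PENALTY_TIERS[tier]
    turnover_cap = worldwide_turnover_eur * turnover_share
    return min(fixed_cap, turnover_cap) if is_sme else max(fixed_cap, turnover_cap)


if __name__ == "__main__":
    # EUR 2 billion turnover: 3% (EUR 60 million) exceeds the EUR 15 million fixed cap.
    print(max_fine_eur("high_risk_obligation", Decimal("2000000000")))
```

Decimal is used instead of float so that percentage arithmetic on large turnover figures stays exact.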


Article 10: Training Data Quality Requirements

Article 10 of the EU AI Act establishes specific requirements for the data used to train, validate, and test high-risk AI systems. For data teams, this is where the most direct operational obligations arise.

Articles 10(2) to 10(4) require that training, validation, and testing datasets:

  1. Be subject to appropriate data governance and management practices: documented design choices, data collection processes and data origin (for personal data, including the original purpose of collection), and preparation steps such as labeling, cleaning, and enrichment
  2. Be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose
  3. Have the appropriate statistical properties, including for the persons or groups of persons on whom the AI system is intended to be used
  4. Reflect the specific geographical, contextual, behavioral, or functional setting in which the system is intended to be used
  5. Be examined for possible biases that could affect health, safety, or fundamental rights or lead to prohibited discrimination, with measures to detect, prevent, and mitigate them

Article 10(5) adds that providers may exceptionally process special categories of personal data where strictly necessary for bias detection and correction, but only with appropriate safeguards.

For CSV training data specifically: If your model is trained on customer CSV exports, employee CSV records, or patient CSV data, Article 10 requires documented processes for how that data was collected, cleaned, labeled, and validated. The documentation must demonstrate that the data is representative of the population the model will affect and that known sources of bias have been identified and addressed.
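One way to start the representativeness and bias examination is to compare group shares in the training CSV with the population the model will be used on, and to compare positive-outcome rates across groups. A minimal stdlib-only Python sketch on toy data; the column names, the reference shares, and the 0.05 tolerance are hypothetical placeholders (the AI Act sets no numeric thresholds), and the disparity ratio is one common fairness metric, not a test defined by the Act:

```python
import csv
import io
from collections import Counter


def group_shares(rows: list[dict], column: str) -> dict[str, float]:
    """Share of rows falling in each group of `column`."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


def representation_gaps(rows, column, reference_shares, tolerance=0.05):
    """Groups whose training-data share differs from the reference share by more than `tolerance`.

    Positive values mean over-representation, negative values under-representation.
    Groups missing from `reference_shares` are not checked.
    """
    observed = group_shares(rows, column)
    gaps = {}
    for group, expected in reference_shares.items():
        diff = observed.get(group, 0.0) - expected
        if abs(diff) > tolerance:
            gaps[group] = round(diff, 3)
    return gaps


def selection_rates(rows, group_column, outcome_column, positive="1"):
    """Fraction of rows carrying the positive label within each group."""
    totals, positives = Counter(), Counter()
    for row in rows:
        totals[row[group_column]] += 1
        if row[outcome_column] == positive:
            positives[row[group_column]] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates: dict[str, float]) -> float:
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Toy ATS export with a hypothetical age-band column and historical shortlisting label.
SAMPLE_CSV = """candidate_id,age_band,shortlisted
c1,18-34,1
c2,18-34,1
c3,18-34,0
c4,18-34,1
c5,35-54,0
c6,35-54,1
c7,55+,0
c8,18-34,1
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    reference = {"18-34": 0.40, "35-54": 0.40, "55+": 0.20}  # hypothetical population shares
    print(representation_gaps(rows, "age_band", reference))
    rates = selection_rates(rows, "age_band", "shortlisted")
    print(rates, disparity_ratio(rates))
```

With eight rows the numbers mean nothing statistically; the point is that both checks are cheap to run on every dataset version and their outputs belong in the Article 10 documentation.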


How the EU AI Act and GDPR Interact for CSV Data

The EU AI Act does not replace GDPR. Both regulations apply simultaneously to AI systems that process personal data. The interaction creates overlapping obligations that data teams must navigate together.

DPIA requirement: GDPR Article 35 requires a Data Protection Impact Assessment for processing likely to result in high risk to individuals. EU AI Act high-risk systems processing personal data almost automatically trigger the DPIA requirement under Article 35(1) — the scale and nature of the decision-making involved typically constitutes "high risk" under the GDPR criteria. If you are deploying a high-risk AI system under the AI Act, assume a GDPR DPIA is also required.

Legal basis for training data: GDPR Article 6 requires a lawful basis for all personal data processing. Using customer CSV data or employee CSV data to train an AI model is processing for a new purpose. If your original legal basis was contract performance (processing customer data to fulfill an order), training a model on that data needs its own assessment: either a compatibility assessment under GDPR Article 6(4) or a new legal basis, typically consent or legitimate interests after a balancing test.

Data subject rights in AI systems: GDPR Article 22 restricts purely automated decision-making that produces legal or similarly significant effects, and where such decisions are permitted, Article 22(3) gives data subjects the right to obtain human intervention. Separately, Article 14 of the AI Act requires high-risk systems to be designed so that people can effectively oversee them, including disregarding, overriding, or reversing their output. Both must be technically implemented in the system's data pipeline.
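In practice, a human review mechanism means every automated outcome is stored with enough context for a person to review it, and a reviewer's decision replaces the model's outcome without erasing it. A minimal Python sketch, assuming a hypothetical `DecisionRecord` structure; the field names and workflow are illustrative, and a production system would persist these records and control who is allowed to review:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionRecord:
    """One automated decision about a person, kept reviewable and overridable (hypothetical schema)."""
    subject_pseudonym: str
    model_version: str
    model_score: float
    automated_outcome: str
    review_requested: bool = False
    reviewer_outcome: Optional[str] = None
    history: list = field(default_factory=list)

    def _log(self, event: str) -> None:
        # Append-only trail of what happened to this decision and when (UTC).
        self.history.append((datetime.now(timezone.utc).isoformat(), event))

    def request_review(self) -> None:
        """Called when the affected person asks for a human to look at the decision."""
        self.review_requested = True
        self._log("human review requested")

    def record_review(self, reviewer_id: str, outcome: str, reason: str) -> None:
        """A human reviewer confirms or overrides the automated outcome."""
        self.reviewer_outcome = outcome
        self._log(f"reviewed by {reviewer_id}: {outcome} ({reason})")

    @property
    def final_outcome(self) -> str:
        # A recorded human decision takes precedence over the model's output.
        return self.reviewer_outcome or self.automated_outcome


if __name__ == "__main__":
    record = DecisionRecord("a1b2c3", "screening-v4", 0.31, "reject")
    record.request_review()
    record.record_review("hr-reviewer-7", "advance", "relevant experience not captured by model features")
    print(record.final_outcome)  # advance
```

Keeping `automated_outcome` and `reviewer_outcome` as separate fields preserves what the model decided alongside what the human decided, which is the evidence an auditor would ask for.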

Privacy by Design (GDPR Article 25): Building a high-risk AI system without integrating Privacy by Design from the data pipeline stage is inconsistent with both Article 25 GDPR and the spirit of the EU AI Act. Data minimization, purpose limitation, and technical measures to protect personal data must be designed into the training data pipeline, not retrofitted after deployment.


Data Minimization: AI Act Recital 69

EU AI Act Recital 69 explicitly references GDPR data minimization as a principle applicable to AI training data:

"The right to privacy and to protection of personal data must be guaranteed throughout the entire lifecycle of the AI system. In this regard, the principles of data minimisation and data protection by design and by default, as set out in Union data protection law, are applicable when personal data is processed."

This means: the GDPR Article 5(1)(c) data minimization principle applies to AI training data. Personal data used for training must be adequate, relevant, and limited to what is necessary for the intended training purpose. A model that only needs behavioral patterns does not require training data that includes names, addresses, and demographic details. Strip those columns before training.

For CSV training data workflows, this has a direct operational implication: before feeding any customer or employee CSV into a training pipeline, apply data minimization. Remove all columns not required for the specific model task. Pseudonymize direct identifiers where possible. This is a GDPR obligation that the AI Act expressly reaffirms, and processing the data locally before ingestion keeps it under your control rather than exposing it to additional processors.
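A minimal sketch of those two steps using only Python's standard library: keep an explicit allow-list of columns and replace the direct identifier with a keyed HMAC-SHA256 token. The column names and key handling are illustrative assumptions. The same ID always maps to the same token, so records stay linkable across files; because anyone holding the key can recompute the token for a known ID, the output is pseudonymized rather than anonymized and remains personal data under GDPR, so keep the key outside the training environment:

```python
import csv
import hashlib
import hmac
import io


def minimize_and_pseudonymize(src, dst, keep_columns, id_column, secret_key: bytes) -> None:
    """Copy only allow-listed columns from `src` to `dst`, replacing the ID with an HMAC token.

    Every column not listed in `keep_columns` (names, emails, addresses, ...) is dropped.
    """
    reader = csv.DictReader(src)
    header = reader.fieldnames or []
    missing = [c for c in [id_column, *keep_columns] if c not in header]
    if missing:
        raise ValueError(f"columns not found in input: {missing}")

    writer = csv.DictWriter(dst, fieldnames=["subject_pseudonym", *keep_columns])
    writer.writeheader()
    for row in reader:
        token = hmac.new(secret_key, row[id_column].encode("utf-8"), hashlib.sha256).hexdigest()
        writer.writerow({"subject_pseudonym": token, **{c: row[c] for c in keep_columns}})


if __name__ == "__main__":
    raw = io.StringIO(
        "employee_id,name,email,tenure_years,assessment_score\n"
        "E1,Ana,ana@example.com,4,82\n"
        "E2,Ben,ben@example.com,7,91\n"
    )
    out = io.StringIO()
    minimize_and_pseudonymize(
        raw, out,
        keep_columns=["tenure_years", "assessment_score"],
        id_column="employee_id",
        secret_key=b"load-this-from-a-secrets-manager",  # placeholder; never hard-code a real key
    )
    print(out.getvalue())
```

An allow-list is used rather than a block-list so that a new column added to the export upstream is dropped by default instead of leaking into training data.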


Practical Data Pipeline Obligations for High-Risk AI

If your AI system falls under Annex III, these are the data pipeline actions that need to be in place before August 2, 2026 for new deployments.

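Step 1: Classify your AI systems. Review every AI system (including internal tools, third-party models you deploy, and fine-tuned models) against Annex III. Document the classification and the reasoning; if you conclude that an Annex III system is not high-risk under the Article 6(3) exemptions, Article 6(4) requires you to document that assessment before the system is placed on the market or put into service. A system that falls under neither Annex III nor Annex I carries no high-risk obligations under the AI Act.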

Step 2: Conduct a GDPR DPIA for personal data processing. If the system processes personal data, a DPIA is almost certainly required. The DPIA must assess the risks of the AI processing, the mitigations in place, and the residual risk to data subjects.

Step 3: Document your training data governance. Article 10 requires documented data governance practices. Produce a data sheet for each training dataset: source, collection method, labeling process, bias analysis, representativeness assessment, and any GDPR legal basis relied upon.
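A data sheet can be as simple as a structured record kept in the same repository as the training pipeline. A minimal Python sketch, assuming a hypothetical `DatasetSheet` type whose fields mirror the items listed in Step 3 plus the minimization steps from Step 4; the Act does not prescribe a format, and the example values are illustrative:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetSheet:
    """Article 10 data governance record for one training, validation, or test dataset (hypothetical schema)."""
    dataset_name: str
    source: str
    collection_method: str
    labeling_process: str
    bias_analysis: str
    representativeness_assessment: str
    gdpr_legal_basis: str
    minimization_steps: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Fields still empty; the sheet should not be signed off until this list is empty."""
        return [name for name, value in asdict(self).items() if value in ("", [])]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    sheet = DatasetSheet(
        dataset_name="ats_candidates_2023_2025",
        source="ATS CSV export",
        collection_method="Applications submitted through the careers site",
        labeling_process="Shortlist decisions recorded by recruiters",
        bias_analysis="",  # not done yet
        representativeness_assessment="Age-band shares compared with applicant pool",
        gdpr_legal_basis="Legitimate interests (balancing test on file)",
        minimization_steps=["Dropped name, email, address", "HMAC-pseudonymized employee_id"],
    )
    print(sheet.missing_fields())  # ['bias_analysis']
    print(sheet.to_json())
```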

Step 4: Apply data minimization before training. Strip columns not required for the training task. Pseudonymize direct identifiers. Process the data locally where possible to minimize processor exposure. Document the minimization steps as part of your Article 10 data governance documentation.
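To make the minimization step auditable, record what went in, what came out, and which columns were removed. A minimal Python sketch, assuming the raw and minimized CSVs are available as bytes; the record layout is hypothetical. SHA-256 digests let a reviewer confirm later that a given file is the one described, without copying its contents into the log:

```python
import hashlib
from datetime import datetime, timezone


def minimization_record(raw_csv: bytes, minimized_csv: bytes,
                        input_columns: list[str], kept_columns: list[str],
                        description: str) -> dict:
    """Build one log entry describing a minimization step for the Article 10 data sheet."""
    return {
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
        "description": description,
        "input_sha256": hashlib.sha256(raw_csv).hexdigest(),
        "output_sha256": hashlib.sha256(minimized_csv).hexdigest(),
        "columns_kept": list(kept_columns),
        # Sorted so the record is stable regardless of input column order.
        "columns_dropped": sorted(set(input_columns) - set(kept_columns)),
    }


if __name__ == "__main__":
    entry = minimization_record(
        raw_csv=b"employee_id,name,email,tenure_years\nE1,Ana,ana@example.com,4\n",
        minimized_csv=b"subject_pseudonym,tenure_years\ntok1,4\n",
        input_columns=["employee_id", "name", "email", "tenure_years"],
        kept_columns=["subject_pseudonym", "tenure_years"],
        description="Allow-list minimization and HMAC pseudonymization before training",
    )
    print(entry["columns_dropped"])  # ['email', 'employee_id', 'name']
```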

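Step 5: Implement transparency mechanisms. Where personal data feeds automated decisions, GDPR Articles 13 and 14 require telling affected individuals about the automated processing, including meaningful information about the logic involved. Under the AI Act, deployers of Annex III systems that make or assist decisions about people must inform those people (Article 26(11)), and affected persons can request an explanation of the system's role in decisions with legal or similarly significant effects (Article 86). Together with the right to human review, this requires technical implementation in the data pipeline.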

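Step 6: Register the system where required. Under Article 49, providers must register Annex III high-risk systems in the EU database before placing them on the market or putting them into service; systems under Point 2 (critical infrastructure) are registered at national level instead. Deployers that are public authorities must also register their use.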

AI Act Obligations Beyond Article 10: What Data Teams Need to Know

This post focuses on data governance (Article 10) because that is the data team's primary domain. But high-risk AI systems carry a broader set of obligations. Data teams should understand where their work intersects with obligations owned primarily by legal, engineering, or product.

Obligation | Primary Article | Who Owns It | Data Team Involvement
Risk management system | Art. 9 | Engineering + Legal | Medium: risk analysis informs data collection and labeling decisions
Training data governance | Art. 10 | Data Team | High: primary data team obligation
Technical documentation | Art. 11 | Engineering + Legal | Medium: data sheets and lineage documentation are data team outputs
Record-keeping / audit logs | Art. 12 | Engineering | Low: infrastructure; data team provides data lineage records
Transparency and information to deployers | Art. 13 | Product + Legal | Medium: data team documents what data was used and how
Human oversight mechanism | Art. 14 | Engineering + Product | Low-Medium: data pipeline must support override and correction capability
Accuracy, robustness, cybersecurity | Art. 15 | Engineering | Low: performance metrics; data team ensures training data quality
Conformity assessment / CE marking | Art. 43 (CE marking: Art. 48) | Legal (notified body only where required) | Low: data governance documentation is an input to the assessment
EU database registration | Art. 49 | Legal | Low: data team provides system description inputs
GDPR DPIA (personal data) | GDPR Art. 35 | DPO + Data Team | High: data processing assessment is a data team responsibility

The data team's core AI Act scope: Article 10 (training data governance) and, where personal data is involved, the GDPR DPIA are the obligations data teams most directly own. Articles 9, 11, 12, 13, 14, 15, 43, and 49 require data team inputs, such as data sheets, lineage records, and training data quality evidence, but are primarily owned by other functions. If your organization has a compliance or legal function, share this table to clarify ownership before August 2, 2026.

Conformity assessment note: For Annex III Points 2 to 8, the conformity assessment (Article 43) follows the internal-control procedure, so no third-party audit is required. The exception is biometrics (Point 1), which requires a notified body where harmonised standards or common specifications have not been applied in full. Data governance documentation produced under Article 10 is a required input to the conformity assessment regardless of which pathway applies.

For training data prepared from CSV exports, SplitForge's Data Masking tool applies data minimization and pseudonymization locally, so direct identifiers are masked before the data ever leaves your environment. This supports both your Article 10 data governance requirements and your GDPR Article 25 Privacy by Design obligations; note that pseudonymized data is still personal data under GDPR.




FAQ

Which AI systems are high-risk under Annex III?

Annex III identifies eight categories. For data teams, the most relevant are: employment and workers management AI (recruitment, selection, performance monitoring), access to essential services AI (creditworthiness assessment, insurance pricing), education AI (student assessment, access determination), and critical infrastructure management AI. A system is high-risk if it falls in these categories and makes or substantially influences consequential decisions affecting individuals.

Does the EU AI Act apply to companies outside the EU?

Yes, if the system is deployed in the EU or if its outputs are used in the EU. The territorial scope (Article 2) mirrors GDPR's extraterritorial reach. A US-based company deploying a candidate screening AI to evaluate EU candidates, or a credit scoring model that affects EU residents' access to financial services, is within scope regardless of where the AI was developed.

When do AI systems already in service need to comply?

Under Article 111(2), high-risk AI systems placed on the market or put into service before August 2, 2026 fall under the high-risk obligations only if their design changes significantly after that date; providers and deployers of such systems intended for use by public authorities must comply by August 2, 2030 in any case. This does not mean compliance work can wait: retraining or redesigning a system may count as a significant change, and training data governance, DPIA processes, and transparency mechanisms take considerable time to implement properly.

Is GDPR compliance enough to meet Article 10?

GDPR compliance is necessary but not sufficient for Article 10. Article 10 adds requirements beyond GDPR: the data must be sufficiently representative for the population the system will affect, documented for bias analysis, and appropriate for the statistical properties required by the intended use. A dataset that satisfies GDPR legal basis requirements may still fail Article 10 if it systematically underrepresents certain demographic groups or contains labeling errors that introduce bias.

Can we use existing customer data to train a high-risk AI system?

Using existing customer data for AI training is a new processing purpose, which engages the purpose limitation principle in GDPR Article 5(1)(b). The legal basis you relied on to collect the original data (contract performance, consent, legitimate interests) may not extend to AI training. You need either a compatibility assessment under GDPR Article 6(4) showing that training is compatible with the original collection purpose, or a separate legal basis such as consent for the training purpose or legitimate interests supported by a balancing test. Additionally, EU AI Act Article 10 requires that the training data be representative and appropriate for the model's intended use, not just legally available.

What counts as "substantially influencing" a decision?

Article 3(1) of the AI Act defines an AI system as one that infers from its inputs how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Under Article 6(3), an Annex III system escapes high-risk classification only if it poses no significant risk of harm, including by not materially influencing the outcome of decision-making, and any Annex III system that profiles natural persons is always high-risk. A system that ranks candidates and presents a filtered list for human review substantially influences hiring decisions: candidates below the filter receive no consideration. A system that generates risk scores used in underwriting substantially influences credit decisions even if a human signs off.



Legal disclaimer: The content in this post is for informational purposes only and does not constitute legal advice. EU AI Act compliance requirements are subject to further guidance from supervisory authorities. Consult qualified legal counsel to assess your specific AI systems against Annex III and applicable national implementation measures.

Prepare Your CSV Training Data for AI Act Compliance

Apply data minimization to CSV exports before AI training — locally, without server transmission
Pseudonymize direct identifiers in training datasets supporting Article 10 data governance
Process training data in your browser — reduce processor exposure under GDPR Article 28
Document de-identification steps as part of your Article 10 data sheet
