EU AI Act and CSV Data: What Data Teams Must Do Before August 2, 2026

March 16, 2026

By SplitForge Team

Quick Answer

What does the EU AI Act require from data teams before August 2, 2026?

EU AI Act data processing obligations in 2026 apply to organizations that develop or deploy high-risk AI systems, including systems that use CSV data for hiring decisions, credit scoring, healthcare triage, or education assessment. Full compliance with the obligations for Annex III systems is required by August 2, 2026 for newly deployed systems. Key obligations for data teams include training data quality requirements under Article 10, data governance documentation, GDPR alignment including a DPIA for high-risk processing, and Privacy by Design integration from the data pipeline stage.

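Grandfathering note: Under Article 111(2), high-risk AI systems already placed on the market or put into service before August 2, 2026 fall under the Act only if their design changes significantly after that date, and systems intended for use by public authorities must comply by August 2, 2030 in any event. New systems deployed on or after August 2, 2026 must be compliant from first deployment.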

TL;DR: The EU AI Act's high-risk AI system obligations take effect August 2, 2026. For data teams, the practical obligations concentrate on training data quality (Article 10), GDPR interaction for personal data processing, data minimization as required by AI Act Recital 69, and documentation. If your CSV data feeds AI systems in employment, credit, healthcare, education, or public safety contexts, you are almost certainly in scope.

Your ML team is building a candidate screening system that ingests CSV exports from your ATS — names, experience years, education history, assessment scores. The system ranks candidates automatically. It has been in development for eight months and goes live next quarter.

Under the EU AI Act, AI systems used in employment and workers management that use personal data to make or substantially influence decisions affecting individuals are classified as high-risk under Annex III, Point 4. Your candidate screening system likely qualifies. Under Article 10, your training data must meet specific quality requirements. Under the interaction with GDPR Article 35, a Data Protection Impact Assessment may be mandatory. Under Article 13, the system must be transparent about how it uses data.

Systems you place on the market or put into service on or after August 2, 2026 must comply from that date. That is 139 days from the date of this guide. Existing systems are treated differently under Article 111(2), but any significant design change after August 2, 2026 brings them into scope, and building compliant data pipelines for them takes time that starts now.
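The day count above can be reproduced, and re-run on any later date, with a few lines of Python. This is a minimal sketch; the constant and function names are illustrative and not part of any official tooling.

```python
from datetime import date

GUIDE_DATE = date(2026, 3, 16)          # publication date of this guide
NEW_SYSTEM_DEADLINE = date(2026, 8, 2)  # high-risk obligations apply to new deployments


def days_remaining(today: date, deadline: date = NEW_SYSTEM_DEADLINE) -> int:
    """Return whole days from `today` to `deadline` (negative once it has passed)."""
    return (deadline - today).days


print(days_remaining(GUIDE_DATE))  # 139
```

Passing `date.today()` instead of `GUIDE_DATE` gives the remaining runway for a project plan.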

EU AI Act obligations referenced in this guide were verified against Regulation (EU) 2024/1689 of the European Parliament and of the Council, European Commission published guidance, and EDPB guidance on automated decision-making, March 2026.

High-Risk AI Systems: Are You in Scope?
The August 2, 2026 Deadline: What It Covers
Article 10: Training Data Quality Requirements
How the EU AI Act and GDPR Interact for CSV Data
Data Minimization: AI Act Recital 69
Practical Data Pipeline Obligations for High-Risk AI
AI Act Obligations Beyond Article 10: What Data Teams Need to Know
Additional Resources
FAQ

This guide is for: Data Protection Officers, AI engineers, data engineers, and compliance teams whose organizations develop or deploy AI systems that process personal data for consequential decisions.

High-Risk AI Systems: Are You in Scope?

EU AI Act Annex III identifies eight categories of high-risk AI systems. For data teams working with CSV files, four categories are most practically relevant.

Annex III Category	Description	CSV Data Scenarios In Scope
Point 2: Critical infrastructure	AI that manages or operates critical infrastructure components	Power grid management systems fed by operational CSV exports
Point 3: Education and vocational training	AI that determines access, assesses performance, or monitors behavior of students	Student assessment systems ingesting grade CSVs; adaptive learning platforms
Point 4: Employment and workers management	AI used for recruitment, selection, promotion, performance monitoring, or task allocation	Candidate screening from ATS CSV exports; performance ranking from HR CSV data
Point 5: Access to essential services	AI that evaluates creditworthiness, prices insurance, or makes other essential service decisions	Credit scoring models trained on financial transaction CSV data; insurance pricing systems

Beyond these four, Point 1 (biometric identification) and Point 6 (law enforcement) apply in specific contexts. The key question for any AI system is: does it make or substantially influence a decision that affects individuals' rights, opportunities, or access to services?

What "substantially influence" means: The AI Act covers systems that are used for profiling individuals for the purposes of consequential decision-making, even where a human makes the final call. A system that ranks candidates and presents the top 10 for human review is substantially influencing the hiring decision — the candidates ranked below 10 receive no human review at all.

The August 2, 2026 Deadline: What It Covers

The EU AI Act (Regulation (EU) 2024/1689) entered into force August 1, 2024. Obligations have been phasing in since then:

Date	Obligations Taking Effect
February 2, 2025	Prohibited AI practices banned (Article 5)
August 2, 2025	General-purpose AI model obligations; governance bodies active
August 2, 2026	High-risk AI system obligations (Annex III) — full compliance required for new deployments
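August 2, 2027	High-risk AI systems covered by Annex I product safety legislation (Article 6(1)); general-purpose AI models placed on the market before August 2, 2025 must comply (Article 111(3))
August 2, 2030	Existing high-risk AI systems intended for use by public authorities must comply (Article 111(2))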

For systems deployed on or after August 2, 2026: full compliance with all Annex III obligations is required from first deployment. There is no grace period for new systems.

Note on proposed regulatory changes: The European Commission's proposed Digital Omnibus regulation (February 2026) includes provisions that could simplify some SME compliance obligations and adjust certain GDPR interaction points. As of March 2026, the proposal has not been adopted and the AI Act timeline remains unchanged. Monitor the Commission's legislative tracker for updates before finalizing your compliance roadmap — if adopted, some obligations may shift.

Penalties for non-compliance are structured in three tiers under Article 99. Violations of prohibited AI practices carry fines of up to €35 million or 7% of global annual turnover, whichever is higher. Violations of high-risk system obligations carry up to €15 million or 3%. Supplying incorrect, incomplete, or misleading information to authorities carries up to €7.5 million or 1%. Only the top tier exceeds the GDPR maximum of €20 million or 4% of global annual turnover.
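To see how the turnover-based caps scale, the sketch below computes the maximum exposure for the two top tiers. It assumes the "whichever is higher" rule that Article 99 applies to undertakings; SMEs are subject to the lower of the two amounts, which this sketch does not model. The function and constant names are illustrative.

```python
def max_fine_eur(global_turnover_eur: float, fixed_cap_eur: float, turnover_pct: float) -> float:
    """Upper bound of a fine: the higher of the fixed cap and the turnover share."""
    return max(fixed_cap_eur, global_turnover_eur * turnover_pct / 100)


# Tier parameters as stated in the paragraph above.
PROHIBITED_TIER = {"fixed_cap_eur": 35_000_000, "turnover_pct": 7}
HIGH_RISK_TIER = {"fixed_cap_eur": 15_000_000, "turnover_pct": 3}

# A company with EUR 1 billion turnover: 3% exceeds the fixed cap.
print(max_fine_eur(1_000_000_000, **HIGH_RISK_TIER))  # 30000000.0
# A company with EUR 100 million turnover: the fixed cap is the binding figure.
print(max_fine_eur(100_000_000, **HIGH_RISK_TIER))    # 15000000
```

These are ceilings, not predictions; actual fines depend on the factors supervisory authorities weigh in each case.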

Article 10: Training Data Quality Requirements

Article 10 of the EU AI Act establishes specific requirements for the data used to train, validate, and test high-risk AI systems. For data teams, this is where the most direct operational obligations arise.

Article 10(2) to 10(4) require that training, validation, and testing datasets:

Be subject to appropriate data governance and management practices — documented processes for data collection, labeling, cleaning, and validation
Be relevant, sufficiently representative, and to the best extent possible, free of errors given the intended purpose
Have the appropriate statistical properties for the persons or groups of persons on whom the AI system will be used
Comply with any applicable legal frameworks — including GDPR where personal data is involved
Be examined for possible biases that could affect the system's output

Article 10(5) adds that providers may exceptionally process special categories of personal data where strictly necessary to ensure bias detection and correction, but only with appropriate safeguards.

For CSV training data specifically: If your model is trained on customer CSV exports, employee CSV records, or patient CSV data, Article 10 requires documented processes for how that data was collected, cleaned, labeled, and validated. The documentation must demonstrate that the data is representative of the population the model will affect and that known sources of bias have been identified and addressed.
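One way to start documenting representativeness is to compare group shares in the training CSV against a reference population. The sketch below is illustrative only: the column name `age_band`, the reference shares, and the 10-point tolerance are assumptions, and Article 10 does not prescribe any particular metric or threshold.

```python
import csv
import io
from collections import Counter


def group_shares(rows, column):
    """Share of rows per distinct value in `column`."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(train_shares, reference_shares, tolerance=0.10):
    """Groups whose training share trails the reference share by more than `tolerance`."""
    return sorted(
        group
        for group, ref in reference_shares.items()
        if train_shares.get(group, 0.0) < ref - tolerance
    )


# Hypothetical training extract; a real pipeline would read the file from disk.
SAMPLE_CSV = """candidate_id,age_band,score
1,18-29,71
2,30-44,64
3,30-44,80
4,30-44,58
5,45-64,77
"""
# Hypothetical reference shares for the population the model will affect.
REFERENCE = {"18-29": 0.25, "30-44": 0.40, "45-64": 0.35}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
train = group_shares(rows, "age_band")
flagged = underrepresented(train, REFERENCE)
print(flagged)  # ['45-64']
```

The flagged groups and the reference source belong in the dataset documentation, together with whatever corrective step was taken (resampling, additional collection, or a documented justification).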

How the EU AI Act and GDPR Interact for CSV Data

The EU AI Act does not replace GDPR. Both regulations apply simultaneously to AI systems that process personal data. The interaction creates overlapping obligations that data teams must navigate together.

DPIA requirement: GDPR Article 35 requires a Data Protection Impact Assessment for processing likely to result in high risk to individuals. EU AI Act high-risk systems processing personal data almost automatically trigger the DPIA requirement under Article 35(1) — the scale and nature of the decision-making involved typically constitutes "high risk" under the GDPR criteria. If you are deploying a high-risk AI system under the AI Act, assume a GDPR DPIA is also required.

Legal basis for training data: GDPR Article 6 requires a lawful basis for all personal data processing. Using customer CSV data or employee CSV data to train an AI model is processing for a new purpose. If your original legal basis was contract performance (processing customer data to fulfill an order), using that data to train a hiring model is likely a new purpose requiring a new legal basis — typically either consent or legitimate interests after a balancing test.

Data subject rights in AI systems: GDPR Article 22 restricts purely automated decision-making that produces legal or similarly significant effects. High-risk AI systems under the AI Act that make or substantially influence such decisions must provide a mechanism for human review, as required by both Article 22 GDPR and Article 14 of the AI Act. This right must be technically implemented in the system's data pipeline.

Privacy by Design (GDPR Article 25): Building a high-risk AI system without integrating Privacy by Design from the data pipeline stage is inconsistent with both Article 25 GDPR and the spirit of the EU AI Act. Data minimization, purpose limitation, and technical measures to protect personal data must be designed into the training data pipeline, not retrofitted after deployment.
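The human review mechanism described above needs a place in the data model. The sketch below shows one possible shape: a decision record that keeps the automated outcome and any human override side by side. The field names, the "reject" label, and the policy of routing every adverse outcome to a reviewer are assumptions for illustration, not requirements taken from either regulation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionRecord:
    subject_ref: str                      # pseudonymous reference, not a name
    model_score: float
    automated_outcome: str                # e.g. "advance" or "reject"
    human_outcome: Optional[str] = None   # set only when a reviewer decides
    reviewer: Optional[str] = None
    reviewed_at: Optional[datetime] = None

    @property
    def final_outcome(self) -> str:
        """A human decision, when present, takes precedence over the model."""
        return self.human_outcome if self.human_outcome is not None else self.automated_outcome

    def apply_review(self, reviewer: str, outcome: str) -> None:
        """Record the override without erasing the original automated outcome."""
        self.human_outcome = outcome
        self.reviewer = reviewer
        self.reviewed_at = datetime.now(timezone.utc)


def needs_human_review(record: DecisionRecord) -> bool:
    """Assumed policy: adverse automated outcomes are never final without a reviewer."""
    return record.automated_outcome == "reject" and record.human_outcome is None


record = DecisionRecord("cand-0042", 0.31, "reject")
print(needs_human_review(record))  # True
record.apply_review("reviewer-7", "advance")
print(record.final_outcome)        # advance
```

Keeping both outcomes in the same record also gives the audit trail a direct view of how often reviewers disagree with the model.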

Data Minimization: AI Act Recital 69

EU AI Act Recital 69 explicitly references GDPR data minimization as a principle applicable to AI training data:

"The right to privacy and to protection of personal data must be guaranteed throughout the entire lifecycle of the AI system. In this regard, the principles of data minimisation and data protection by design and by default, as set out in Union data protection law, are applicable when personal data is processed."

This means: the GDPR Article 5(1)(c) data minimization principle applies to AI training data. You should not use more personal data than is strictly necessary for the intended training purpose. A model that only needs behavioral patterns does not require training data that includes names, addresses, and demographic details. Strip those columns before training.

For CSV training data workflows, this has a direct operational implication: before feeding any customer or employee CSV into a training pipeline, apply data minimization. Remove all columns not required for the specific model task. Pseudonymize direct identifiers where possible. This is both a GDPR obligation and an AI Act obligation — and processing that data locally before ingestion keeps it under your control rather than exposing it to additional processors.
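As a minimal sketch of that workflow, the example below keeps only an allowlist of columns and replaces the direct identifier with a keyed hash (HMAC-SHA256), using only the Python standard library. The column names and the key are hypothetical. Note that pseudonymized data is still personal data under GDPR as long as the key exists, so the key needs its own access controls.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical column names for an ATS export; adjust to your own schema.
KEEP_COLUMNS = ["experience_years", "assessment_score"]
ID_COLUMN = "candidate_email"


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash so the same identifier always maps to the same token."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def minimize_csv(src, dst, key: bytes) -> None:
    """Copy only allowlisted columns and replace the identifier with a token."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["subject_ref"] + KEEP_COLUMNS)
    writer.writeheader()
    for row in reader:
        out = {col: row[col] for col in KEEP_COLUMNS}
        out["subject_ref"] = pseudonymize(row[ID_COLUMN], key)
        writer.writerow(out)


RAW = (
    "name,candidate_email,home_address,experience_years,assessment_score\n"
    "Ana Ruiz,ana@example.com,1 Main St,6,82\n"
    "Ben Okoro,ben@example.com,2 High St,3,74\n"
)
SECRET_KEY = b"demo-key-load-from-a-secrets-manager"  # placeholder; never hard-code a real key

minimized = io.StringIO()
minimize_csv(io.StringIO(RAW), minimized, SECRET_KEY)
print(minimized.getvalue())
```

An allowlist is used rather than a blocklist so that a new column added to the export upstream is dropped by default instead of leaking into the training set.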

Practical Data Pipeline Obligations for High-Risk AI

If your AI system falls under Annex III, these are the data pipeline actions that need to be in place before August 2, 2026 for new deployments.

Step 1: Classify your AI systems. Review every AI system (including internal tools, third-party models you deploy, and fine-tuned models) against Annex III. Document the classification and the reasoning. A system that falls outside Annex III carries no high-risk obligations under the AI Act unless it is a product, or a safety component of a product, covered by the Annex I legislation.

Step 2: Conduct a GDPR DPIA for personal data processing. If the system processes personal data, a DPIA is almost certainly required. The DPIA must assess the risks of the AI processing, the mitigations in place, and the residual risk to data subjects.

Step 3: Document your training data governance. Article 10 requires documented data governance practices. Produce a data sheet for each training dataset: source, collection method, labeling process, bias analysis, representativeness assessment, and any GDPR legal basis relied upon.

Step 4: Apply data minimization before training. Strip columns not required for the training task. Pseudonymize direct identifiers. Process the data locally where possible to minimize processor exposure. Document the minimization steps as part of your Article 10 data governance documentation.

Step 5: Implement Article 13/14 transparency mechanisms. High-risk AI systems must provide transparency to affected individuals about automated processing affecting them, including information about the logic, the data used, and the right to human review. This requires technical implementation in the data pipeline.

Step 6: Register the system where required. Certain categories of high-risk AI systems must be registered in the EU database established under the AI Act before deployment. Check whether your system category requires registration under Article 49.
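The data sheet from Step 3 can be kept as a structured record so that gaps are caught before an audit rather than during one. The sketch below mirrors the fields listed in Step 3 and adds the minimization steps from Step 4. The class name, field names, and example values are illustrative and not an official template.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetSheet:
    dataset_name: str
    source: str
    collection_method: str
    labeling_process: str
    bias_analysis: str
    representativeness_assessment: str
    gdpr_legal_basis: str
    minimization_steps: list = field(default_factory=list)

    def missing_fields(self):
        """Names of fields that are still empty and need an owner."""
        return [name for name, value in asdict(self).items() if value in ("", [], None)]


# Hypothetical entry for the candidate screening example.
sheet = DatasetSheet(
    dataset_name="ats_export_2025",
    source="ATS CSV export",
    collection_method="Submitted by applicants through the careers portal",
    labeling_process="Outcome labels taken from recruiter decisions",
    bias_analysis="",  # not yet completed
    representativeness_assessment="Age bands compared against applicant population",
    gdpr_legal_basis="Legitimate interests (balancing test on file)",
    minimization_steps=["dropped name and address columns", "pseudonymized email"],
)

print(sheet.missing_fields())            # ['bias_analysis']
print(json.dumps(asdict(sheet), indent=2))
```

Storing the JSON output next to the dataset version makes it straightforward to hand the same record to whoever assembles the technical documentation.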

AI Act Obligations Beyond Article 10: What Data Teams Need to Know

This post focuses on data governance (Article 10) because that is the data team's primary domain. But high-risk AI systems carry a broader set of obligations. Data teams should understand where their work intersects with obligations owned primarily by legal, engineering, or product.

Obligation	Primary Article	Who Owns It	Data Team Involvement
Risk management system	Art. 9	Engineering + Legal	Medium — risk analysis informs data collection and labeling decisions
Training data governance	Art. 10	Data Team	High — primary data team obligation
Technical documentation	Art. 11	Engineering + Legal	Medium — data sheets and lineage documentation are data team outputs
Record-keeping / audit logs	Art. 12	Engineering	Low — infrastructure; data team provides data lineage records
Transparency to affected persons	Art. 13	Product + Legal	Medium — data team documents what data was used and how
Human oversight mechanism	Art. 14	Engineering + Product	Low-Medium — data pipeline must support override and correction capability
Accuracy, robustness, cybersecurity	Art. 15	Engineering	Low — performance metrics; data team ensures training data quality
Conformity assessment / CE marking	Art. 43	Legal + External auditor	Low — data governance documentation is an input to the assessment
EU database registration	Art. 49	Legal	Low — data team provides system description inputs
GDPR DPIA (personal data)	GDPR Art. 35	DPO + Data Team	High — data processing assessment is a data team responsibility

The data team's core AI Act scope: Article 10 (training data governance) is the obligation most directly owned by data teams, with data lineage records feeding Article 12 and training data quality feeding Article 15. Articles 9, 11, 13, 14, and 43 also require data team inputs but are primarily owned by other functions. If your organization has a compliance or legal function, share this table to clarify ownership before August 2, 2026.

Conformity assessment note: For most Annex III systems, the conformity assessment (Article 43) is an internal self-assessment rather than a third-party audit. The exception is biometric systems under Annex III, Point 1, which can require assessment by a notified body. Data governance documentation produced under Article 10 is a required input to the conformity assessment regardless of which pathway applies.

For training data that is prepared from CSV exports, SplitForge's Data Masking tool applies data minimization and pseudonymization locally — the training data is de-identified before it ever leaves your environment, supporting both your Article 10 data governance requirements and your GDPR Article 25 Privacy by Design obligations.

Additional Resources

Official EU AI Act Text:

Regulation (EU) 2024/1689 — EU Artificial Intelligence Act — Full official regulation text including Annex III high-risk categories and Article 10 training data requirements
EU AI Act Annex III — High-risk AI systems list — Enumeration of high-risk AI system categories

GDPR Interaction:

GDPR Article 35 — Data Protection Impact Assessment — When a DPIA is required; applies to high-risk AI systems processing personal data
GDPR Article 25 — Privacy by Design and by Default — Technical measure obligations that apply throughout the AI system lifecycle

Supervisory Authority Guidance:

EDPB Guidelines on Automated Decision-making and Profiling — Article 22 requirements for automated decisions; directly relevant to high-risk AI

FAQ

Which AI systems are high-risk under EU AI Act Annex III?

Annex III identifies eight categories. For data teams, the most relevant are: employment and workers management AI (recruitment, selection, performance monitoring), access to essential services AI (creditworthiness assessment, insurance pricing), education AI (student assessment, access determination), and critical infrastructure management AI. A system is high-risk if it falls in these categories and makes or substantially influences consequential decisions affecting individuals.

Does the EU AI Act apply to AI systems built outside the EU?

Yes, if the system is deployed in the EU or if its outputs are used in the EU. The territorial scope (Article 2) mirrors GDPR's extraterritorial reach. A US-based company deploying a candidate screening AI to evaluate EU candidates, or a credit scoring model that affects EU residents' access to financial services, is within scope regardless of where the AI was developed.

What is the grandfathering period for existing AI systems?

Under Article 111(2), high-risk AI systems already placed on the market or put into service before August 2, 2026 come under the Act only if their design changes significantly after that date. Systems intended for use by public authorities must comply by August 2, 2030 regardless. This does not mean compliance work can wait: any significant design change brings an existing system into scope, and designing training data governance, DPIA processes, and transparency mechanisms takes significant time to implement properly.

Does using GDPR-compliant training data satisfy EU AI Act Article 10?

GDPR compliance is necessary but not sufficient for Article 10. Article 10 adds requirements beyond GDPR: the data must be sufficiently representative for the population the system will affect, documented for bias analysis, and appropriate for the statistical properties required by the intended use. A dataset that satisfies GDPR legal basis requirements may still fail Article 10 if it systematically underrepresents certain demographic groups or contains labeling errors that introduce bias.

Can I use personal data from existing customer CSV exports to train a new AI model?

Using existing customer data for a new AI training purpose is a new processing purpose under GDPR Article 5(1)(b) purpose limitation. The legal basis you used to collect the original data (contract performance, consent, legitimate interests) may not extend to AI training. You need either explicit consent from data subjects for the AI training purpose, or a legitimate interests assessment that demonstrates the training purpose is compatible with the original collection purpose. Additionally, EU AI Act Article 10 requires that the training data be representative and appropriate for the model's intended use, rather than merely legally available.

How does the EU AI Act define "substantially influencing" a decision?

The AI Act does not define "substantially influencing" in those words. Article 3(1) defines an AI system as one that generates outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Article 6(3) then treats an Annex III system as high-risk unless it does not materially influence the outcome of decision-making, and a system that profiles natural persons is always high-risk. A system that ranks candidates and presents a filtered list for human review substantially influences hiring decisions, because candidates below the filter receive no consideration. A system that generates risk scores used in underwriting substantially influences credit decisions even if a human signs off.

Legal disclaimer: The content in this post is for informational purposes only and does not constitute legal advice. EU AI Act compliance requirements are subject to further guidance from supervisory authorities. Consult qualified legal counsel to assess your specific AI systems against Annex III and applicable national implementation measures.

Prepare Your CSV Training Data for AI Act Compliance

Apply data minimization to CSV exports before AI training — locally, without server transmission

Pseudonymize direct identifiers in training datasets supporting Article 10 data governance

Process training data in your browser — reduce processor exposure under GDPR Article 28

Document de-identification steps as part of your Article 10 data sheet
