Research · July 2026 · 18 min read

EU AI Act Readiness Benchmark — 50 Enterprises

Anonymised readiness benchmark across 50 enterprises with EU exposure (banks, insurers, hospitals, manufacturers, public-sector bodies), measured against the evidence requirements of 11 Articles: Articles 9–15 plus four cross-cutting Articles. Median readiness is 38%; only 14% would survive a supervisory audit today. Where the gaps cluster, why they're tractable in 90 days, and the five interventions that close the most ground.

Saurabh Goenka

Founder & CEO, MindMap Digital

In April and May 2026, we ran a structured readiness benchmark across 50 enterprises with EU exposure. The cohort: 14 banks (of which 5 are tier-1), 9 insurers, 11 hospitals or hospital groups, 9 manufacturers, and 7 public-sector bodies. The instrument: a 47-question structured assessment mapped to the evidence requirements of 11 Articles (Articles 9–15 plus four cross-cutting Articles), scored on a five-point scale by a MindMap Digital consultant during a 90-minute structured interview with the customer's AI governance lead, and validated against artefact review. The results are sobering. Median readiness across the 50 enterprises is 38%, well below the 70% threshold most supervisors are likely to treat as audit-survivable. Only 14% of the cohort (7 enterprises) would survive a supervisory audit today against Articles 9–15. The gap is real and the timeline is short. This report covers where the gaps cluster, why they're tractable in 90 days from a standing start, and the five interventions that close the most ground.

Method

The 47-question instrument is mapped to 11 substantive Articles in the high-risk AI obligations: Articles 9 through 15, plus four cross-cutting Articles. These are Article 17 (quality management system, which includes the post-market monitoring system that Article 72 details), Article 25 (responsibilities along the AI value chain), Article 26 (deployer obligations) and Article 27 (fundamental-rights impact assessment). Each question is scored on a five-point scale: 0 for no evidence, 1 for evidence that exists but is informal, 2 for evidence that exists and is documented but not current, 3 for current evidence with one gap, and 4 for fully current and complete evidence. A question score of 3 or above is treated as audit-survivable. The aggregate readiness score is the mean of all question scores, normalised to a percentage. The audit-survivable share is the percentage of questions scoring 3 or above. The benchmark was conducted between 4 April and 16 May 2026. Interview leads averaged 11 years of regulated-industry AI experience. Customers participated on a confidential basis; aggregate results are reported here without identifying individual enterprises.
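The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual tooling. It assumes that "normalised to a percentage" means dividing the mean question score by the scale maximum of 4. The function names and the example answers are hypothetical.

```python
from statistics import mean

MAX_SCORE = 4          # top of the five-point scale (0 to 4)
SURVIVABLE_SCORE = 3   # a question scoring 3 or above counts as audit-survivable


def readiness_pct(scores: list[int]) -> float:
    """Mean question score as a percentage of the scale maximum (assumed normalisation)."""
    return 100.0 * mean(scores) / MAX_SCORE


def survivable_share_pct(scores: list[int]) -> float:
    """Percentage of questions scoring at or above the audit-survivable score."""
    passed = sum(1 for s in scores if s >= SURVIVABLE_SCORE)
    return 100.0 * passed / len(scores)


# Hypothetical answers for one enterprise (47 questions); not benchmark data.
example_scores = [0] * 10 + [1] * 12 + [2] * 13 + [3] * 8 + [4] * 4

print(f"readiness: {readiness_pct(example_scores):.1f}%")
print(f"audit-survivable share: {survivable_share_pct(example_scores):.1f}%")
```

The two numbers answer different questions: an enterprise can have a moderate mean score while very few individual questions reach the level a reviewer would accept.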

Aggregate results: summary statistics

Median readiness: 38%. Mean readiness: 41%. Standard deviation: 14 percentage points. Bottom-quartile readiness: 27%. Top-quartile readiness: 53%. Percentage of cohort above 70% (audit-survivable): 14%. Percentage of cohort above 50%: 38%. Percentage of cohort below 30%: 22%. The shape of the distribution matters: there is a long left tail of customers who have not begun the substantive evidence work, a thick middle of customers with partial evidence, and a small right tail who have invested substantively. The right-tail customers (14% of the cohort) without exception began the evidence work no later than Q3 2025 — a clean 9-month head-start over the median. The left-tail customers without exception believed in late 2025 that the deadline would slip or that GPAI vendors would absorb the compliance burden. Neither has happened.
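For readers who want to reproduce this kind of summary on their own portfolio, the sketch below computes the same statistics from a list of per-enterprise readiness percentages. The input data is hypothetical, the quartile method (Python's default "exclusive" method) and the use of the sample standard deviation are assumptions, and the report does not state which conventions were used for the cohort.

```python
from statistics import mean, median, quantiles, stdev


def summarise(readiness: list[float]) -> dict[str, float]:
    """Summary statistics for a list of per-enterprise readiness percentages."""
    n = len(readiness)
    # quantiles(n=4) returns the three quartile cut points; the default
    # "exclusive" method is an assumption, the report does not name one.
    q1, _, q3 = quantiles(readiness, n=4)

    def share(predicate) -> float:
        # Percentage of enterprises for which the predicate holds.
        return 100.0 * sum(1 for r in readiness if predicate(r)) / n

    return {
        "median": median(readiness),
        "mean": mean(readiness),
        "stdev": stdev(readiness),  # sample standard deviation (assumption)
        "bottom_quartile": q1,
        "top_quartile": q3,
        "pct_above_70": share(lambda r: r > 70),
        "pct_above_50": share(lambda r: r > 50),
        "pct_below_30": share(lambda r: r < 30),
    }


# Hypothetical readiness percentages, NOT the benchmark cohort.
demo = [20, 30, 40, 50, 60, 70, 80]
for name, value in summarise(demo).items():
    print(f"{name}: {value:.1f}")
```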

Article-by-Article gap profile

We report the percentage of the cohort scoring 3 or above on each substantive Article.

Article 9 (risk management): 31% audit-survivable. Risk-management systems exist but are typically generic enterprise risk frameworks not specifically applied to AI lifecycle.

Article 10 (data and data governance): 24% audit-survivable. The training-data lineage requirement is the single biggest gap; most customers cannot evidence the provenance of training data for the AI systems they deploy.

Article 11 (technical documentation): 22% audit-survivable. The Annex IV technical-documentation requirement is substantial; most customers have system documentation but not Annex IV documentation specifically.

Article 12 (record-keeping): 36% audit-survivable. Most customers log enough; few log the right things; very few log in a format that survives audit review.

Article 13 (transparency to deployers): 41% audit-survivable. This is the strongest Article for the cohort because the obligations are relatively narrow and many vendors have begun providing the required documentation.

Article 14 (human oversight): 28% audit-survivable. The translation from "human in the loop" to "effective human oversight" as defined in the Article is non-trivial; most customers have nominal oversight that wouldn't survive substantive review.

Article 15 (accuracy, robustness, cybersecurity): 33% audit-survivable. The accuracy benchmarking and robustness testing requirements are tractable for customers with mature ML/AI engineering; difficult for customers who treat AI as a procured product.

Article 17 (post-market monitoring): 19% audit-survivable. The lowest-scoring Article. Most customers have not begun post-market monitoring of AI systems in the structured way the Article requires.

Sector breakdown — where each industry sits

Banking (14 customers): median readiness 44%, audit-survivable percentage 21%. Banks score highest on Article 9 (risk management: extending existing risk frameworks) and lowest on Article 17 (post-market monitoring: AI-specific monitoring is not yet operationalised).

Insurance (9 customers): median readiness 39%, audit-survivable 11%. Insurers lag banks by roughly 5 percentage points across all Articles.

Hospitals (11 customers): median readiness 31%, audit-survivable 9%. Healthcare lags substantially on Article 10 (data governance: patient-data lineage is complex) and Article 11 (technical documentation: clinical AI procurement has not traditionally included Annex IV-style documentation).

Manufacturing (9 customers): median readiness 36%, audit-survivable 11%. Manufacturers score reasonably on Article 14 (human oversight: existing safety-critical-systems engineering practices translate) but lag on Article 9 and Article 11.

Public sector (7 customers): median readiness 41%, audit-survivable 14%. Public-sector bodies score well on Article 13 (transparency, given pre-existing FOI obligations) and Article 14 (oversight, given pre-existing democratic-accountability traditions) but lag on Article 11 (technical documentation) and Article 17.

Where the gaps cluster — five patterns

Cross-sector, five gap patterns are responsible for the majority of the readiness shortfall.

Gap 1: training-data provenance. Most AI systems in production were procured as products; the customer has no documented evidence of training-data lineage and is unlikely to obtain it from the vendor within the timeline. The fix is sovereign deployment of customer-trained or customer-fine-tuned models for new high-risk use cases, with full lineage captured. For existing deployments, the fix is a vendor-side evidence-request programme: many vendors will provide what they can, and the gap that remains becomes a documented residual risk.

Gap 2: Annex IV technical documentation. The Annex IV format is non-negotiable; the work to produce it is mostly translation from existing system documentation. We've built a template that takes typical enterprise system documentation and restructures it into the Annex IV format, which saves substantial time compared with writing from scratch.

Gap 3: structured post-market monitoring. Most customers have not built the operational substrate for AI-specific post-market monitoring. The fix is an observability layer (Langfuse or equivalent) with explicit drift detection and a structured incident-review process.

Gap 4: effective human oversight. Most customers have humans in the loop but not the structured oversight Article 14 requires. The fix is an explicit oversight protocol per AI system, with documented competence requirements for the human reviewer and a structured procedure for the reviewer's authority to override the AI output.

Gap 5: governance integration. AI risk-management work sits in a separate function from operational risk, model risk, and IT risk in most customers. The fix is to bring AI risk into the enterprise-risk function with appropriate AI-specific competence.
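To make "explicit drift detection" concrete, here is one common way to quantify drift between a baseline window and a current window of a numeric signal such as a model score. It uses the Population Stability Index (PSI), which is our illustrative choice and not something the benchmark prescribes. It does not use any Langfuse API. The function name, bin count and the 0.2 alert threshold are assumptions to be calibrated per system.

```python
import math


def psi(baseline: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples, binned on the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def fractions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Hypothetical model scores: a baseline window and a shifted current window.
baseline_scores = [i / 100 for i in range(100)]
shifted_scores = [0.5 + i / 200 for i in range(100)]

ALERT_THRESHOLD = 0.2  # rule-of-thumb value, an assumption to calibrate
drift = psi(baseline_scores, shifted_scores)
print(f"PSI = {drift:.2f}, alert = {drift > ALERT_THRESHOLD}")
```

In a post-market monitoring setup, a check like this would run on a schedule against logged outputs, and any alert would feed the structured incident-review process rather than stand on its own.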

Why the gaps are tractable in 90 days

The cohort data suggests that a focused 90-day programme can move a customer from the median (38%) to audit-survivable (70%+) across most Articles. The argument is empirical: among the 14% of the cohort above 70%, the customers who moved the furthest in the shortest time achieved their gains primarily through a focused 90-day programme rather than a multi-quarter transformation. The reason 90 days works is that most of the gap is documentation and operationalisation of existing practices, not net-new capability construction. Article 9 risk management — most customers have a risk function; the work is AI-specific extension. Article 10 data governance — most customers have data governance; the work is AI-lineage extension. Article 11 technical documentation — most customers have system documentation; the work is Annex IV translation. Article 12 record-keeping — most customers log; the work is the right-format-at-the-right-granularity discipline. Article 14 oversight — most customers have human review; the work is the protocol-and-authority explicitness. Article 15 accuracy and robustness — most customers have ML or AI engineering; the work is evaluation-harness discipline. The Articles that require substantial new capability construction (Article 17 post-market monitoring being the dominant one) are exactly the Articles that benefit most from the Langfuse-or-equivalent observability substrate, which is now a 2-week deployment, not a multi-quarter programme.

The five interventions that close the most ground

Across the cohort, five interventions consistently moved customers' readiness scores the furthest in the shortest time.

Intervention 1: an inventory and Annex III classification of every AI system in production, with the high-risk subset elevated to Articles 9–15 evidence collection. Cost: 4–6 weeks of focused effort. Average readiness lift: 8–14 percentage points.

Intervention 2: an Annex IV technical-documentation template-and-fill exercise. Cost: 2–3 weeks per AI system. Average readiness lift on Article 11: 35–50 percentage points (from low base).

Intervention 3: deployment of Langfuse self-hosted (or equivalent) as the observability substrate for AI systems, with structured post-market monitoring built on top. Cost: 2 weeks platform deployment plus ongoing operational discipline. Average readiness lift on Articles 12 and 17: 25–40 percentage points each.

Intervention 4: an oversight-protocol exercise per high-risk AI system, with documented competence requirements and override authority. Cost: 1 week per system. Average readiness lift on Article 14: 30–45 percentage points.

Intervention 5: governance integration, bringing AI-risk into the enterprise-risk function with named AI-risk capability. Cost: organisational change, typically 6–8 weeks to design and 3–6 months to operationalise. Average readiness lift on Article 9: 15–25 percentage points.
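The cost figures quoted for Interventions 1 to 4 can be turned into a rough effort range for a given number of high-risk systems. The sketch below is a back-of-the-envelope aid under stated assumptions: it sums effort-weeks and says nothing about elapsed time, since several workstreams can run in parallel. It excludes Intervention 5 because its cost is organisational change rather than a per-system figure. The names `Effort`, `INTERVENTIONS` and `effort_range` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effort:
    low_weeks: float
    high_weeks: float
    per_system: bool  # True if the cost repeats for each high-risk AI system


# Cost ranges as quoted for Interventions 1 to 4.
INTERVENTIONS = {
    "inventory_and_annex_iii_classification": Effort(4, 6, per_system=False),
    "annex_iv_documentation": Effort(2, 3, per_system=True),
    "observability_platform_deployment": Effort(2, 2, per_system=False),
    "oversight_protocol": Effort(1, 1, per_system=True),
}


def effort_range(high_risk_systems: int) -> tuple[float, float]:
    """Total (low, high) effort-weeks; a simple sum, not an elapsed-time estimate."""
    low = high = 0.0
    for effort in INTERVENTIONS.values():
        multiplier = high_risk_systems if effort.per_system else 1
        low += effort.low_weeks * multiplier
        high += effort.high_weeks * multiplier
    return low, high


low, high = effort_range(high_risk_systems=5)
print(f"5 high-risk systems: {low:.0f} to {high:.0f} effort-weeks")
```

The per-system terms dominate quickly, which is why the inventory and classification step matters: it fixes the size of the high-risk subset that the per-system work then applies to.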

What separates the right tail from the rest

The 14% of the cohort above 70% readiness shared four operational characteristics. First, they began substantive evidence work on Articles 9–15 no later than Q3 2025, giving them at least 9 months to operationalise before our April 2026 measurement. Second, they had a named AI governance lead with explicit accountability and authority, not a steering-committee model. Third, they were in most cases already sovereign-deployed for their high-risk AI systems (12 of the 14 percentage points above threshold, which corresponds to 6 of the 7 enterprises; sovereign deployment substantially eases the lineage and documentation requirements). Fourth, they had built the observability substrate (Langfuse or equivalent) into the platform layer, so post-market monitoring was automatic rather than manual. The cohort below 50% readiness without exception lacked at least two of these four characteristics. The implication for customers currently at median or below: the four characteristics above are achievable in a focused 90-day programme, and under our forward-modelling assumptions customers who haven't started have approximately 60 days left in which to start.

Implications for the 2 August 2026 deadline

Three implications follow from the cohort data. First: the median enterprise is not going to be ready, and supervisors will face a choice between an unenforceable deadline and a wave of enforcement actions that they may not have the operational capacity to manage. Realistically, we expect supervisory leniency for customers with a credible 12-month remediation plan in place at the deadline — this is consistent with the pattern of how MiFID II, GDPR, and DORA enforcement actually unfolded. Second: the supervisors that will move first will be the ones who have publicly stated they will (ICO, BaFin, ACPR, the Banca d'Italia). Customers with material EU exposure who do business with these regulators should prioritise readiness for those supervisors first. Third: the supervisors' tolerance for delay will not extend equally to all customers. Tier-1 banks, large hospital systems, and major manufacturers will be expected to be substantively further along than smaller customers; the relative position within sector matters as much as the absolute readiness score. For tier-1 customers currently at or below median, the next 60 days are decisive — the gap to audit-survivable is tractable but only with a focused programme starting immediately.

Methodology notes and limitations

Three limitations of the benchmark are worth flagging. First: the cohort is non-random. The 50 enterprises were customers, prospects, or partners of MindMap Digital who agreed to participate in the assessment. The cohort likely over-represents customers who are already engaged on AI governance work and under-represents customers who are not. The true median across all EU-exposed enterprises is plausibly lower than the 38% we measured. Second: the scoring relies on consultant judgement during a structured interview, validated against artefact review where possible. We sampled artefact validation across 28% of question-responses; the inter-rater agreement between the consultant score and the validation score was 81%. Where artefact validation was not possible (commonly because the artefact did not exist), the question was scored based on interview response. Third: the benchmark is a point-in-time measurement as of April–May 2026. Customer readiness will move in both directions: some customers will improve substantially before August, and some will slip as competing priorities consume governance capacity. The 38% median is the best estimate as of the measurement date, not a forward-looking forecast.
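The report does not say which agreement statistic sits behind the 81% figure. As an illustration only, the sketch below shows two common options for paired scores on a 0 to 4 scale: simple percent agreement, and Cohen's kappa, which corrects for agreement expected by chance. The paired scores are hypothetical and the function names are ours.

```python
from collections import Counter


def percent_agreement(a: list[int], b: list[int]) -> float:
    """Percentage of paired scores that match exactly."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal score frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical paired scores: consultant interview score vs artefact-validation score.
consultant = [3, 3, 2, 4, 1, 0, 3, 2, 2, 4]
validation = [3, 2, 2, 4, 1, 0, 3, 2, 3, 4]

print(f"percent agreement: {percent_agreement(consultant, validation):.0f}%")
print(f"Cohen's kappa: {cohens_kappa(consultant, validation):.3f}")
```

Percent agreement is easier to communicate, while kappa is the more conservative figure when a few score values dominate the responses.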

About the author

Saurabh has spent the last five years shipping sovereign AI for regulated enterprises. He's personally led engagements with tier-1 banks across the Gulf, East Africa and South Asia, with healthcare systems in the UK and India, and with central-government agencies on three continents. He speaks regularly at industry forums on the engineering reality of EU AI Act compliance and sovereign LLM deployment.

Credentials + recognition

✓ NASSCOM Tech Excellence 2026, Healthcare AI category winner
✓ ET NOW 40 Under 40 (2026)
✓ Outlook Dynamic Leaders (2025)
✓ ICAI 40 Under 40 (2021) · Chartered Accountant
✓ Forbes Business Council member (2021–present)
✓ 50+ enterprise AI deployments shipped

Areas of repeated lived expertise

Sovereign AI architecture · EU AI Act + RBI + SAMA compliance engineering · BFSI AI transformation · Healthcare AI at scale · Public-sector AI deployment
