From RPA to Agentic AI: The Operating Model Shift Nobody Talks About

RPA solved repeatable rule-based tasks. Agentic AI solves judgment-based tasks. The shift isn't technical — it's an operating model change that most organisations aren't ready for.

Saurabh Goenka

Founder & CEO, MindMap Digital

Every large enterprise we work with has an RPA programme. Most of them have plateaued. The pattern is consistent: an initial wave of bot deployments delivered real value on rule-based, high-volume processes, the centre of excellence built capability and governance, and then somewhere between bot 80 and bot 200, the pipeline stopped converting. The bots that were easy to build had been built. The remaining backlog was full of processes that required judgment, context, and exception handling — exactly the work RPA cannot do. The CoE leaders we talk to are increasingly facing uncomfortable conversations with their CFOs about why the second hundred bots are costing twice as much per unit of saving as the first hundred. Agentic AI is the architectural shift that unblocks the next wave, but the operating model implications are bigger than the technology change, and most of the failures we're seeing in early agentic deployments come from organisations that treated it as 'better RPA' rather than as a different category of system that needs different governance.

What RPA was actually good for

RPA's sweet spot was always narrow: deterministic, rule-based, repetitive work that crossed application boundaries where APIs didn't exist. Reconciliation between two systems with no integration. Form-filling from one screen into another. Report extraction and re-formatting. The technology — UiPath, Automation Anywhere, Blue Prism — was essentially a sophisticated screen-scraper plus a workflow engine. It worked because the work was specified rigidly enough that 'if pixel here, then click there' was reliable. Step outside that envelope, and the bots break the moment a UI changes, an exception appears, or a decision requires reading a free-text field.

Why most RPA programmes plateau at bot 200

The unautomated tail of work in a large enterprise looks different from the head. The head is high-volume, low-variance: process 10,000 invoices a day with the same fields in the same places. The tail is medium-volume, high-variance: process 200 exception cases a day where every case has a different reason, the supporting documents are unstructured, and the resolution requires policy interpretation. RPA can't reach the tail because the tail requires judgment. Most CoEs respond by building more elaborate decision trees inside their bots until the bots become unmaintainable. The right response is to switch to a different architecture.

What agentic AI actually changes

An agent is a system that uses an LLM to make decisions about what to do next, calls tools to take action, observes the results, and continues until a goal is met. The architectural shift from RPA is that the bot stops being a sequence of pre-defined steps and becomes a planner that picks steps from a library based on context. A claims-processing agent doesn't follow a flowchart — it reads the claim, decides whether it needs more documentation, calls the policy database, checks the customer's history, fetches medical records if applicable, evaluates against coverage criteria, and either approves, denies with reasoning, or escalates to a human with a structured summary that compresses what would have been three hours of casework into a one-paragraph briefing. Each step is selected at runtime, not authored in advance. This is fundamentally different from the RPA model where every branch had to be designed by a developer, and exception cases broke the bot until a developer rewrote it. The agentic model degrades gracefully — when it encounters something it hasn't seen, it can escalate, ask, or fall back to a safe default. RPA encountering the unknown simply fails.
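The decide, act, observe loop described above can be sketched in a few lines. This is a minimal illustration, not a production agent: `choose_next_step` is a rule-based stand-in for the LLM planner, and `Claim`, `lookup_policy`, `check_history` and the values they record are hypothetical names and data introduced only for this example.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Claim:
    claim_id: str
    amount: float
    has_documents: bool
    facts: dict = field(default_factory=dict)  # observations gathered so far


# Hypothetical tools: each one fetches something and records it on the claim.
def lookup_policy(claim: Claim) -> None:
    claim.facts["coverage_limit"] = 5000.0  # illustrative value, not real policy data


def check_history(claim: Claim) -> None:
    claim.facts["prior_claims"] = 1  # illustrative value


TOOLS = {"lookup_policy": lookup_policy, "check_history": check_history}
TERMINAL = {"approve", "deny", "escalate"}


def choose_next_step(claim: Claim) -> str:
    """Stand-in for the LLM planner; a real agent would prompt a model here."""
    if not claim.has_documents:
        return "escalate"
    if "coverage_limit" not in claim.facts:
        return "lookup_policy"
    if "prior_claims" not in claim.facts:
        return "check_history"
    return "approve" if claim.amount <= claim.facts["coverage_limit"] else "deny"


def run_agent(claim: Claim, max_steps: int = 10) -> tuple[str, list[str]]:
    """Pick a step at runtime, run it, observe, repeat until a terminal decision."""
    trace: list[str] = []
    for _ in range(max_steps):
        step = choose_next_step(claim)
        trace.append(step)
        if step in TERMINAL:
            return step, trace
        tool = TOOLS.get(step)
        if tool is None:  # planner asked for an unknown tool: safe default
            trace.append("escalate")
            return "escalate", trace
        tool(claim)
    trace.append("escalate")  # step budget exhausted: hand over to a human
    return "escalate", trace
```

Calling `run_agent(Claim("c1", 1200.0, True))` walks through both lookups before deciding, while a claim with missing documents escalates on the first step. The step budget and the unknown-tool fallback are one way to express the "escalate or fall back to a safe default" behaviour; a real deployment would also log the trace for audit.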

The orchestrator-plus-specialist pattern

The reference architecture for enterprise agentic work is a two-tier graph. A top-level orchestrator agent owns the workflow goal and decomposes it into sub-tasks. Specialist agents — each scoped to a narrow domain like 'KYC document analysis', 'policy lookup', 'sanctions screening', 'customer communication' — execute the sub-tasks. The specialist agents have small, well-tested tool surfaces; the orchestrator has the conversational and planning capability. This separation is what makes the system testable and observable, which is what makes it deployable in a regulated environment. We build these graphs in LangGraph for most deployments, occasionally CrewAI for simpler use cases, and AutoGen when the workflow is genuinely conversational between agents.
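The two-tier shape can be illustrated without any framework. The sketch below is plain Python rather than LangGraph, CrewAI or AutoGen code, and every name in it (`PLANS`, `SPECIALISTS`, `orchestrate`, the watchlist entry) is a hypothetical placeholder. It assumes the orchestrator's decomposition can be represented as an ordered list of specialist names.

```python
from __future__ import annotations
from typing import Callable


# Specialists: narrow scope, small surface, each testable on its own.
def kyc_document_analysis(case: dict) -> dict:
    return {"kyc_ok": bool(case.get("documents"))}


def sanctions_screening(case: dict) -> dict:
    watchlist = {"Blocked Person"}  # placeholder list for the example
    return {"sanctions_hit": case.get("name") in watchlist}


SPECIALISTS: dict[str, Callable[[dict], dict]] = {
    "kyc_document_analysis": kyc_document_analysis,
    "sanctions_screening": sanctions_screening,
}

# Stand-in for the orchestrator's planning step: goal -> ordered sub-tasks.
PLANS: dict[str, list[str]] = {
    "onboard_customer": ["kyc_document_analysis", "sanctions_screening"],
}


def orchestrate(goal: str, case: dict) -> dict:
    """Decompose the goal, dispatch to specialists, and keep an auditable log."""
    steps = PLANS.get(goal, [])
    findings: dict = {}
    log: list[str] = []
    for name in steps:
        findings.update(SPECIALISTS[name](case))
        log.append(name)
    ok = (
        bool(steps)
        and findings.get("kyc_ok") is True
        and findings.get("sanctions_hit") is False
    )
    return {"decision": "proceed" if ok else "escalate", "findings": findings, "log": log}
```

Because each specialist is an ordinary function with a dict in and a dict out, it can be unit-tested in isolation, and the `log` gives the observable record of which specialists ran and in what order.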

The tool layer, MCP, and the accelerator library as agent library

Agents are only as capable as their tools. The Model Context Protocol that Anthropic shipped in late 2024 has changed how we build tool inventories — instead of writing custom tool wrappers for every integration, we publish MCP servers for our most-used enterprise systems (SAP, Oracle EBS, ServiceNow, Salesforce, JD Edwards, Workday) and any agent in any framework can discover and use them. This is a meaningful operational win: tool definitions are versioned, documented, and tested independently of the agents that use them, and an upgrade to the SAP connector benefits every agent that uses it. The 117 accelerators we've built over five years started life as code components — pre-built workflows, integrations, and business logic packaged for reuse — but as we've migrated them to an agentic architecture, they've taken on a new shape: each accelerator becomes a specialist agent with a well-scoped tool surface, a documented evaluation suite, and an MCP-addressable interface. The 'KYC Document Adverse Screening' accelerator is no longer a Python module; it's an agent that any orchestrator in any framework can invoke. Composition becomes near-zero-effort: building a new workflow is increasingly about selecting and wiring existing specialist agents rather than authoring new ones. The library is starting to behave like a standard library for a programming language, except the language is enterprise workflow and the standard library is industry-specific.
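The operational point, that tools are versioned and upgraded independently of the agents that call them, can be shown with a small in-process registry. This is not the MCP SDK or the MCP wire protocol; `ToolSpec`, `ToolRegistry`, the tool name `erp.get_invoice` and both handlers are hypothetical, chosen only to illustrate discovery by name and a shared upgrade path.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    version: str
    description: str
    handler: Callable[..., Any]


class ToolRegistry:
    """Illustrative stand-in for a shared tool server that agents discover at runtime."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def publish(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec  # publishing again replaces the older version

    def list_tools(self) -> list[tuple[str, str]]:
        return sorted((s.name, s.version) for s in self._tools.values())

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**kwargs)


# Two hypothetical versions of the same connector.
def get_invoice_v1(invoice_id: str) -> dict:
    return {"invoice_id": invoice_id, "status": "open"}


def get_invoice_v2(invoice_id: str) -> dict:
    return {"invoice_id": invoice_id, "status": "open", "currency": "USD"}


registry = ToolRegistry()
registry.publish(ToolSpec("erp.get_invoice", "1.0.0", "Fetch an invoice", get_invoice_v1))


# Two independent "agents" that only know the tool by name.
def reconciliation_agent(reg: ToolRegistry) -> dict:
    return reg.call("erp.get_invoice", invoice_id="INV-1")


def dispute_agent(reg: ToolRegistry) -> dict:
    return reg.call("erp.get_invoice", invoice_id="INV-2")
```

Publishing a `2.0.0` spec under the same name changes what both agents receive without touching either agent's code, which is the property the paragraph above attributes to a shared, independently tested tool layer.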

The operating model shift nobody talks about

The technology change is the easy half. The harder half is the operating model. RPA had a clear governance pattern: a CoE owned bot development, business teams owned process identification, IT owned the infrastructure. Bots were deterministic and could be tested against fixed scenarios; ROI was measured in FTE-hours saved against a baseline that the bot exactly replicated. Agentic AI breaks the three assumptions that pattern rested on: determinism, fixed-scenario testing, and FTE-based ROI. Agents are non-deterministic: the same input can produce different action sequences, each of them correct and valid. Testing requires statistical methods: evaluate against 500 cases and measure pass rates with confidence intervals, rather than checking pass/fail on a single case. ROI shifts from FTE-hours saved to decision quality, cycle-time reduction, and downstream business outcomes that take longer to attribute. The governance model has to change: who approves an agent's policy library when it's loaded in via RAG and changes weekly, who signs off when the agent escalates, who reviews the eval suite when the model is updated, who has the authority to take an agent offline when its behaviour drifts. Most CoEs are not staffed for any of this: the skill profile of an RPA developer is not the skill profile of an agent designer, and the skill profile of a CoE governance lead is not the skill profile of an AI ethics and quality assurance lead. The organisational rebuild required is substantial.
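A pass rate with a confidence interval can be computed with nothing more than the standard library. The sketch below uses the Wilson score interval, which is one common choice for a binomial proportion and an assumption on our part rather than a prescribed method. The function names and the release threshold are hypothetical; the 500-case sample size follows the figure above.

```python
import math


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95% coverage."""
    if n <= 0 or not 0 <= passes <= n:
        raise ValueError("need n > 0 and 0 <= passes <= n")
    p = passes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def release_gate(passes: int, n: int, min_pass_rate: float) -> bool:
    """Gate on the lower bound, so a lucky run on a small sample cannot pass."""
    lower, _ = wilson_interval(passes, n)
    return lower >= min_pass_rate
```

For example, `wilson_interval(460, 500)` brackets the observed 92% pass rate with a lower and an upper bound, and `release_gate(460, 500, 0.85)` checks the lower bound against a hypothetical 85% threshold. Gating on the lower bound instead of the point estimate is a design choice: it makes the same eval suite stricter when it is run on fewer cases.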

Change management and ROI: both harder than RPA's models suggested

RPA could be sold to the business as 'we're taking the boring work off your plate'. Agentic AI gets read as 'we're taking the judgment work off your plate', and the people doing that work resist, and correctly so, because their judgment is what made the process work at all. The deployments that succeed reposition the agent as a tool the human operates, not a replacement. A claims adjudicator becomes a reviewer of agent-prepared decisions, handling exceptions and edge cases; their throughput goes up 3-5x and their job satisfaction usually improves because the boring 80% goes away. The deployments that fail try to skip the human entirely on day one, hit a high-profile error, and lose organisational permission to continue.

The ROI measurement is also harder. RPA's model was simple: bot replaces N hours of human work per month, multiply by loaded labour cost, subtract bot licensing and maintenance, declare savings. Agentic AI breaks that maths because the agent rarely replaces a human entirely; it typically processes 70-80% of cases end-to-end and routes the remaining 20-30% to a human with the analytical work pre-done. The right metrics are cycle-time reduction per case, decision-quality scores against a baseline, exception-routing accuracy, and human throughput on the residual cases. Customers who try to evaluate agentic deployments using the old RPA scorecard either understate the value or overstate it. We've started shipping agentic-specific ROI models with every deployment because the legacy CoE measurement frameworks don't fit.
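The difference between the two scorecards is easiest to see side by side. The sketch below is illustrative only: the field names, function names and all figures in the usage note are hypothetical, and it covers three of the four metrics named above (human throughput on residual cases would need timing data from the review queue, which is not modelled here).

```python
from __future__ import annotations
from dataclasses import dataclass


def rpa_monthly_saving(hours_replaced: float, loaded_hourly_cost: float,
                       licence_cost: float, maintenance_cost: float) -> float:
    """The RPA model: hours replaced times labour cost, minus running costs."""
    return hours_replaced * loaded_hourly_cost - licence_cost - maintenance_cost


@dataclass
class CaseOutcome:
    straight_through: bool   # agent closed the case without a human
    decision_correct: bool   # final decision matched the ground-truth label
    routing_correct: bool    # agent was right to close, or right to escalate
    cycle_minutes: float     # elapsed time from intake to decision


def agentic_scorecard(outcomes: list[CaseOutcome], baseline_cycle_minutes: float) -> dict:
    """Per-case metrics instead of a single FTE-hours figure."""
    if not outcomes:
        raise ValueError("need at least one case outcome")
    n = len(outcomes)
    mean_cycle = sum(o.cycle_minutes for o in outcomes) / n
    return {
        "straight_through_rate": sum(o.straight_through for o in outcomes) / n,
        "decision_quality": sum(o.decision_correct for o in outcomes) / n,
        "routing_accuracy": sum(o.routing_correct for o in outcomes) / n,
        "cycle_time_reduction": 1 - mean_cycle / baseline_cycle_minutes,
    }
```

With made-up inputs, `rpa_monthly_saving(160, 30.0, 1000.0, 500.0)` yields a single currency figure, whereas `agentic_scorecard` returns a set of rates that have to be read together: a high straight-through rate means little if decision quality or routing accuracy falls alongside it.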

Where to start, in 2026

Pick a workflow where RPA stalled — a process where you tried to automate but couldn't push past 60% straight-through processing. Run a six-week agentic pilot with a single orchestrator and 2-3 specialist agents, evaluated against a 200-case ground truth set built with the operational team that owns the work. Don't try to replace humans; route the bottom-confidence 30% of decisions to them and measure their throughput. If the system delivers 2x productivity with no degradation in decision quality, you have permission to scale. If it doesn't, you've learned something specific about the workflow rather than something vague about the technology. The trap to avoid is the 'transformation programme' framing — large, multi-quarter, multi-workflow agentic programmes that try to deliver everything at once almost always stall at the governance layer. Small, specific, measurable workflows that ship to production in eight weeks build organisational capability faster than any strategy deck.
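Routing the bottom-confidence share of decisions to people is a small piece of logic worth making explicit in a pilot. The sketch below assumes each agent decision carries a numeric confidence score, which is an assumption about the agent's output, not something every framework provides. The function name, the case IDs and the scores are hypothetical.

```python
from __future__ import annotations


def route_by_confidence(decisions: list[tuple[str, float]],
                        human_percent: int = 30) -> tuple[list[str], list[str]]:
    """Send the lowest-confidence share of cases to human review.

    decisions: (case_id, confidence) pairs. Returns (to_human, automated).
    """
    if not 0 <= human_percent <= 100:
        raise ValueError("human_percent must be between 0 and 100")
    ranked = sorted(decisions, key=lambda d: d[1])  # least confident first
    # Integer ceiling division, so the human share is never rounded down.
    cutoff = (len(ranked) * human_percent + 99) // 100
    to_human = [case_id for case_id, _ in ranked[:cutoff]]
    automated = [case_id for case_id, _ in ranked[cutoff:]]
    return to_human, automated


# Hypothetical pilot batch of ten decisions.
batch = [
    ("c0", 0.95), ("c1", 0.40), ("c2", 0.88), ("c3", 0.55), ("c4", 0.99),
    ("c5", 0.72), ("c6", 0.61), ("c7", 0.93), ("c8", 0.35), ("c9", 0.80),
]
to_human, automated = route_by_confidence(batch)
```

A percentile cut like this keeps the human workload predictable during the pilot; an absolute confidence threshold is the alternative, and it trades predictable workload for a fixed risk tolerance.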

About the author

Saurabh Goenka

Founder & CEO, MindMap Digital

Saurabh has spent the last five years shipping sovereign AI for regulated enterprises. He's personally led engagements with tier-1 banks across the Gulf, East Africa and South Asia, with healthcare systems in the UK and India, and with central-government agencies on three continents. He speaks regularly at industry forums on the engineering reality of EU AI Act compliance and sovereign LLM deployment.

Credentials + recognition

✓ NASSCOM Tech Excellence 2026: Healthcare AI category winner
✓ ET NOW 40 Under 40 (2026)
✓ Outlook Dynamic Leaders (2025)
✓ ICAI 40 Under 40 (2021) · Chartered Accountant
✓ Forbes Business Council member (2021–present)
✓ 50+ enterprise AI deployments shipped

Areas of repeated lived expertise

Sovereign AI architecture · EU AI Act + RBI + SAMA compliance engineering · BFSI AI transformation · Healthcare AI at scale · Public-sector AI deployment
