Enterprise Document Intelligence at a MENA Corporate Bank — One Platform for KYC, Credit, Trade and Treasury
DocuMage, Data Extractor and Redacto deliver a shared document-intelligence platform that retired five point-OCR products and freed a 200-person back-office team for redeployment.
The challenge
The bank — a major MENA corporate and investment bank with regional operations across nine markets — had accumulated five separate document-processing products over a decade, each commissioned to solve a specific business-line problem. KYC documents were processed by a global IDP vendor's product; credit-application documents by a regional vendor's product; trade-finance documents by a specialist trade-tech platform's OCR; treasury-confirmation documents by an in-house OCR build; and back-office documents (signed instructions, customer correspondence, branch-network paperwork) by yet another vendor.
Each product was separately licensed, separately operated, separately governed and separately integrated. The bank was spending materially on platform licences, on the integration glue between the platforms, and on the back-office team — approximately 200 staff — that handled the documents the legacy OCR products could not process cleanly. The bank's CIO had estimated that the total cost of the document-processing estate was several times what an equivalent unified platform should cost.
Migration to a single platform had been considered several times and rejected on risk grounds: the existing point-products were each deeply integrated with their respective business-line workflows, and a wholesale replacement carried unacceptable execution risk. The CIO's brief was to find a migration pattern that absorbed the legacy products' workload progressively without the big-bang risk that had killed previous attempts.
The approach
MindMap deployed an enterprise document-intelligence platform composed of DocuMage as the core IDP engine, Data Extractor (De) as the schema-aware structured-extraction layer, Redacto (Re) as the PII-redaction layer (used both for downstream sharing and for the training of bank-specific extractors), and Multi-Agent Orchestrator (Mo) as the document-routing coordinator. The platform was designed to be the unified target estate for all the bank's document processing, with the five legacy products migrated onto it on a phased timeline.
Phase one was the platform build. The unified platform went live as the new target for any new document-processing workload, with the existing legacy products continuing to operate in their respective business lines. New use cases — and there were always new use cases — were directed to the unified platform rather than to the legacy estate.
Phase two was the migration of the easier legacy products. The treasury-confirmation in-house OCR (the smallest of the five, both in licence cost and in integration surface area) was retired first, with its workload absorbed onto the unified platform over a twelve-week migration. The back-office document product followed, on a similar pattern.
Phase three was the migration of the harder products. KYC, credit and trade-finance each had deeper business-line integration and required more careful migration. Each was migrated on a workload-by-workload basis (e.g. for KYC: passport processing first, then driving-licence processing, then proof-of-address processing, then the long tail) with the legacy product remaining in production for the not-yet-migrated workloads. Cutover happened only when the workload-specific extraction accuracy on the unified platform exceeded the legacy product's accuracy.
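The cutover rule above can be sketched as a simple per-workload gate. This is a minimal illustration rather than the tooling used on the engagement: the `WorkloadAccuracy` type, the function names and every accuracy figure below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class WorkloadAccuracy:
    """Measured field-level accuracy for one workload (hypothetical type)."""
    workload: str
    legacy_accuracy: float   # accuracy of the legacy product, 0.0 to 1.0
    unified_accuracy: float  # accuracy of the unified platform, 0.0 to 1.0


def ready_for_cutover(w: WorkloadAccuracy) -> bool:
    # Cut over only when the unified platform strictly exceeds the legacy product.
    return w.unified_accuracy > w.legacy_accuracy


def migration_plan(workloads: Iterable[WorkloadAccuracy]) -> Dict[str, str]:
    # Workloads that fail the gate stay on the legacy product for now.
    return {
        w.workload: "cutover" if ready_for_cutover(w) else "stay_on_legacy"
        for w in workloads
    }


# Illustrative figures only; none of these come from the engagement.
plan = migration_plan([
    WorkloadAccuracy("kyc/passport", 0.91, 0.97),
    WorkloadAccuracy("kyc/proof_of_address", 0.88, 0.86),
])
```

Because the gate is evaluated per workload, one lagging document type does not block the rest of a business line from migrating.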
The pre-built building blocks
Rather than commission a ground-up build, the engagement leaned on MindMap's pre-built accelerator library — production-tested components that compress what would otherwise be a six-to-nine-month build into weeks.
DocuMage
Core IDP engine with per-document-type fine-tuned extractors
Data Extractor
Schema-aware structured extraction and per-business-system mapping
Redacto
PII redaction for downstream sharing and training-corpus preparation
Multi-Agent Orchestrator
Document routing and cross-business-line workflow coordination
The architecture
The platform runs on the bank's private cloud tenant inside Azure UAE, with country-specific extension nodes in two additional MENA markets where the bank's subsidiaries had local data-residency requirements. The platform is multi-tenant by business line, with hard data-access boundaries between KYC, credit, trade and treasury workloads enforced at the storage and compute layer.
DocuMage runs a layered extraction model. For each document type the bank processes (currently approximately 180 distinct types across the four business lines), a fine-tuned extraction model handles the primary extraction; the fine-tuning is per document type, trained on the bank's historical labelled data for that document type. For novel document types (the long tail) a generalist LLM extractor handles the work with structured-output constraints.
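The layered model amounts to a lookup with a fallback. The sketch below assumes extractors can be treated as plain callables; `LayeredExtractor` and the stub models are hypothetical names for illustration and do not reflect DocuMage's actual interface.

```python
from typing import Callable, Dict, Tuple

Fields = Dict[str, str]
Extractor = Callable[[bytes], Fields]


class LayeredExtractor:
    """Routes known document types to a fine-tuned model, others to a generalist."""

    def __init__(self, fine_tuned: Dict[str, Extractor], generalist: Extractor) -> None:
        self._fine_tuned = fine_tuned
        self._generalist = generalist

    def extract(self, doc_type: str, content: bytes) -> Tuple[str, Fields]:
        extractor = self._fine_tuned.get(doc_type)
        if extractor is not None:
            return "fine_tuned", extractor(content)
        # Long-tail document types fall through to the generalist extractor.
        return "generalist", self._generalist(content)


# Stub extractors standing in for real models (assumptions, not real APIs).
def passport_model(content: bytes) -> Fields:
    return {"document_type": "passport"}


def generalist_model(content: bytes) -> Fields:
    return {"document_type": "unknown"}


engine = LayeredExtractor({"passport": passport_model}, generalist_model)
```

Returning the route alongside the fields makes it easy to track how much volume still depends on the generalist path, which is one plausible signal for deciding which document type to fine-tune next.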
Data Extractor sits on top of DocuMage and converts the document extractions into the business-system-specific schemas. The same document (e.g. a passport) is extracted once by DocuMage and then projected into different business-system schemas (the KYC system's customer-profile schema, the credit system's applicant schema, the trade-finance system's beneficiary schema) by Data Extractor's per-system mappers.
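The extract-once, project-many pattern can be illustrated with declarative field mappings. All schema names, field names and values below are invented for the example; the real schemas of the bank's systems are not described in this case study.

```python
from typing import Dict

# Hypothetical per-system mappers: target field name -> source field name.
MAPPERS: Dict[str, Dict[str, str]] = {
    "kyc_customer_profile": {
        "customer_name": "full_name",
        "id_document_number": "passport_number",
    },
    "credit_applicant": {
        "applicant_name": "full_name",
        "applicant_dob": "date_of_birth",
    },
    "trade_beneficiary": {
        "beneficiary_name": "full_name",
        "beneficiary_country": "nationality",
    },
}


def project(extraction: Dict[str, str], system: str) -> Dict[str, str]:
    """Project one canonical extraction into a business system's schema."""
    mapping = MAPPERS[system]
    return {target: extraction[source] for target, source in mapping.items()}


# A single canonical extraction (made-up values) ...
passport = {
    "full_name": "A. Example",
    "passport_number": "X0000000",
    "date_of_birth": "1980-01-01",
    "nationality": "AE",
}

# ... projected into every downstream schema without re-reading the document.
views = {system: project(passport, system) for system in MAPPERS}
```

Keeping the mappings as data rather than code means a new business system needs a new mapping entry, not a new extraction pass.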
Redacto is used in two roles. First, as a downstream service for any business process that needs to share documents externally (legal disclosure, regulatory inspection, audit response) with PII redacted. Second, as a training-time service: when the platform trains a new bank-specific extractor on historical documents, Redacto removes customer-identifying data from the training corpus so the model training itself does not entangle PII with model parameters.
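As a rough illustration of the training-time role, the sketch below replaces identifying spans with typed placeholders before text enters a training corpus. It uses two deliberately simple regular expressions as a stand-in; how Redacto actually detects PII is not described here, and production redaction would need far broader coverage (names, addresses, account numbers, Arabic-script text).

```python
import re
from typing import Dict, List, Pattern

# Simplified, assumed patterns for illustration only.
PATTERNS: Dict[str, Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each matched span with a typed placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def prepare_corpus(documents: List[str]) -> List[str]:
    # Redact every document before it is handed to model training.
    return [redact(doc) for doc in documents]


corpus = prepare_corpus(["Contact a.user@example.com or +971 4 555 0100"])
```

Typed placeholders preserve the position and kind of each redacted value, so an extractor can still learn where such fields occur without seeing the values themselves.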
Integration with each business-line system uses the platform's connector library — pre-built connectors for the bank's KYC system, credit-origination platform, trade-finance system, treasury system and the bank's central document-management repository. The connectors handle the bidirectional flow of documents and extracted data, with appropriate authentication, audit-logging and rate-limiting.
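One way to picture a connector that combines rate-limiting with audit-logging is a thin wrapper around a send function. The class below is a hypothetical sketch using a sliding-window limit; the real connector library's interface, limits and log format are not documented in this case study, and authentication is omitted.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, List


class RateLimitedConnector:
    """Wraps a send function with a sliding-window limit and an audit trail."""

    def __init__(
        self,
        name: str,
        send: Callable[[Any], None],
        max_calls: int,
        per_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.name = name
        self._send = send
        self._max_calls = max_calls
        self._per_seconds = per_seconds
        self._clock = clock
        self._calls: Deque[float] = deque()
        self.audit_log: List[Dict[str, Any]] = []

    def push(self, payload: Any) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._per_seconds:
            self._calls.popleft()
        allowed = len(self._calls) < self._max_calls
        # Every attempt is logged, whether or not it is allowed through.
        self.audit_log.append({"connector": self.name, "at": now, "allowed": allowed})
        if not allowed:
            return False
        self._calls.append(now)
        self._send(payload)
        return True
```

Injecting the clock keeps the limiter testable without sleeping; a caller that receives `False` can queue the payload and retry later.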
The numbers behind the story
Three years after the platform went live, all five legacy document-processing products have been retired. The unified platform now processes approximately 11 million documents per year across the four business lines.
Field-level extraction accuracy across the document portfolio is 97.4%, against a weighted average of approximately 84% across the previous legacy products. The improvement is most pronounced on the document types where the legacy products had been weakest — multi-language correspondence, Arabic-language financial statements, multi-page trade-finance presentations.
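The case study does not state how the legacy average was weighted. A natural choice, assumed here, is to weight each product's accuracy by its document volume. The figures in the example are invented to show the arithmetic and are not the bank's data.

```python
from typing import Iterable, Tuple


def weighted_accuracy(products: Iterable[Tuple[float, int]]) -> float:
    """Volume-weighted mean of (accuracy, document_count) pairs."""
    pairs = list(products)
    total_docs = sum(count for _, count in pairs)
    if total_docs == 0:
        raise ValueError("at least one product must have a non-zero volume")
    return sum(acc * count for acc, count in pairs) / total_docs


# Invented example: a small accurate product and a large weaker one.
example = weighted_accuracy([(0.90, 100), (0.80, 300)])
```

With these invented inputs the larger product dominates, which is why a volume-weighted figure can sit well below the accuracy of the best individual product.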
Total operating cost of the document-processing estate has dropped by approximately $6.2m annually, with the retired licences of the five legacy products and the reduction in manual back-office document handling together contributing the bulk of the saving. The back-office team has been progressively redeployed into exception-handling, document-governance and data-quality roles rather than cut.
An unexpected outcome: the unified platform has become the foundation for new document-driven AI use cases that the fragmented legacy estate did not support. The bank has built three additional accelerators on top of the unified platform — automated covenant tracking on trade-finance documents, automated regulatory-update parsing on correspondent-bank circulars, and automated SLA tracking on legal-correspondence flows — each delivered in weeks rather than months because the document-intelligence foundation was already in place.
“We had spent a decade accumulating five point-products and the integration glue between them. MindMap consolidated the entire document-processing estate onto a single platform over three years without a single big-bang cutover. The platform now handles eleven million documents a year at accuracy levels we never achieved on the legacy products, and it has become the foundation for a generation of new document-driven AI use cases.”
- Chief Information Officer, MENA Corporate Bank
Why MindMap was chosen
The bank had been quoted multi-year wholesale-replacement programmes by two global IDP vendors and one major consulting firm. All three proposals carried the same big-bang cutover risk that had killed the bank's previous attempts.
MindMap's accelerator-composition approach — building a unified platform that absorbed the legacy workload progressively rather than replacing the legacy products all at once — was the structural differentiator. We could demonstrate the pattern at a comparable regional bank where four legacy products had been retired over an eighteen-month migration.
The willingness to deploy entirely inside the bank's Azure UAE tenant with country-specific data-residency extensions was the regulatory differentiator. Our embedded data-engineering expertise on the multi-business-line schema-mapping problem — which is the work that defeats most document-intelligence consolidations — was the third factor.