Medical Records Processing at a Cross-Border Hospital Group — 4× Throughput, 99.2% Coding Accuracy
Medical Records Parser ingesting 14,000 patient documents per day across nine hospitals, lifting clinical coding accuracy from 87% to 99.2%.
The challenge
The hospital group, a private multi-specialty operator with nine hospitals spanning two European markets and one Gulf market and combined annual patient volumes north of 1.3 million, was operating on a fragmented medical-records estate. Each hospital had its own EHR (a mix of two vendor platforms and one in-house build), each used slightly different document templates, and most clinical inputs still arrived on paper or as PDF scans: referrals from external physicians, lab results from third-party diagnostic labs, imaging reports, discharge summaries from other facilities, and patient-supplied historical records.
The group's central revenue-cycle team at its EU headquarters was responsible for translating this documentation into the structured ICD-10 and CPT codes that drove billing and outcome reporting. Average clinical-coding accuracy was 87%, meaning thirteen of every hundred encounters were under-coded (revenue leakage), over-coded (compliance risk), or mis-coded (both). The coding backlog averaged four working days, and the revenue-cycle team was running 30% over its headcount budget.
The group's COO had a clear brief: get coding accuracy above 98%, eliminate the backlog, and do it without hiring more coders. The group had already evaluated three coding-automation vendors over the previous two years and concluded that none of them could handle the multi-EHR, multi-language, multi-format reality of their actual document estate.
The approach
MindMap's approach was structured around three accelerators: Medical Records Parser (Mp) for document ingestion and clinical field extraction, Coding Assistant (Cd) for the actual ICD-10 and CPT coding work, and Revenue Cycle Optimizer (Rc) for denial reduction and AR-days improvement.
We started with a six-week document discovery sprint. Our team sat with the revenue-cycle leads in each of the nine hospitals and built a catalogue of every distinct document type the coding team had to handle. The final catalogue ran to 211 distinct document types — from standard discharge summaries to handwritten referral notes, from pathology lab reports in three languages to insurance pre-authorisation letters with hospital-specific letterheads.
Medical Records Parser was deployed to ingest each document type and extract a standardised set of clinical fields — admission diagnosis, principal diagnosis, secondary diagnoses, procedures performed, medications administered, comorbidities, complications, length of stay and discharge disposition. The parser handles Arabic, English and German in the same pipeline, with a unified clinical ontology layer that maps language-specific clinical phrasing to SNOMED CT concepts.
Coding Assistant takes the structured clinical record and generates the recommended ICD-10 and CPT code set, with confidence scores and the specific documentation excerpts that justify each code. High-confidence cases (above 96% on every code) are auto-submitted to the billing system. Mid-confidence cases (between 80% and 96%) are reviewed by a coder in a one-click confirmation UI. Low-confidence cases are routed for full coder workup with the model's draft pre-populated.
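The three-way routing described above can be sketched in a few lines of Python. This is a minimal illustration, not the production logic: the names `CodeSuggestion`, `Route` and `route_encounter` are hypothetical, and two points the text leaves open are filled in as assumptions. First, an encounter is routed on its lowest-confidence code. Second, a score of exactly 96% or exactly 80% falls into the one-click review band.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_SUBMIT = "auto_submit"            # sent straight to billing
    ONE_CLICK_REVIEW = "one_click_review"  # coder confirms in the review UI
    FULL_WORKUP = "full_workup"            # coder works the case from the draft


@dataclass(frozen=True)
class CodeSuggestion:
    code: str          # e.g. "I10" (illustrative)
    system: str        # "ICD-10" or "CPT"
    confidence: float  # model confidence in [0, 1]
    evidence: str      # documentation excerpt that justifies the code


# Thresholds taken from the text; boundary handling is an assumption.
HIGH_THRESHOLD = 0.96
MID_THRESHOLD = 0.80


def route_encounter(suggestions: list[CodeSuggestion]) -> Route:
    """Route one encounter based on its weakest code suggestion."""
    if not suggestions:
        # Nothing to submit or confirm, so a coder has to work the case.
        return Route.FULL_WORKUP
    lowest = min(s.confidence for s in suggestions)
    if lowest > HIGH_THRESHOLD:
        # "Above 96% on every code" means the minimum clears the bar.
        return Route.AUTO_SUBMIT
    if lowest >= MID_THRESHOLD:
        return Route.ONE_CLICK_REVIEW
    return Route.FULL_WORKUP
```

Routing on the minimum rather than the average keeps a single weak code from being hidden by several strong ones, which matches the "on every code" wording for the auto-submit tier.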
We also deployed Revenue Cycle Optimizer to handle the downstream side of the process: identifying claims at risk of denial before submission, generating supporting documentation packages, and managing the appeals workflow for denied claims. Roughly 38% of the group's prior denial volume could be attributed to coding gaps that Coding Assistant now catches up-front.
The pre-built building blocks
Rather than commission a ground-up build, the engagement leaned on MindMap's pre-built accelerator library — production-tested components that compress what would otherwise be a six-to-nine-month build into weeks.
Medical Records Parser: multi-language clinical document ingestion and field extraction
Coding Assistant: ICD-10 / CPT coding with confidence-routed review
Revenue Cycle Optimizer: denial prediction, appeal automation, AR-days reduction
DocuMage: OCR + ICR for the long-tail document types
The architecture
The platform is deployed as a hybrid: ingestion and OCR processing run inside each hospital's local environment to satisfy each country's specific data-protection requirements, while the central reasoning and ML model layer runs in the group's regional cloud tenant in Frankfurt. PHI is pseudonymised at the local site before any cross-border transit; identifiers are stripped, replaced with cryptographic tokens, and only re-identified at the local site when results are written back. This pattern was approved by all three of the group's data-protection officers.
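One way to picture the pseudonymise-then-re-identify pattern is the sketch below. It is an assumption-laden illustration rather than the deployed design: the text says only that identifiers are replaced with cryptographic tokens, so the choice of HMAC-SHA256, the `tok_` prefix, the in-memory vault and the class name `LocalPseudonymiser` are all hypothetical. A real deployment would keep the key in a managed secret store and the token map in a durable local database.

```python
import hashlib
import hmac


class LocalPseudonymiser:
    """Runs at the hospital site; neither the key nor the vault leaves it."""

    def __init__(self, site_key: bytes):
        self._key = site_key
        self._vault: dict[str, str] = {}  # token -> original identifier

    def tokenise(self, identifier: str) -> str:
        # Keyed hash: deterministic per site, not reversible without the vault.
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256)
        token = "tok_" + digest.hexdigest()[:32]
        self._vault[token] = identifier
        return token

    def pseudonymise(self, record: dict, id_fields: list[str]) -> dict:
        """Return a copy that is safe to send to the central layer."""
        out = dict(record)
        for field in id_fields:
            if out.get(field) is not None:
                out[field] = self.tokenise(str(out[field]))
        return out

    def reidentify(self, record: dict, id_fields: list[str]) -> dict:
        """Restore identifiers when results are written back locally."""
        out = dict(record)
        for field in id_fields:
            value = out.get(field)
            if isinstance(value, str) and value in self._vault:
                out[field] = self._vault[value]
        return out
```

Because the token is deterministic for a given site key, the central layer can still link several documents belonging to the same patient without ever seeing the underlying identifier.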
The OCR layer combines Tesseract 5 (open source) for the bulk volume with Azure Document Intelligence for tables and forms and a custom-trained handwriting model for the referral notes and physician annotations. The handwriting model was fine-tuned on roughly 65,000 hand-labelled examples drawn from the group's own historical records during the discovery sprint.
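The division of labour between the three OCR engines amounts to a per-page dispatch decision, sketched below. The page attributes (`is_handwritten`, `has_tables_or_forms`) and the function `choose_engine` are hypothetical, and the sketch deliberately stops at choosing an engine rather than calling any vendor SDK. Giving handwriting precedence when a page is both handwritten and tabular is an assumption, since the text does not say how such pages are handled.

```python
from dataclasses import dataclass
from enum import Enum


class OcrEngine(Enum):
    TESSERACT = "tesseract"                          # bulk printed text
    AZURE_DOCUMENT_INTELLIGENCE = "azure_doc_intel"  # tables and forms
    HANDWRITING_MODEL = "handwriting_model"          # referral notes, annotations


@dataclass(frozen=True)
class PageProfile:
    # Hypothetical flags, assumed to come from an upstream layout classifier.
    is_handwritten: bool = False
    has_tables_or_forms: bool = False


def choose_engine(page: PageProfile) -> OcrEngine:
    """Pick the OCR engine for one page."""
    if page.is_handwritten:
        # Assumption: handwriting wins when a page is also tabular.
        return OcrEngine.HANDWRITING_MODEL
    if page.has_tables_or_forms:
        return OcrEngine.AZURE_DOCUMENT_INTELLIGENCE
    return OcrEngine.TESSERACT
```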
The clinical NLP layer uses a medical LLM: Llama 3 70B, fine-tuned in the style of Med-PaLM on a corpus of de-identified clinical notes from the group's own archive plus the publicly available MIMIC-III dataset. The model produces SNOMED CT concept extractions, ICD-10 candidate codes with reasoning, and CPT candidate codes with documentation citations. A second, smaller model validates the candidates against the coding guidelines for the relevant country, since the group operates under a different coding rule set in each of its three markets.
Integration with each EHR happens through an HL7 FHIR-based adapter layer that we built per hospital, mapping each EHR's specific data model into the canonical patient-record schema the parser expects. The same adapter layer pushes the final coded record back into the EHR and the billing system. Where an EHR did not support FHIR cleanly (the in-house EHR at two of the hospitals), we built a direct database-level adapter using that EHR's supported ODBC interface.
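As an illustration of what one adapter mapping might look like, the sketch below converts a FHIR Condition resource (as a parsed JSON dictionary) into a flat record. The canonical schema is not published in this case study, so the output field names (`patient_ref`, `encounter_ref`, `snomed_concepts`, `description`) and the function `condition_to_canonical` are hypothetical. The input paths (`code.coding`, `subject.reference`, `encounter.reference`) and the SNOMED CT system URI follow the FHIR specification.

```python
SNOMED_SYSTEM = "http://snomed.info/sct"  # FHIR system URI for SNOMED CT


def condition_to_canonical(resource: dict) -> dict:
    """Map one FHIR Condition resource to a hypothetical canonical record."""
    if resource.get("resourceType") != "Condition":
        raise ValueError("expected a FHIR Condition resource")

    code = resource.get("code", {})
    # A Condition may carry several codings; keep only the SNOMED CT ones.
    snomed_concepts = [
        coding["code"]
        for coding in code.get("coding", [])
        if coding.get("system") == SNOMED_SYSTEM and "code" in coding
    ]
    return {
        "patient_ref": resource.get("subject", {}).get("reference"),
        "encounter_ref": resource.get("encounter", {}).get("reference"),
        "snomed_concepts": snomed_concepts,
        "description": code.get("text"),
    }
```

Missing optional elements come through as `None` or an empty list rather than raising, so that one sparse resource does not abort a whole batch; whether the real adapters behave this way is not stated in the source.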
The numbers behind the story
Clinical coding accuracy across the group has risen from 87% to 99.2% on a rolling thirty-day measurement. Documents processed per day have risen from approximately 3,500 to 14,200, roughly a 4× throughput improvement. The coding backlog has been eliminated: the team now processes more than 96% of encounters on the same day.
Revenue-cycle metrics have improved materially. AR days (accounts-receivable days, also known as days sales outstanding) have fallen by an average of 9.4 days across the group. The denial rate on initial claim submission has dropped from 14% to 4.6%. The group's overall revenue-cycle yield (the percentage of charged revenue that is ultimately collected) has improved by 23%.
The coding team itself has not been reduced. The COO redeployed roughly 40% of coder capacity into clinical-documentation improvement (CDI) work — partnering with physicians to improve the quality of the underlying clinical documentation, which has further improved coding accuracy in a virtuous loop. CDI coverage has gone from 22% of inpatient encounters to 81%.
The group's chief medical informatics officer has used the Medical Records Parser as the foundation for a new clinical-research data lake — the structured clinical fields the parser extracts now feed a research-grade longitudinal patient record that the group's research arm is using to support fifteen active clinical studies.
“We had two failed vendor evaluations behind us when we engaged MindMap. The difference, from week one, was that they actually parsed our real documents, including the Arabic pathology reports and the handwritten referrals from external physicians that the previous vendors had simply refused to take in scope. Six months in, our coders are doing clinical documentation improvement instead of data entry, and our revenue-cycle yield is up materially.”
Chief Operating Officer, European-Gulf Hospital Group
Why MindMap was chosen
The group had previously evaluated three coding-automation vendors. Two were US-centric and could not handle the multi-language reality of the document estate. The third proposed a six-month pilot that did not include any of the group's actual document types in scope.
MindMap won the bid because we proposed — and delivered — a four-week proof of concept on the group's actual document estate, including the Arabic-language pathology reports and the handwritten referral notes the other vendors had refused to engage with. We came to the POC with Medical Records Parser already running in two comparable Gulf-region hospital deployments, and with a clinical informaticist on the delivery team who could converse with the group's CMO about coding guideline differences across the three markets.
The hybrid deployment pattern — local OCR with central reasoning — was uniquely suited to the group's data-protection constraints, and was a pattern we had already deployed in a multi-country financial services context. The other vendors were proposing single-region cloud deployments that would have required exception approvals from each of the group's data-protection officers.
Related deployments
Prior Auth Acceleration
Prior Auth Accelerator + DocGenie automated 70% of payer prior auth submissions end-to-end, reducing turnaround from 3 days to 4 hours.
Clinical-Trial Document Workflow
DocuMage + Clinical Trial Matcher accelerated the study-start document cycle 5×, letting trial sites enrol patients weeks sooner than the prior baseline.
Indian Hospital Medical Records
Medical Records Parser ingested 8M legacy paper-and-PDF records into a structured, searchable, 14-language longitudinal patient record.