Real-Time Card Fraud Detection at a GCC Tier-1 Bank — 43% Fraud Loss Reduction at Half the False-Positive Rate
Fraud Guard replaces a legacy rules engine with a hybrid graph-and-LLM fraud model running at sub-50ms p99 on the authorisation path.
The challenge
The bank — a Tier-1 commercial bank in the GCC with approximately eight million issued cards across debit, credit and prepaid portfolios — was running a card-fraud detection engine that had not been materially upgraded since 2017. The engine was rule-based, with roughly 1,400 hand-maintained rules covering merchant categories, geographic patterns, velocity thresholds and behavioural heuristics. Fraud losses were trending upward, primarily driven by card-not-present fraud on the bank's growing e-commerce volume and a steady increase in account-takeover patterns following credential leaks at third-party merchants.
More damaging than the absolute fraud loss was the false-positive rate. The legacy engine was declining roughly 1.8% of all authorisation attempts as suspected fraud, of which more than 95% turned out to be legitimate. Customer-experience metrics for declined transactions were the worst in the bank's portfolio, and the relationship-banking team estimated that a meaningful fraction of the bank's mass-affluent attrition was traceable to repeated false declines.
The bank had three constraints that ruled out the obvious off-the-shelf solutions. First, regulatory: SAMA-equivalent rules required all card-authorisation processing to happen inside the country's borders. Second, latency: the existing authorisation-path budget for fraud scoring was 80ms p99, and a new engine could not exceed that without breaking merchant SLAs. Third, explainability: every fraud decline had to be explainable to the customer-service team within 30 seconds for a real-time dispute conversation.
The approach
MindMap deployed Fraud Guard (Fg) as the new primary fraud scoring engine, with Anomaly Detector (Ad) as the long-tail behavioural layer and Sentiment Analyzer (Sa) reused for the customer-service dispute path. The brief was structured as a parallel-run replacement — Fraud Guard would shadow the legacy engine for ten weeks before any traffic switched.
The first phase was data archaeology. Our team ingested three years of historical authorisation data, three years of confirmed fraud cases (the labels), and the full configuration history of the legacy rules engine. The objective was twofold: train a fraud model on the bank's actual fraud patterns rather than a generic benchmark, and capture the institutional knowledge embedded in the 1,400 hand-maintained rules so that the new engine could be measurably better, not just different.
The model itself is a hybrid. A gradient-boosted tree ensemble (XGBoost) handles the bulk of the scoring on classical features — amount, merchant category, geographic distance, time-of-day, velocity windows, customer behavioural baseline. A graph neural network layer adds device-to-card-to-merchant relationship signals — the patterns that distinguish a legitimate first-time international purchase from a card-testing attack. An LLM-based reasoning layer (a fine-tuned Qwen 2.5 7B model) runs only for the borderline cases where the tree ensemble's confidence is between 30% and 70%, and provides the structured explanation that the customer-service team uses.
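The borderline routing described above can be sketched as follows. This is a minimal illustration, not the production logic: the names `route` and `Routing` are hypothetical, and treating the 30% and 70% boundaries as inclusive is an assumption.

```python
from dataclasses import dataclass

# Band boundaries taken from the text: the LLM layer runs only when the
# tree ensemble's confidence is between 30% and 70%.
# Treating both ends as inclusive is an assumption.
BORDERLINE_LOW = 0.30
BORDERLINE_HIGH = 0.70


@dataclass(frozen=True)
class Routing:
    score: float             # fraud confidence from the tree ensemble, 0.0 to 1.0
    needs_explanation: bool  # True when the LLM reasoning layer should run


def route(score: float) -> Routing:
    """Flag borderline scores for the LLM reasoning layer."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    borderline = BORDERLINE_LOW <= score <= BORDERLINE_HIGH
    return Routing(score=score, needs_explanation=borderline)
```

Only the middle band pays the cost of the LLM; confident scores at either end are handled by the tree ensemble and graph features alone.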
The model is trained nightly on the rolling window of authorisation outcomes — the previous day's confirmed-fraud cases and confirmed-non-fraud cases — with a champion-challenger framework that promotes a new model version only when its lift over the production model is statistically significant on a held-out validation set.
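One way to express the promotion gate is a paired significance test on the held-out set. The source does not say which test is used, so the sketch below swaps in an exact one-sided McNemar test as an assumption; the function names and the 0.05 significance level are likewise illustrative.

```python
from math import comb


def mcnemar_one_sided_p(challenger_only: int, champion_only: int) -> float:
    """Exact one-sided McNemar p-value on discordant validation cases.

    challenger_only: fraud cases caught by the challenger but missed by the champion.
    champion_only:   fraud cases caught by the champion but missed by the challenger.
    Under the null hypothesis of no difference, each discordant case is
    equally likely to favour either model.
    """
    n = challenger_only + champion_only
    if n == 0:
        return 1.0  # no discordant cases, so no evidence either way
    # P(X >= challenger_only) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(challenger_only, n + 1))
    return tail / 2 ** n


def should_promote(challenger_only: int, champion_only: int, alpha: float = 0.05) -> bool:
    """Promote only when the challenger's advantage is statistically significant."""
    return mcnemar_one_sided_p(challenger_only, champion_only) < alpha
```

For example, `should_promote(5, 0)` is true (p = 1/32), while `should_promote(3, 3)` is false because the discordant cases split evenly.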
The pre-built building blocks
Rather than commission a ground-up build, the engagement leaned on MindMap's pre-built accelerator library — production-tested components that compress what would otherwise be a six-to-nine-month build into weeks.
Fraud Guard: primary card-fraud scoring engine with XGBoost + graph features
Anomaly Detector: long-tail behavioural anomaly layer
Sovereign LLM Platform: on-prem Qwen serving for borderline-case explanations
Compliance Monitor: regulator reporting and audit trail
The architecture
The Fraud Guard stack runs entirely on the bank's primary on-premises infrastructure in its main data centre, with a hot-standby cluster at the bank's secondary site. No fraud scoring traffic leaves the country's borders.
The inline scoring path is the latency-critical component. The XGBoost model and the graph features are served from an in-memory feature store (Redis Cluster, sized at 480GB across the cluster) with sub-millisecond feature lookups. The XGBoost inference itself runs at a p99 of 6ms on the bank's CPU fleet — no GPU is used on the inline path because the latency overhead of GPU dispatch exceeded its benefit at this model size.
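A p99 figure like the ones quoted above can be checked against the latency budget with a nearest-rank percentile. The sketch below is illustrative only: the sample latencies are made up, and `BUDGET_MS` simply restates the 80ms authorisation-path budget from the text.

```python
BUDGET_MS = 80.0  # authorisation-path budget for fraud scoring, from the text


def p99(latencies_ms: list) -> float:
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # ceil(0.99 * n) computed with integers to avoid float rounding surprises
    rank = (len(ordered) * 99 + 99) // 100
    return ordered[rank - 1]


# Made-up samples: 985 fast requests and 15 slower ones, in milliseconds.
samples = [5.0] * 985 + [7.0] * 15
observed = p99(samples)
within_budget = observed <= BUDGET_MS
```

With these samples the 990th ordered value falls among the slower requests, so `observed` is 7.0 and the budget check passes.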
The LLM reasoning layer runs off the inline path as an asynchronous enrichment. When the inline scorer returns a borderline confidence, the transaction is approved or declined based on the deterministic threshold, and the LLM is invoked separately to generate the explanation that is attached to the authorisation record. By the time the customer calls customer service to dispute a decline (typically several minutes later), the explanation is already in the case-management system. The LLM is served on a small cluster of L40S GPUs using vLLM, with the Qwen 2.5 7B model quantised to 8-bit.
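The split between the inline decision and the off-path explanation can be sketched with `asyncio`. Everything named here is hypothetical: `generate_explanation` is a stand-in for the model-server call, the `explanations` dictionary stands in for the case-management system, and the 0.50 decline threshold is an assumed value.

```python
import asyncio

DECLINE_THRESHOLD = 0.50  # assumed deterministic cut-off, not from the source
explanations = {}         # stand-in for the case-management system


async def generate_explanation(txn_id: str, score: float) -> str:
    """Placeholder for the LLM call; a real system would query the model server."""
    await asyncio.sleep(0)  # yields control, simulating off-path work
    return f"Transaction {txn_id} scored {score:.2f}"


async def enrich(txn_id: str, score: float) -> None:
    explanations[txn_id] = await generate_explanation(txn_id, score)


def decide(score: float) -> str:
    return "decline" if score >= DECLINE_THRESHOLD else "approve"


async def authorise(txn_id: str, score: float, background: set) -> str:
    decision = decide(score)  # inline path: no LLM involved
    if 0.30 <= score <= 0.70:  # borderline band from the text
        task = asyncio.create_task(enrich(txn_id, score))
        background.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(background.discard)
    return decision


async def demo():
    background = set()
    decision = await authorise("txn-1", 0.62, background)
    ready_at_decision = "txn-1" in explanations  # explanation not yet written
    await asyncio.gather(*background)            # let the enrichment finish
    return decision, ready_at_decision, explanations["txn-1"]
```

Running `asyncio.run(demo())` shows the decision being returned before the explanation exists, which is the property that keeps the LLM out of the latency budget.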
The graph component is the architectural novelty. The bank's eight million cards, the merchant base, the device fingerprints and the historical authorisation graph are held in a Neo4j cluster with a custom-built incremental updater. Graph features (neighbourhood risk score, shortest path to a known fraud cluster, community-detection cluster ID) are precomputed on a 5-minute cadence and pushed to the feature store.
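As an illustration of one of these features, the hop count to the nearest known-fraud node can be precomputed with a multi-source breadth-first search. This is a plain-Python sketch over a toy in-memory graph, not the Neo4j implementation; the node names and the `hops_to_fraud` helper are made up.

```python
from collections import deque
from typing import Dict, Iterable, Set


def hops_to_fraud(graph: Dict[str, Set[str]], fraud_nodes: Iterable[str]) -> Dict[str, int]:
    """Multi-source BFS: hops from each reachable node to its nearest known-fraud node."""
    distance = {node: 0 for node in fraud_nodes if node in graph}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Toy undirected graph: cards, devices and merchants as nodes (all names are made up).
edges = [
    ("card:A", "device:1"),
    ("device:1", "card:B"),
    ("card:B", "merchant:X"),
    ("card:C", "merchant:Y"),
]
graph: Dict[str, Set[str]] = {}
for left, right in edges:
    graph.setdefault(left, set()).add(right)
    graph.setdefault(right, set()).add(left)

features = hops_to_fraud(graph, fraud_nodes={"card:B"})
```

Here `card:A` sits two hops from the flagged card through a shared device, while `card:C` has no path to it and is absent from the result. A single pass from all fraud nodes at once suits a batch precompute, since every node's distance is produced in one traversal.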
The full authorisation trace, the model inputs, the model outputs and the LLM explanation are persisted to an audit store with the regulator's required retention period.
The numbers behind the story
Over the ten-week shadow period, Fraud Guard caught 24% more confirmed fraud than the legacy rules engine while declining 51% fewer legitimate transactions. The bank's CRO approved live cutover at the end of the shadow period.
Six months post-cutover, gross fraud loss across the card portfolio is down 43% on a like-for-like basis (controlled for portfolio growth and seasonal pattern). The false-positive rate has dropped from 1.8% of authorisations to 0.86%, with a corresponding improvement in customer-experience metrics on the declined-transaction journey.
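The "half the false positives" framing follows from the two rates quoted above. A short calculation makes the relative reduction explicit; the helper name is illustrative.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


# Rates from the text, as percentages of all authorisations.
fp_reduction = relative_reduction(1.8, 0.86)
summary = f"{fp_reduction:.1%}"
```

The result is a reduction of roughly 52%, in line with the 51% fewer declined legitimate transactions measured during the shadow period.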
The graph component has caught patterns the legacy engine systematically missed — coordinated card-testing attacks where each individual transaction looks innocuous but the device-card-merchant graph reveals a clear attack cluster. The bank's fraud-operations team has used the graph visualisations to refer four organised-fraud cases to law enforcement.
The explainability layer has paid for itself on customer service. The average call-handle time for a card-decline dispute call has dropped from 8 minutes to 3 minutes, because the customer-service agent has a structured explanation of the decline reason at the start of the call rather than having to dig through transaction history to understand it.
An unexpected outcome: the rules engine was not retired. The bank's fraud team kept approximately 80 of the original 1,400 rules — the ones encoding specific regulatory requirements or contractual obligations that needed to be deterministic — and Fraud Guard now executes those rules first and then applies the ML scoring only when the rules pass.
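The rules-first ordering can be sketched as a short pipeline. The rule shown, the country code "XX", the 0.5 threshold and the function names are all placeholders for illustration, not the bank's actual rules.

```python
from typing import Callable, Dict, List, Optional

Transaction = Dict[str, object]
Rule = Callable[[Transaction], Optional[str]]  # returns a decline reason, or None to pass


def blocked_country(txn: Transaction) -> Optional[str]:
    # Hypothetical deterministic rule; "XX" is a placeholder country code.
    return "blocked country" if txn.get("country") == "XX" else None


def evaluate(
    txn: Transaction,
    rules: List[Rule],
    ml_score: Callable[[Transaction], float],
    threshold: float = 0.5,  # assumed cut-off, not from the source
) -> str:
    """Run deterministic rules first; score with the model only if every rule passes."""
    for rule in rules:
        reason = rule(txn)
        if reason is not None:
            return f"decline (rule: {reason})"
    return "decline (model)" if ml_score(txn) >= threshold else "approve"
```

Because the function returns as soon as a rule fires, the model is never consulted for transactions that a regulatory or contractual rule already decides, which keeps those outcomes fully deterministic.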
“We had been told for two years that the only credible path to materially better card-fraud detection was a hyperscaler-hosted ML platform. MindMap proved that wrong: on-premises, sub-50ms, 43% lower fraud loss and half the false positives. Our fraud team owns the model now, and they would not trade it back.” — Chief Risk Officer · GCC Tier-1 Bank
Why MindMap was chosen
The bank evaluated four vendors: two global fraud-platform providers, one regional fraud-analytics specialist, and MindMap. The two global vendors required at least part of the model serving to run in their own cloud regions, which was a regulatory non-starter. The regional specialist had the data-residency story but no LLM reasoning capability and no graph engine.
MindMap's pre-built Fraud Guard accelerator, combined with the willingness to deploy fully on-premises and the embedded data-science team that could train on the bank's actual data, was the decisive combination. The parallel-shadow-run model gave the bank's CRO the evidence base they needed to approve cutover.
Our team included a former fraud-operations lead from a comparable Gulf bank, who could speak directly to the bank's fraud team about operational realities the data scientists would not have understood. This was a recurring theme in the bank's post-engagement debrief: MindMap was not selling a model; it was building a system the fraud team would actually run.