Network Anomaly Detection at a Gulf Telecom — 4 Hours Earlier Detection of Customer-Impacting Incidents
Anomaly Detector + AI Ops Command Center + Service Monitor giving the NOC predictive visibility into network degradation before customer reports start arriving.
The challenge
The operator, a Gulf-region mobile operator with a converged 5G/4G/fibre/enterprise-services portfolio, ran a network-operations function whose reactive posture was a structural source of customer-experience pain. Its NOC followed the standard pattern: network-element-level monitoring (each network element emitting telemetry, with thresholds set by the network-engineering team triggering alerts), human-led correlation across alerts to identify root cause, and human-led customer-impact assessment to prioritise the response. By the time the NOC understood that a specific customer segment was being impacted, the impact had typically been live for two to four hours and customer-reported outage volume was already climbing.
The operator's customer-experience research had identified network-incident response as one of the top sources of customer dissatisfaction. The NOC's mean-time-to-resolution for customer-impacting incidents was a substantial multiple of the network-engineering team's target, with the bulk of the resolution time spent on the detection-and-diagnosis phase rather than on the actual remediation. The operator's chief technology officer had set a target of moving the NOC from reactive to predictive — detecting customer-impacting incidents before the customer-reported-outage volume started climbing, and resolving them before the customer-experience impact accumulated.
The constraints were significant. Network-telemetry volume was high, at millions of telemetry events per minute across the operator's network footprint. The local-jurisdiction regulatory framework also applied to the deployment. And the network-operations team had a healthy scepticism of AI-driven NOC platforms, several of which global vendors had over-promised and under-delivered over the past decade.
The approach
MindMap deployed an AI-Ops platform composed of AI Ops Command Center (Ac) as the unified-NOC layer, Anomaly Detector (Ad) as the network-anomaly-detection engine, Service Monitor (Sm) for the customer-experience-monitoring layer, Incident Triage (It) as the alert-clustering-and-prioritisation engine, and Knowledge Base Builder (Kb) as the runbook-automation layer.
Phase one was the customer-impact-aware monitoring build. The traditional NOC approach is element-centric — what is each network element doing. The new approach is customer-impact-centric — what is the customer experience across each customer segment, region and service tier. The Service Monitor layer continuously tracks customer-experience metrics (call-drop rates, data-throughput-by-cell, service-completion rates, mobile-money-transaction-success rates) and flags degradation patterns even where the underlying network elements show within-threshold telemetry.
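To make the customer-impact-centric idea concrete, here is a minimal sketch that flags a customer segment when its call-drop rate rises well above its own historical baseline, regardless of what the serving elements report. The names `SegmentWindow` and `flag_degraded_segments`, the 1.5x ratio and the minimum-traffic cut-off are all hypothetical illustrations, not Service Monitor's actual logic.

```python
from dataclasses import dataclass


@dataclass
class SegmentWindow:
    """Hypothetical aggregate for one customer segment over one time window."""
    segment: str
    calls: int
    dropped_calls: int
    baseline_drop_rate: float         # historical drop rate for this segment and time of day
    elements_within_threshold: bool   # True if every serving element reports in-threshold telemetry


def flag_degraded_segments(windows, ratio=1.5, min_calls=500):
    """Return segments whose call-drop rate exceeds `ratio` x their own baseline.

    The check uses customer-experience data only, so it can fire even when
    every underlying network element is within its alert thresholds.
    """
    flagged = []
    for w in windows:
        if w.calls < min_calls:
            continue  # too little traffic for a stable rate
        drop_rate = w.dropped_calls / w.calls
        if drop_rate > ratio * w.baseline_drop_rate:
            flagged.append((w.segment, round(drop_rate, 4), w.elements_within_threshold))
    return flagged


windows = [
    SegmentWindow("prepaid/north", 12_000, 96, 0.004, True),
    SegmentWindow("enterprise/south", 4_000, 10, 0.003, True),
    SegmentWindow("postpaid/east", 200, 20, 0.004, False),
]
print(flag_degraded_segments(windows))
# prints: [('prepaid/north', 0.008, True)]
```

The flagged segment is degraded even though its elements are all within threshold, which is exactly the case an element-centric NOC misses. Comparing each segment against its own baseline, rather than a network-wide rate, keeps naturally noisier segments from drowning out quieter ones.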
Phase two was the predictive-anomaly-detection build. The Anomaly Detector engine analyses the network telemetry streams for patterns that historically precede customer-impacting incidents. Many such patterns are subtle multi-element correlations that human alert-correlation could not realistically identify in time — a small uptick in error rate on one element combined with a small latency increase on another element combined with a slight shift in traffic pattern on a third element. The model is trained on the operator's historical incident archive, with the labelled-incident outcomes as the supervision signal.
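The text describes a supervised model trained on labelled incidents; the sketch below swaps in a much simpler unsupervised statistic purely to show why several small deviations can matter together when none matters alone. Each signal is scored as a z-score against its own baseline and the squared z-scores are summed, which (assuming independent signals) follows a chi-square distribution. Signal names, baselines and thresholds are hypothetical; correlated signals would need a covariance-aware distance such as Mahalanobis.

```python
# Hypothetical stand-in for multi-element correlation, not the production model.
PER_ELEMENT_Z_THRESHOLD = 3.0  # what a single-element alert rule might use
CHI2_CRITICAL_DF3 = 16.27      # chi-square critical value, 3 degrees of freedom, p = 0.001


def z_score(value, mean, std):
    """Standardised deviation of a value from its baseline."""
    return (value - mean) / std


def evaluate(signals):
    """Return (single-element alerts, combined statistic, combined alert)."""
    zs = {name: z_score(*s) for name, s in signals.items()}
    single_alerts = [n for n, z in zs.items() if abs(z) >= PER_ELEMENT_Z_THRESHOLD]
    combined = sum(z * z for z in zs.values())
    # The critical value above assumes exactly three independent signals.
    return single_alerts, combined, combined > CHI2_CRITICAL_DF3


# (current value, baseline mean, baseline std) for three hypothetical signals
signals = {
    "element_A.error_rate": (0.0125, 0.010, 0.001),  # z ~ 2.5
    "element_B.latency_ms": (45.0, 40.0, 2.0),       # z = 2.5
    "element_C.traffic_share": (0.325, 0.30, 0.01),  # z ~ 2.5
}
print(evaluate(signals))
```

No single element crosses its own 3-sigma rule, yet the combined statistic (about 18.75) exceeds the critical value, mirroring the "small uptick here, small shift there" pattern described above.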
Phase three was the runbook-automation build. For the incident types where the operator's engineering team had established response runbooks, Knowledge Base Builder codified the runbooks into automated-response actions that the platform can execute either automatically (for low-risk routine remediations) or with engineer approval (for higher-impact actions). The automated remediation closes the loop between detection and resolution for the recurring incident patterns.
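The auto-versus-approval split can be sketched as a small dispatcher, assuming each codified runbook carries a risk label. `Risk`, `Runbook`, `dispatch`, the incident-type names and the `approve` callback (a stand-in for the engineer-approval step) are hypothetical; the real platform executes through the operator's network-management framework.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"    # routine remediation, safe to run unattended
    HIGH = "high"  # higher-impact action, needs an engineer's sign-off


@dataclass(frozen=True)
class Runbook:
    incident_type: str
    action: str
    risk: Risk


def dispatch(incident_type, runbooks, approve):
    """Route an incident to automatic execution, approval, or manual handling.

    `approve` receives the runbook and returns True if the action may proceed.
    """
    runbook = runbooks.get(incident_type)
    if runbook is None:
        return "manual: no codified runbook"
    if runbook.risk is Risk.LOW:
        return f"auto-executed: {runbook.action}"
    if approve(runbook):
        return f"executed after approval: {runbook.action}"
    return f"held for engineer: {runbook.action}"


runbooks = {
    "stale_alarm_backlog": Runbook("stale_alarm_backlog", "clear stale alarms", Risk.LOW),
    "core_link_congestion": Runbook("core_link_congestion", "shift traffic to backup link", Risk.HIGH),
}
print(dispatch("stale_alarm_backlog", runbooks, approve=lambda rb: False))  # auto-executed: clear stale alarms
print(dispatch("core_link_congestion", runbooks, approve=lambda rb: True))  # executed after approval: shift traffic to backup link
print(dispatch("unknown_pattern", runbooks, approve=lambda rb: True))       # manual: no codified runbook
```

Incidents without a codified runbook fall back to the normal engineer-led path, so automation only closes the loop for the recurring patterns.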
Phase four was the NOC-workflow integration. The platform feeds into the operator's existing NOC tooling rather than replacing it — alerts surface in the NOC's standard alert-management interface, runbook actions execute through the operator's existing network-management framework, and the platform's predictive-incident-prioritisation feeds the NOC's existing prioritisation workflow.
The pre-built building blocks
Rather than commission a ground-up build, the engagement leaned on MindMap's pre-built accelerator library — production-tested components that compress what would otherwise be a six-to-nine-month build into weeks.
AI Ops Command Center
Unified-NOC layer with predictive prioritisation
Anomaly Detector
Multi-scale network-anomaly detection engine
Service Monitor
Customer-experience-monitoring layer with segment-level scores
Incident Triage
Alert clustering and root-cause inference
Knowledge Base Builder
Runbook codification and automated response
The architecture
The platform runs entirely on the operator's private cloud inside its primary data centre in the home market, with active-active failover to the secondary site. The network-telemetry data — sensitive operational data — stays inside the operator's perimeter.
The telemetry ingestion runs on Kafka, consuming streaming telemetry from approximately 80,000 active network elements across the operator's footprint. Telemetry volume is approximately 2.4 million events per minute at peak, and the ingestion architecture is sized with 4x headroom over that peak.
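The sizing figures translate into per-second rates as follows. The element count, peak volume and 4x headroom come from the text; the bytes-per-event and per-partition throughput values are placeholders chosen only to show how such numbers would feed a partition count, since real per-partition throughput depends on message size, replication and hardware.

```python
import math

# Figures from the text
ACTIVE_ELEMENTS = 80_000
PEAK_EVENTS_PER_MIN = 2_400_000
HEADROOM = 4

# Assumptions for illustration only, not figures from the deployment
ASSUMED_BYTES_PER_EVENT = 500
ASSUMED_EVENTS_PER_PARTITION_PER_SEC = 5_000

peak_per_sec = PEAK_EVENTS_PER_MIN / 60                       # 40,000 events/s
design_per_sec = peak_per_sec * HEADROOM                      # 160,000 events/s
per_element_per_min = PEAK_EVENTS_PER_MIN / ACTIVE_ELEMENTS   # 30 events/min per element
design_mb_per_sec = design_per_sec * ASSUMED_BYTES_PER_EVENT / 1_000_000  # 80 MB/s
partitions = math.ceil(design_per_sec / ASSUMED_EVENTS_PER_PARTITION_PER_SEC)  # 32

print(peak_per_sec, design_per_sec, per_element_per_min, design_mb_per_sec, partitions)
```

On average each element emits about one event every two seconds at peak, and the design target is 160,000 events per second.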
Anomaly Detector's model is a multi-scale ensemble: per-element univariate anomaly detection (the standard pattern, but with adaptive thresholds rather than fixed thresholds), cross-element multivariate anomaly detection (the cross-element correlation patterns that precede incidents), and graph-network anomaly detection (the propagation patterns where an issue on one element drives downstream effects). The model is trained nightly on the rolling 60-day telemetry-and-incident history.
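Two of the three ensemble layers can be sketched in a few lines. The adaptive threshold here is an assumed choice (an exponentially weighted moving mean and variance with a k-sigma band), and the graph layer is reduced to a simple rule: an anomalous element whose downstream neighbours are mostly anomalous too is a candidate origin. `AdaptiveThreshold`, `propagation_suspects`, the parameters and the element names are hypothetical, and the nightly retraining on 60 days of history is not modelled.

```python
import math


class AdaptiveThreshold:
    """Per-element detector whose band follows an EWMA baseline instead of a fixed limit."""

    def __init__(self, alpha=0.1, k=4.0, warmup=20):
        self.alpha = alpha    # smoothing factor for the moving estimates
        self.k = k            # band width in standard deviations
        self.warmup = warmup  # observations to see before flagging anything
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Score x against the current baseline, then fold x into it. True means anomalous."""
        self.n += 1
        if self.mean is None:
            self.mean = x
            return False
        std = math.sqrt(self.var)
        anomalous = self.n > self.warmup and std > 0 and abs(x - self.mean) > self.k * std
        # Incremental EWMA mean/variance update
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return anomalous


def propagation_suspects(anomalous, downstream, min_fraction=0.5):
    """Anomalous elements whose downstream neighbours are mostly anomalous too."""
    suspects = []
    for node in sorted(anomalous):
        children = downstream.get(node, [])
        if children:
            share = sum(child in anomalous for child in children) / len(children)
            if share >= min_fraction:
                suspects.append(node)
    return suspects


detector = AdaptiveThreshold()
series = [100.0, 102.0] * 25 + [130.0]  # steady oscillation, then a spike
flags = [detector.update(x) for x in series]
print(flags.index(True), sum(flags))  # 50 1

anomalous = {"agg-1", "cell-7", "cell-8", "cell-9"}
downstream = {"agg-1": ["cell-7", "cell-8", "cell-9", "cell-10"]}
print(propagation_suspects(anomalous, downstream))  # ['agg-1']
```

The normal oscillation never trips the adaptive band, while the spike does; in the graph step the aggregation node is singled out because three of its four downstream cells are anomalous alongside it.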
Service Monitor's customer-experience-monitoring layer aggregates customer-impact metrics from the operator's existing customer-experience telemetry (call-detail-record analysis, data-session-performance telemetry, mobile-money-transaction telemetry) and produces continuous customer-experience scores at segment and region granularity.
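A segment-and-region score can be built by normalising each metric to a 0-1 component, weighting the components and averaging per group. The weights, normalisation targets, field names and the use of a mobile-money success rate as the third component are assumptions for illustration, not Service Monitor's scoring model; a production version would more likely weight records by traffic than average them equally.

```python
from collections import defaultdict

# Hypothetical weights and normalisation targets, for illustration only
WEIGHTS = {"call_drop": 0.4, "throughput": 0.3, "momo_success": 0.3}
MAX_ACCEPTABLE_DROP_RATE = 0.02  # a 2% drop rate scores 0 on that component
TARGET_THROUGHPUT_MBPS = 20.0    # at or above target scores 1 on that component


def row_score(r):
    """Weighted 0-1 score for one per-cell, per-window record."""
    components = {
        "call_drop": max(0.0, 1 - r["call_drop_rate"] / MAX_ACCEPTABLE_DROP_RATE),
        "throughput": min(1.0, r["throughput_mbps"] / TARGET_THROUGHPUT_MBPS),
        "momo_success": r["momo_success_rate"],
    }
    return sum(WEIGHTS[m] * components[m] for m in WEIGHTS)


def cx_scores(records):
    """Average record scores per (segment, region) and scale to 0-100."""
    grouped = defaultdict(list)
    for r in records:
        grouped[(r["segment"], r["region"])].append(row_score(r))
    return {key: 100 * sum(scores) / len(scores) for key, scores in grouped.items()}


records = [
    {"segment": "prepaid", "region": "north", "call_drop_rate": 0.005,
     "throughput_mbps": 25.0, "momo_success_rate": 0.99},
    {"segment": "prepaid", "region": "north", "call_drop_rate": 0.010,
     "throughput_mbps": 15.0, "momo_success_rate": 0.97},
    {"segment": "enterprise", "region": "south", "call_drop_rate": 0.002,
     "throughput_mbps": 40.0, "momo_success_rate": 0.995},
]
for key, score in cx_scores(records).items():
    print(key, f"{score:.2f}")
```

Clamping each component to the 0-1 range keeps one very good metric (for example, throughput far above target) from masking a poor one.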
AI Ops Command Center is the unified-NOC layer where the alerts, the predictions, the customer-impact assessments, the recommended actions and the runbook-automation status are all presented to the NOC team. The interface is designed around the NOC team's existing workflow rather than as a replacement.
Integration with the operator's existing network-management tooling (the network OSS, the alert-management system, the change-management system) is via each tool's standard inbound API.
The numbers behind the story
Customer-impacting incidents are now detected approximately 4 hours earlier at the median: the predictive-anomaly-detection layer flags incident-precursor patterns before customer impact reaches the threshold that would have triggered the previous alerting model.
Mean-time-to-resolution has dropped approximately 56% on the affected incident classes. The improvement is roughly evenly split between earlier detection (incidents start being worked on sooner) and runbook automation (routine response actions execute automatically rather than waiting for engineer time).
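An even split means each lever accounts for roughly 28 percentage points of the 56% reduction. The worked example below makes that concrete; the 56% figure and the even split come from the text, while the 10-hour baseline MTTR is a hypothetical value chosen only for readability, not a figure from this engagement.

```python
def decompose_mttr(baseline_hours, reduction, detection_share=0.5):
    """Split an MTTR reduction between earlier detection and runbook automation."""
    new_mttr = baseline_hours * (1 - reduction)
    saved = baseline_hours - new_mttr
    from_detection = saved * detection_share
    from_automation = saved - from_detection
    return new_mttr, from_detection, from_automation


# 56% reduction and even split from the text; 10-hour baseline is hypothetical
new_mttr, detection, automation = decompose_mttr(10.0, 0.56)
print(f"new MTTR {new_mttr:.1f} h, detection {detection:.1f} h, automation {automation:.1f} h")
# prints: new MTTR 4.4 h, detection 2.8 h, automation 2.8 h
```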
Customer-reported outage volume has dropped approximately 78%. This is the headline outcome: customers report incidents at a fraction of the previous rate because the platform resolves incidents before their impact builds up to the point where customers start reporting.
The NOC team has not been reduced. The reclaimed capacity has been redirected to proactive network-improvement work — the engineering analysis of recurring incident patterns to drive structural fixes, the customer-experience-optimisation work that the reactive NOC posture had crowded out, and the engineering-readiness work for the operator's 5G expansion programme.
An unexpected outcome: the Anomaly Detector's pattern-detection has surfaced systemic issues in the operator's network estate that were not previously visible. Several vendor-side firmware-and-configuration issues that had been generating recurring low-level incidents (each one too small to investigate individually) have been identified through the pattern-level analysis and addressed with the relevant network vendors.
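Pattern-level analysis of this kind can be approximated by grouping minor incidents by a shared signature and keeping the signatures that recur. The signature fields (vendor, firmware, alarm code), the severity scale, the thresholds and the vendor names are hypothetical; the text does not say how the platform defines a pattern.

```python
from collections import Counter


def recurring_signatures(incidents, min_count=5, max_severity=2):
    """Group low-severity incidents by (vendor, firmware, alarm) and keep frequent groups.

    Each incident on its own is too minor to investigate; a signature that
    recurs at least `min_count` times is a candidate systemic issue.
    """
    counts = Counter(
        (i["vendor"], i["firmware"], i["alarm"])
        for i in incidents
        if i["severity"] <= max_severity
    )
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]


incidents = (
    [{"vendor": "vendor-A", "firmware": "fw-3.2.1", "alarm": "LINK_FLAP", "severity": 1}] * 7
    + [{"vendor": "vendor-B", "firmware": "fw-1.0", "alarm": "FAN_WARN", "severity": 2}] * 2
    + [{"vendor": "vendor-C", "firmware": "fw-9.9", "alarm": "NODE_DOWN", "severity": 5}] * 6
)
print(recurring_signatures(incidents))
# prints: [(('vendor-A', 'fw-3.2.1', 'LINK_FLAP'), 7)]
```

High-severity incidents are excluded because they are already investigated individually; the point is to surface the low-level noise that only becomes meaningful in aggregate.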
“Our NOC was reactive. Customers reported outages, the NOC investigated, the engineering team remediated, and the customer-experience impact accumulated through all of it. MindMap's platform moved us to predictive. We resolve incidents before the customer-reported-outage volume climbs, our MTTR is down fifty-six per cent, and our engineering team is doing proactive work for the first time in years.”
Chief Technology Officer, Gulf Telecom
Why MindMap was chosen
The operator had evaluated two global AI-Ops vendors and one specialist telecom-NOC vendor. The global AI-Ops vendors were strong on general-purpose anomaly detection but had limited telecom-network depth; the specialist telecom-NOC vendor had the domain depth but limited LLM-driven runbook-automation capability.
MindMap's accelerator-composition approach (bringing AI Ops Command Center, Anomaly Detector, Service Monitor, Incident Triage and Knowledge Base Builder together around a customer-impact-centric design) and its willingness to deploy entirely inside the operator's perimeter were the structural differentiators. The customer-impact-centric design was the unique element, since most NOC vendors design around network elements rather than customer impact.
The third factor was the embedded telecom-NOC expertise on our delivery team: three former NOC operations heads from regional telecoms and a former network-engineering director. The operator's CTO felt that the team understood the operational reality of telecom NOCs, the same reality that had defeated the over-promised global-vendor attempts of the past decade.
Related deployments
Telecom WhatsApp Self-Service
ChatNext deployed across SIM, billing, and bundle management — bilingual and integrated with the carrier's BSS stack — deflecting 44% of inbound contact.
AI Voice Agent for Collections
An Arabic-English voice agent replaced the outbound collections dialler, reducing cost per contact by 58% while increasing same-call promise-to-pay rate.
Voice-First African Self-Service
Voice Bot + ChatNext reached 22M subscribers in 6 local African languages — 48% of voice-based customer interaction now AI-handled.
Want an outcome like this?
Start with a 2-week AI Readiness Sprint. We deliver a prioritised use-case backlog and business case grounded in what's actually buildable with our accelerator library.