Document Intelligence · PII Protection

Redaction that survives a regulator's audit, not just a screenshot review

Every regulated enterprise has the same problem: thousands of documents a day need to leave one trust boundary for another, and the redaction tool they use is either too crude to be safe or too slow to be practical. Redacto closes that gap with multi-layer detection — regex, machine-learned NER, and an LLM context pass — plus an immutable audit trail granular enough to satisfy the most pedantic GDPR review.

99.4% redaction recall on PII
48 entity types out of the box
<1.8s median per-page latency
SOC 2 Type II · ISO 27001 certified
The product

Redacto — in the browser

Dashboard preview (redact.mindmap.local): 14 PII entities · 99.2% recall · 0 data egress · 2,108 docs today

Document: patient_record_2291.pdf (14 entities redacted)

Entity            Count   Action
Name              4       Masked
MRN               1       Masked
SSN               1       Masked
Date of birth     2       Masked
Address           3       Masked
Phone / email     3       Masked

99.4% PII recall · 98.6% PII precision · <1.8s per-page latency · 6% manual review burden
Capabilities

What Redacto does

Any format, one pipeline

PDF native and scanned, Microsoft Word, Excel, PowerPoint, HTML, EML and MSG email with attachments, and scanned images all flow through a single ingestion pipeline. No file-type-specific handling required from your integrators — Redacto handles the format detection, OCR where needed, and content extraction internally. Output preserves original formatting with redactions burned-in or marked as overlays per policy.
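
As an illustration of the format detection and OCR routing described above, here is a minimal sketch. It assumes detection by leading file signature; the names `sniff_format` and `needs_ocr` are hypothetical, and Redacto's internal detection logic is not documented here.

```python
# Well-known file signatures ("magic bytes"); the category labels are illustrative.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "ooxml"),  # ZIP container used by .docx, .xlsx, .pptx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole"),  # legacy Office files and Outlook .msg
    (b"\x89PNG\r\n\x1a\n", "image"),
    (b"\xff\xd8\xff", "image"),  # JPEG
]


def sniff_format(head: bytes) -> str:
    """Classify a file by its first bytes, ignoring the file extension."""
    for magic, kind in MAGIC:
        if head.startswith(magic):
            return kind
    return "unknown"


def needs_ocr(kind: str, has_text_layer: bool) -> bool:
    """Images always need OCR; PDFs need it only when they carry no text layer."""
    return kind == "image" or (kind == "pdf" and not has_text_layer)
```

Sniffing the content rather than trusting the extension is what lets a single entry point accept mislabelled uploads without format-specific handling by the caller.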

Forty-eight PII entity types and growing

Names, postal addresses, government identifiers across one hundred ninety-six jurisdictions — Aadhaar, PAN, SSN, NI, BSN, CPF — plus financial account numbers, medical record numbers, healthcare identifiers, biometric references, and behavioural identifiers. Custom entity definitions are no-code and ship alongside the base detectors without merge conflicts.
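
To make the idea of base detectors plus custom entity definitions concrete, here is a small sketch using plain regular expressions. The patterns are simplified illustrations, the `MRN` format is invented for the example, and `with_custom` and `detect` are hypothetical helpers rather than Redacto's API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity_type: str
    start: int
    end: int
    text: str


# Simplified format checks only; real detectors would add checksum and context validation.
BASE_PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IN_PAN": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
}


def with_custom(patterns, name, regex):
    """Return a new pattern set with one custom entity added; the base set is left untouched."""
    extended = dict(patterns)
    extended[name] = re.compile(regex)
    return extended


def detect(text, patterns):
    """Run every pattern over the text and return detections in document order."""
    found = []
    for entity_type, pattern in patterns.items():
        for match in pattern.finditer(text):
            found.append(Detection(entity_type, match.start(), match.end(), match.group()))
    return sorted(found, key=lambda d: d.start)


patterns = with_custom(BASE_PATTERNS, "MRN", r"\bMRN-\d{7}\b")
hits = detect("SSN 123-45-6789, PAN ABCDE1234F, MRN-0042917", patterns)
```

Keeping custom definitions in a separate, derived pattern set is one way the base detectors can be upgraded without touching customer-specific rules.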


LLM-grounded context detection

Pattern matching catches the obvious PII. The harder cases (a phone number embedded in a paragraph that could be a reference number, a postal address fragmented across a table row, a date that is contextually a birthdate but textually ambiguous) are routed through an LLM context pass that reads the surrounding text. This layer recovers entities that the pattern and NER passes miss, lifting recall beyond what those passes achieve on their own.
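
The routing of ambiguous candidates through a context check might look roughly like the sketch below. The classifier is injected as a callable; `stub_classifier` is a keyword heuristic standing in for the model call, and all names here are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]


def context_window(text: str, start: int, end: int, radius: int = 20) -> str:
    """Return the candidate plus up to `radius` characters on each side."""
    return text[max(0, start - radius): min(len(text), end + radius)]


def context_pass(text: str, candidates: List[Span],
                 classify: Callable[[str, str], bool]) -> List[Span]:
    """Keep only the candidates that the classifier confirms as PII in context."""
    confirmed = []
    for start, end in candidates:
        snippet = context_window(text, start, end)
        if classify(snippet, text[start:end]):
            confirmed.append((start, end))
    return confirmed


def stub_classifier(snippet: str, value: str) -> bool:
    # Placeholder for an LLM call: treats nearby telephony cues as evidence of a phone number.
    lowered = snippet.lower()
    return any(cue in lowered for cue in ("call", "phone", "mobile", "tel"))


text = ("Call the patient on 5550142 today. "
        "Invoice reference number is 5550199 for the shipment.")
first = text.index("5550142")
second = text.index("5550199")
candidates = [(first, first + 7), (second, second + 7)]
confirmed = context_pass(text, candidates, stub_classifier)
```

Both numbers look identical to a pattern matcher; only the surrounding words separate the phone number from the invoice reference, which is the case the context pass exists to handle.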

Immutable audit trail

Every redaction is logged with the source location, the entity type, the rule or model that detected it, the confidence score, the policy version active at time of processing, and the operator identity if a human reviewed it. The log is append-only, cryptographically signed, and exportable to your SIEM. When the regulator asks 'why was this redacted', you have the answer in seconds.
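
One common way to make a log append-only and tamper-evident is to chain each entry to the previous one and authenticate it with a keyed hash. The sketch below uses HMAC-SHA256 as a stand-in; the field names and the `AuditLog` class are assumptions, and a production system would likely use asymmetric signatures and durable storage.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous signature" for the first entry


class AuditLog:
    def __init__(self, key: bytes):
        self._key = key
        self._entries = []

    def _sign(self, prev: str, record: dict) -> str:
        payload = json.dumps({"prev": prev, "record": record}, sort_keys=True)
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def append(self, record: dict) -> dict:
        """Add a record whose signature covers both its content and the previous entry."""
        prev = self._entries[-1]["signature"] if self._entries else GENESIS
        entry = {"prev": prev, "record": record, "signature": self._sign(prev, record)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            expected = self._sign(prev, entry["record"])
            if entry["prev"] != prev or not hmac.compare_digest(expected, entry["signature"]):
                return False
            prev = entry["signature"]
        return True


log = AuditLog(key=b"demo-key-not-for-production")
log.append({"location": "page 1, chars 40-51", "entity_type": "SSN", "detector": "regex",
            "confidence": 0.99, "policy_version": "v12", "operator": None})
log.append({"location": "page 2, chars 5-17", "entity_type": "NAME", "detector": "ner",
            "confidence": 0.81, "policy_version": "v12", "operator": "reviewer-7"})
```

Because each signature covers the previous one, altering an early record invalidates every entry after it, which is what lets an auditor trust an exported copy.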

API-first for any workflow

Synchronous REST API for low-latency interactive use cases, asynchronous batch API for high-volume document processing, webhook callbacks for event-driven workflows, and pre-built connectors for SharePoint, OneDrive, S3, GCS, Azure Blob, and SFTP. Authentication via OIDC or API keys with role-based access scoped per project.

Policy as code

Custom redaction policies are authored in a no-code rule editor and version-controlled as YAML. Policy diff and review workflows let compliance approve changes before they reach production. Policies can differ per document class, jurisdiction, or downstream consumer: the same engine that handles your global default policy handles the bespoke policy for your one Swiss subsidiary.
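
A policy diff of the kind a compliance reviewer would approve can be sketched as follows. Policies are shown as plain dictionaries, as they might look after loading from YAML; the entity names, actions, and the `diff_policies` helper are illustrative assumptions, not Redacto's policy schema.

```python
def diff_policies(old: dict, new: dict) -> dict:
    """Report every entity rule that was added, removed, or changed between two policy versions."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        if key not in old:
            changes[key] = ("added", None, new[key])
        elif key not in new:
            changes[key] = ("removed", old[key], None)
        elif old[key] != new[key]:
            changes[key] = ("changed", old[key], new[key])
    return changes


# Hypothetical global default and a stricter variant for one subsidiary.
global_default = {"NAME": "mask", "SSN": "mask", "EMAIL": "hash"}
swiss_subsidiary = {"NAME": "mask", "EMAIL": "mask", "AHV_NUMBER": "mask"}

changes = diff_policies(global_default, swiss_subsidiary)
```

A reviewer sees only the three rules that differ rather than two full policy files, which keeps approval focused on what actually changes in production.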

How It Works

From start to value in 4 steps

01

Ingest from anywhere

Documents enter via API call, S3 / GCS / Azure Blob trigger, email inbox, SharePoint connector, or DMS integration. Authentication and access control are inherited from your enterprise identity provider.

02

Multi-layer detection

First pass: deterministic regex and dictionary detectors for high-confidence patterns. Second pass: machine-learned NER models tuned for your jurisdiction and document classes. Third pass: LLM context check on lower-confidence candidates to confirm or reject and to catch contextually-grounded entities the first two passes missed.
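
When several passes flag overlapping spans, their outputs have to be reconciled into one set of detections. The sketch below assumes a simple rule (the highest-confidence candidate wins any overlap); the `Candidate` type and `merge_candidates` function are hypothetical, and the product's actual reconciliation logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: int
    end: int
    entity_type: str
    confidence: float
    source: str  # which pass produced it: "regex", "ner", or "llm"


def merge_candidates(*passes):
    """Combine candidates from all passes, keeping the most confident one where spans overlap."""
    ordered = sorted((c for p in passes for c in p), key=lambda c: -c.confidence)
    kept = []
    for cand in ordered:
        # Accept a candidate only if it does not overlap anything already accepted.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.start)


regex_pass = [Candidate(4, 15, "SSN", 0.99, "regex")]
ner_pass = [Candidate(0, 15, "ID", 0.70, "ner"), Candidate(20, 30, "NAME", 0.85, "ner")]
merged = merge_candidates(regex_pass, ner_pass)
```

Recording the `source` on each surviving candidate preserves the provenance needed later for the audit trail.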

03

Confidence-driven review

Documents with all detections above the high-confidence threshold are auto-redacted and delivered. Documents with any detection in the review range are routed to a human review queue with the suspect entities highlighted in context. The reviewer accepts or rejects each one with a single click, and the decision feeds back into the model training set.
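
The routing rule above reduces to a small decision function. In this sketch the threshold values are assumptions chosen for illustration, and detections below the review floor are treated as already rejected, which the text does not specify.

```python
AUTO_THRESHOLD = 0.95  # assumed high-confidence threshold
REVIEW_FLOOR = 0.50    # assumed lower bound of the review range


def route_document(confidences):
    """Send a document to human review if any detection falls inside the review range."""
    in_review = [c for c in confidences if REVIEW_FLOOR <= c < AUTO_THRESHOLD]
    return "human_review" if in_review else "auto_redact"
```

For example, `route_document([0.99, 0.97])` returns `"auto_redact"`, while a single borderline detection, as in `route_document([0.99, 0.70])`, sends the whole document to the review queue.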

04

Redact and deliver

The original document is returned with redactions applied per policy: black-box overlay, replacement with synthetic data, or hash substitution. A structured JSON redaction manifest is returned alongside it, recording every redaction's location, type, and provenance. The original document is retained or destroyed per your retention policy.
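
For plain text, applying redactions and emitting a manifest could be sketched like this. The manifest fields and the `redact` function are assumptions for illustration, not the documented schema; the unsalted truncated hash is used only to keep the example short, since low-entropy values such as SSNs can be recovered from an unsalted hash by brute force.

```python
import hashlib
import json


def redact(text, detections, mode="mask"):
    """Apply non-overlapping (start, end, entity_type) detections and build a JSON manifest."""
    pieces, manifest, cursor = [], [], 0
    for start, end, entity_type in sorted(detections):
        original = text[start:end]
        if mode == "hash":
            replacement = hashlib.sha256(original.encode()).hexdigest()[:12]
        else:
            replacement = "X" * (end - start)  # same-length mask keeps the layout stable
        pieces.append(text[cursor:start])
        pieces.append(replacement)
        manifest.append({"start": start, "end": end, "type": entity_type, "mode": mode})
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), json.dumps(manifest)


source = "SSN 123-45-6789 on file"
masked, manifest_json = redact(source, [(4, 15, "SSN")])
hashed, _ = redact(source, [(4, 15, "SSN")], mode="hash")
```

The manifest records offsets in the original document, so a reviewer can map each redaction back to its source location without ever seeing the redacted value.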

Technology

Built on proven enterprise tech

Detection stack: Microsoft Presidio, spaCy NER, custom transformer NER, Azure Document Intelligence, GPT-4o context pass, domain fine-tunes

Ingestion connectors: SharePoint Online, OneDrive for Business, Salesforce Files, S3 / GCS / Azure Blob, SFTP and email gateway, REST and webhook API

Compliance attestations: SOC 2 Type II, ISO 27001, GDPR Article 25, HIPAA Safe Harbor, DPDP Act India, FedRAMP Moderate equivalent

Output and audit: redacted PDF/A, synthetic-data replacement, hash substitution, structured manifest JSON, immutable audit log, SIEM webhook

"Compliance review used to be the bottleneck on every client deliverable. Redacto processes our outbound document flow with ninety-nine-point-four percent recall and gives us an audit trail granular enough that GDPR queries take minutes instead of days. We have moved compliance from a blocker to a tail-end check."
Head of Information Security, Global Investment Bank
Deployment

Deploy how you need it

Managed SaaS

Hosted by MindMap on SOC 2 Type II and ISO 27001 certified infrastructure with regional data residency, multi-tenant isolation, and a documented sub-processor list. Zero infrastructure burden on your side, live in twenty-four hours after contract. Best for organisations that need rapid deployment and a defensible compliance posture without internal hosting investment.

Private cloud

Deployed in your AWS, Azure, or GCP tenant. You retain full data residency, key management, and tenancy control; MindMap operates the stack. Integrated with your identity provider, your SIEM, and your existing observability. Typical setup is two weeks from contract.

On-prem and air-gapped

Full deployment inside your data centre with no outbound internet access. The OCR engines, redaction engine, LLM context model, audit store, and management console all run on your hardware. Suitable for central banks, defence agencies, and regulators with absolute data-sovereignty mandates. Documented upgrade and patch path with no internet dependency.

Redacto — Ready to Deploy

Get a demo and see how it fits your stack.
