Document Intelligence · PII Protection

Redaction that survives a regulator's audit, not just a screenshot review

Every regulated enterprise has the same problem: thousands of documents a day need to leave one trust boundary for another, and the redaction tool they use is either too crude to be safe or too slow to be practical. Redacto closes that gap with multi-layer detection — regex, machine-learned NER, and an LLM context pass — plus an immutable audit trail granular enough to satisfy the most pedantic GDPR review.


99.4%

Redaction recall on PII

48

Entity types out of the box

<1.8s

Median per-page latency

SOC 2 Type II

ISO 27001 certified

The product

Redacto — in the browser


99.2%

RECALL

2,108

DOCS TODAY

DOCUMENT · patient_record_2291.pdf · 14 ENTITIES REDACTED

ENTITY	COUNT	ACTION
Name	4	Masked
MRN	1	Masked
SSN	1	Masked
Date of birth	2	Masked
Address	3	Masked
Phone / email	3	Masked

99.4%

PII recall

98.6%

PII precision

<1.8s

Per-page latency


Capabilities

What Redacto does

Any format, one pipeline

Native and scanned PDFs, Microsoft Word, Excel, and PowerPoint files, HTML, EML and MSG email (including attachments), and scanned images all flow through a single ingestion pipeline. Your integrators need no file-type-specific handling: Redacto performs format detection, OCR where needed, and content extraction internally. Output preserves the original formatting, with redactions either burned in or applied as overlays, per policy.
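Redacto's own format detection is internal and not documented here. As a rough illustration of the idea only, the Python sketch below classifies an upload by its leading "magic bytes" instead of trusting the file extension, then falls back to text heuristics for HTML and EML. The `DocFormat`, `detect_format`, and `needs_ocr` names are hypothetical, and a production detector would also inspect ZIP contents and PDF text layers.

```python
from enum import Enum


class DocFormat(Enum):
    PDF = "pdf"        # native or scanned; a scanned PDF still needs a text-layer check
    OOXML = "ooxml"    # .docx/.xlsx/.pptx (ZIP container)
    OLE2 = "ole2"      # legacy .doc/.xls/.ppt and Outlook .msg
    IMAGE = "image"    # PNG/JPEG/TIFF scans, routed to OCR
    HTML = "html"
    EML = "eml"        # RFC 5322 message text
    UNKNOWN = "unknown"


# Leading "magic bytes" for the binary containers named in the text.
_SIGNATURES = [
    (b"%PDF-", DocFormat.PDF),
    (b"PK\x03\x04", DocFormat.OOXML),  # any ZIP matches; a real check would look for [Content_Types].xml
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", DocFormat.OLE2),
    (b"\x89PNG\r\n\x1a\n", DocFormat.IMAGE),
    (b"\xff\xd8\xff", DocFormat.IMAGE),  # JPEG
    (b"II*\x00", DocFormat.IMAGE),       # little-endian TIFF
    (b"MM\x00*", DocFormat.IMAGE),       # big-endian TIFF
]

_EML_HEADERS = ("from:", "received:", "return-path:", "mime-version:",
                "date:", "subject:", "message-id:", "to:")


def detect_format(head: bytes) -> DocFormat:
    """Classify a document from its first bytes, without trusting the file extension."""
    for signature, fmt in _SIGNATURES:
        if head.startswith(signature):
            return fmt
    # Text formats have no magic number, so fall back to cheap content heuristics.
    text = head[:1024].decode("ascii", errors="ignore").lstrip().lower()
    if text.startswith("<!doctype html") or text.startswith("<html"):
        return DocFormat.HTML
    if text.startswith(_EML_HEADERS):
        return DocFormat.EML
    return DocFormat.UNKNOWN


def needs_ocr(fmt: DocFormat) -> bool:
    # Images always need OCR; PDFs need it only when they lack a text layer (not checked here).
    return fmt is DocFormat.IMAGE
```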

Forty-eight PII entity types and growing

Names, postal addresses, and government identifiers across one hundred ninety-six jurisdictions (such as Aadhaar, PAN, SSN, NI, BSN, and CPF), plus financial account numbers, medical record numbers, healthcare identifiers, biometric references, and behavioural identifiers. Custom entity definitions are no-code and ship alongside the base detectors without merge conflicts.

48 entity types out of the box
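Deterministic identifier detectors are commonly made precise by pairing a pattern with the identifier's published check-digit rule, so that look-alike reference numbers are rejected. Assuming that approach (this is not a description of Redacto's detectors), here is a minimal sketch for two identifiers from the list above: the Dutch BSN, validated with the 11-proof, and the Brazilian CPF, validated with its two mod-11 check digits. `DETECTORS` and `find_identifiers` are hypothetical names.

```python
import re
from typing import Callable, Dict, List, Tuple


def is_valid_bsn(digits: str) -> bool:
    """Dutch BSN '11-proof': weights 9..2 on the first eight digits, -1 on the ninth."""
    if not re.fullmatch(r"\d{9}", digits):
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    total = sum(int(d) * w for d, w in zip(digits, weights))
    return total != 0 and total % 11 == 0


def _cpf_check_digit(digits: str) -> int:
    # Weights run from len(digits) + 1 down to 2; a remainder of 10 maps to 0.
    weights = range(len(digits) + 1, 1, -1)
    remainder = sum(int(d) * w for d, w in zip(digits, weights)) * 10 % 11
    return 0 if remainder == 10 else remainder


def is_valid_cpf(raw: str) -> bool:
    """Brazilian CPF: two mod-11 check digits; all-identical numbers are rejected."""
    digits = re.sub(r"[.\-]", "", raw)
    if not re.fullmatch(r"\d{11}", digits) or len(set(digits)) == 1:
        return False
    return (_cpf_check_digit(digits[:9]) == int(digits[9])
            and _cpf_check_digit(digits[:10]) == int(digits[10]))


# Hypothetical detector table: candidate pattern plus checksum validator per entity type.
DETECTORS: Dict[str, Tuple[re.Pattern, Callable[[str], bool]]] = {
    "NL_BSN": (re.compile(r"\b\d{9}\b"), is_valid_bsn),
    "BR_CPF": (re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b|\b\d{11}\b"), is_valid_cpf),
}


def find_identifiers(text: str) -> List[Tuple[str, int, int, str]]:
    """Return (entity, start, end, value) for pattern matches that also pass their checksum."""
    hits = []
    for entity, (pattern, validator) in DETECTORS.items():
        for m in pattern.finditer(text):
            if validator(m.group()):
                hits.append((entity, m.start(), m.end(), m.group()))
    return hits
```

In this sketch the nine-digit "ref 123456789" in a sentence fails the 11-proof and is not flagged. Aadhaar, which uses the Verhoeff checksum, would slot into the same table.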

LLM-grounded context detection

Pattern matching catches the obvious PII. The harder cases are routed through an LLM context pass that reads the surrounding text: a number in running text that could be either a phone number or a reference number, a postal address fragmented across a table row, or a date that is contextually a birthdate but textually ambiguous. Recall jumps from the high nineties to the very high nineties with this layer.
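To make the context pass concrete, here is a hedged sketch of how ambiguous candidates might be packaged for a model: only candidates below a confidence threshold are sent, each with a window of surrounding text and the span marked. The `ask_llm` callable stands in for whatever model client is used; the prompt wording, the 0.9 threshold, and the 80-character radius are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    start: int
    end: int
    label: str         # provisional entity type from an earlier pass, e.g. "PHONE"
    confidence: float


def context_window(text: str, cand: Candidate, radius: int = 80) -> str:
    """The candidate span marked with [[ ]], plus up to `radius` characters either side."""
    left = text[max(0, cand.start - radius):cand.start]
    right = text[cand.end:cand.end + radius]
    return f"{left}[[{text[cand.start:cand.end]}]]{right}"


def build_prompt(text: str, cand: Candidate) -> str:
    # Hypothetical prompt wording; a production prompt would be tuned and evaluated.
    return (f"The span marked [[ ]] was provisionally tagged {cand.label}. "
            "Reply YES if, in this context, it is personal data of that type; otherwise reply NO.\n\n"
            + context_window(text, cand))


def confirm_ambiguous(text: str, candidates: List[Candidate],
                      ask_llm: Callable[[str], str], threshold: float = 0.9) -> List[Candidate]:
    """Keep confident candidates as-is; send only the ambiguous ones to the model."""
    kept = []
    for cand in candidates:
        if cand.confidence >= threshold:
            kept.append(cand)
        elif ask_llm(build_prompt(text, cand)).strip().upper().startswith("YES"):
            kept.append(cand)
    return kept
```

Sending only low-confidence spans keeps model calls, and therefore latency and cost, proportional to genuine ambiguity rather than to document length.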

Immutable audit trail

Every redaction is logged with the source location, the entity type, the rule or model that detected it, the confidence score, the policy version active at time of processing, and the operator identity if a human reviewed it. The log is append-only, cryptographically signed, and exportable to your SIEM. When the regulator asks why something was redacted, you have the answer in seconds.
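One common way to make a log tamper-evident is a hash chain: each entry carries a MAC over its own content plus the previous entry's MAC, so editing, reordering, or deleting an earlier record breaks verification. The sketch below assumes that design and uses an HMAC (a shared-key MAC) in place of a public-key signature for brevity; the field names mirror the list above, but `RedactionEvent`, `AuditLog`, and `verify` are hypothetical.

```python
import copy
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple

GENESIS = "0" * 64  # "previous MAC" placeholder for the first entry


@dataclass(frozen=True)
class RedactionEvent:
    document_id: str
    page: int
    bbox: Tuple[float, float, float, float]  # source location on the page
    entity_type: str
    detector: str                            # rule or model that fired
    confidence: float
    policy_version: str
    reviewer: Optional[str] = None           # operator identity if a human reviewed it


def _mac(key: bytes, body: dict) -> str:
    # Canonical JSON so that verification recomputes exactly the same bytes.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class AuditLog:
    """Append-only: entries can be added, never edited; each commits to its predecessor."""

    def __init__(self, key: bytes):
        self._key = key
        self._entries: List[dict] = []

    def append(self, event: RedactionEvent) -> None:
        prev = self._entries[-1]["mac"] if self._entries else GENESIS
        body = {"seq": len(self._entries), "prev": prev, "event": asdict(event)}
        self._entries.append({**body, "mac": _mac(self._key, body)})

    def export(self) -> List[dict]:
        return copy.deepcopy(self._entries)  # e.g. for shipping to a SIEM


def verify(entries: List[dict], key: bytes) -> bool:
    """Detects edited, reordered, or removed entries (tail truncation needs an external anchor)."""
    prev = GENESIS
    for i, entry in enumerate(entries):
        body = {"seq": entry["seq"], "prev": entry["prev"], "event": entry["event"]}
        if entry["seq"] != i or entry["prev"] != prev:
            return False
        if not hmac.compare_digest(entry["mac"], _mac(key, body)):
            return False
        prev = entry["mac"]
    return True
```

Because an HMAC verifier holds the same key as the writer, an audit-grade deployment would more likely sign entries with an asymmetric key, so that a regulator can verify the chain without being able to forge it.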

API-first for any workflow

Synchronous REST API for low-latency interactive use cases, asynchronous batch API for high-volume document processing, webhook callbacks for event-driven workflows, and pre-built connectors for SharePoint, OneDrive, S3, GCS, Azure Blob, and SFTP. Authentication via OIDC or API keys with role-based access scoped per project.
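The paragraph above states that role-based access is scoped per project. A minimal deny-by-default check under that model could look like the following; the role names, operation names, and project ids are invented for illustration and are not Redacto's API.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Mapping


class Role(IntEnum):
    VIEWER = 1    # read manifests and audit entries
    OPERATOR = 2  # submit documents (sync or batch)
    ADMIN = 3     # manage policies and API keys


# Hypothetical minimum role per API operation.
REQUIRED_ROLE = {
    "get_manifest": Role.VIEWER,
    "redact_sync": Role.OPERATOR,
    "submit_batch": Role.OPERATOR,
    "edit_policy": Role.ADMIN,
}


@dataclass(frozen=True)
class Principal:
    subject: str                # OIDC "sub" claim or API-key id
    roles: Mapping[str, Role]   # project id -> role in that project


def authorize(principal: Principal, project: str, operation: str) -> bool:
    """Deny by default: unknown operations and projects without a grant are refused."""
    needed = REQUIRED_ROLE.get(operation)
    granted = principal.roles.get(project)
    return needed is not None and granted is not None and granted >= needed
```

The same principal can be an operator in one project and a read-only viewer in another, which is the point of scoping roles per project rather than per account.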

Policy as code

Custom redaction policies authored in a no-code rule editor and version-controlled as YAML. Policy diff and review workflows let compliance approve changes before they reach production. Different policies per document class, jurisdiction, or downstream consumer. The same engine that handles your global default policy handles the bespoke policy for your one Swiss subsidiary.
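Assuming each YAML policy parses to a plain mapping from entity type to action (a hypothetical schema), two pieces of the workflow above are easy to sketch: picking the most specific policy for a document, with the global default as fallback, and producing the entity-level diff a compliance reviewer approves. The precedence order and the `select_policy` and `diff_policy` names are assumptions.

```python
from typing import Dict, Tuple

Policy = Dict[str, str]  # entity type -> action, e.g. {"SSN": "mask"}


def select_policy(policies: Dict[Tuple[str, str], Policy],
                  doc_class: str, jurisdiction: str) -> Policy:
    """Most specific match wins; "*" is a wildcard. The precedence order is an assumption."""
    for key in ((doc_class, jurisdiction), (doc_class, "*"), ("*", jurisdiction), ("*", "*")):
        if key in policies:
            return policies[key]
    raise LookupError("no global default policy configured")


def diff_policy(old: Policy, new: Policy) -> dict:
    """Entity-level diff a compliance reviewer could approve before rollout."""
    return {
        "added": {k: new[k] for k in sorted(new.keys() - old.keys())},
        "removed": {k: old[k] for k in sorted(old.keys() - new.keys())},
        "changed": {k: (old[k], new[k]) for k in sorted(old.keys() & new.keys()) if old[k] != new[k]},
    }
```

With a global default keyed `("*", "*")` and a Swiss override keyed `("*", "CH")`, the same lookup serves both, which matches the "one engine, many policies" idea above.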

How It Works

From start to value in 4 steps

Ingest from anywhere

Documents enter via API call, S3 / GCS / Azure Blob trigger, email inbox, SharePoint connector, or DMS integration. Authentication and access control inherited from your enterprise identity provider.
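For the object-storage trigger path, Amazon S3 delivers event notifications as JSON records that carry the bucket name and a URL-encoded object key. A minimal sketch of turning such a notification into ingestion jobs follows; `IngestJob` and `jobs_from_s3_event` are hypothetical names, and only object-created events are kept.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import unquote_plus


@dataclass(frozen=True)
class IngestJob:
    bucket: str
    key: str
    size: int


def jobs_from_s3_event(event: dict) -> List[IngestJob]:
    """Convert an S3 event notification into one ingestion job per newly created object."""
    jobs = []
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletes, restores, etc.
        obj = record["s3"]["object"]
        jobs.append(IngestJob(
            bucket=record["s3"]["bucket"]["name"],
            key=unquote_plus(obj["key"]),  # keys arrive URL-encoded (spaces as "+")
            size=obj.get("size", 0),
        ))
    return jobs
```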

Multi-layer detection

First pass: deterministic regex and dictionary detectors for high-confidence patterns. Second pass: machine-learned NER models tuned for your jurisdiction and document classes. Third pass: an LLM context check on lower-confidence candidates, to confirm or reject them and to catch contextually grounded entities the first two passes missed.
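The ordering of the three passes can be sketched as a small cascade. In this illustration, pass 1 is a single SSN regex, while the NER pass and both LLM hooks are injected as callables, since the real models are not described here. The 0.85 cut-off and all function names are hypothetical, and overlap resolution between passes 1 and 2 is omitted.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Detection:
    start: int
    end: int
    entity: str
    confidence: float
    source: str  # pass that produced (or confirmed) the detection


Pass = Callable[[str], List[Detection]]


def regex_pass(text: str) -> List[Detection]:
    # Pass 1, deterministic: US SSNs in the common ddd-dd-dddd layout.
    return [Detection(m.start(), m.end(), "SSN", 0.99, "regex")
            for m in re.finditer(r"\b\d{3}-\d{2}-\d{4}\b", text)]


def _overlaps(a: Detection, b: Detection) -> bool:
    return a.start < b.end and b.start < a.end


def run_cascade(text: str, ner_pass: Pass,
                llm_confirm: Callable[[str, Detection], bool],
                llm_discover: Pass, low_conf: float = 0.85) -> List[Detection]:
    candidates = regex_pass(text) + ner_pass(text)  # passes 1 and 2
    results = []
    for d in candidates:
        if d.confidence >= low_conf:
            results.append(d)
        elif llm_confirm(text, d):                   # pass 3: confirm or reject
            results.append(replace(d, source="llm-confirmed"))
    for extra in llm_discover(text):                 # pass 3: entities passes 1 and 2 missed
        if not any(_overlaps(extra, r) for r in results):
            results.append(extra)
    return sorted(results, key=lambda d: d.start)
```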

Confidence-driven review

Documents whose detections are all above the high-confidence threshold are auto-redacted and delivered. Documents with any detection in the review range are routed to a human review queue, with the suspect entities highlighted in context. The reviewer accepts or rejects each one with one click, and the decision feeds back into the model training set.
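The routing rule above fits in a few lines. This sketch assumes two illustrative thresholds (0.95 for auto-redaction, 0.50 as the floor of the review range) and assumes that detections below the floor are discarded as non-PII, which the description does not specify. `route_document` and `record_decision` are hypothetical names.

```python
from enum import Enum
from typing import List, Tuple

Detection = Tuple[str, float]  # (entity type, confidence), simplified

HIGH = 0.95  # hypothetical auto-redact threshold
LOW = 0.50   # hypothetical floor of the review range


class Route(Enum):
    AUTO_REDACT = "auto_redact"
    HUMAN_REVIEW = "human_review"


def route_document(detections: List[Detection], high: float = HIGH,
                   low: float = LOW) -> Tuple[Route, List[Detection]]:
    """Return the route plus the suspect detections a reviewer should see highlighted."""
    kept = [d for d in detections if d[1] >= low]  # below `low`: treated as not PII (assumption)
    suspects = [d for d in kept if d[1] < high]
    return (Route.HUMAN_REVIEW if suspects else Route.AUTO_REDACT), suspects


def record_decision(training_set: List[dict], span_text: str, entity: str, accepted: bool) -> None:
    # Each one-click decision becomes a labelled example for the next model refresh.
    training_set.append({"text": span_text, "entity": entity, "label": int(accepted)})
```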

Redact and deliver

The document is returned in its original format with redactions applied per policy: black-box overlay, replacement with synthetic data, or hash substitution. A structured JSON redaction manifest is returned alongside, recording every redaction's location, type, and provenance. The unredacted original is retained or destroyed per your retention policy.
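The three output modes differ mainly in what replaces each span. The text-level sketch below assumes non-overlapping spans with known offsets and emits a manifest keyed to offsets in the original text; in a PDF the same logic would operate on page coordinates and burn the box into the rendered page. The manifest fields, the synthetic values, and the use of a keyed HMAC for hash substitution are assumptions, not Redacto's documented format.

```python
import hashlib
import hmac
import json
from typing import List, Tuple

Span = Tuple[int, int, str, str]  # (start, end, entity type, detector), non-overlapping

# Hypothetical synthetic stand-ins; a real generator would produce varied, format-valid values.
SYNTHETIC = {"NAME": "Alex Example", "SSN": "000-00-0000"}


def _replacement(value: str, entity: str, mode: str, key: bytes) -> str:
    if mode == "blackbox":
        return "\u2588" * len(value)  # text stand-in for a burned-in black box
    if mode == "synthetic":
        return SYNTHETIC.get(entity, f"[{entity}]")
    if mode == "hash":
        # Keyed hash: equal inputs map to equal tokens, but low-entropy values such as SSNs
        # cannot be recovered by hashing every candidate without the key.
        return f"<{entity}:{hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]}>"
    raise ValueError(f"unknown mode: {mode}")


def redact(text: str, spans: List[Span], mode: str, key: bytes = b"") -> Tuple[str, str]:
    """Return the redacted text and a JSON manifest keyed to offsets in the original text."""
    out, manifest, cursor = [], [], 0
    for start, end, entity, detector in sorted(spans):
        out.append(text[cursor:start])
        out.append(_replacement(text[start:end], entity, mode, key))
        manifest.append({"start": start, "end": end, "entity": entity,
                         "detector": detector, "mode": mode})
        cursor = end
    out.append(text[cursor:])
    return "".join(out), json.dumps(manifest)
```

Hash substitution keeps documents joinable (the same name always yields the same token) without exposing the value, which is the usual reason to prefer it over a black box for analytics consumers.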

Technology

Built on proven enterprise tech

Detection stack

Microsoft Presidio

spaCy NER

Custom transformer NER

Azure Document Intelligence

GPT-4o context pass

Domain fine-tunes

Ingestion connectors

SharePoint Online

OneDrive for Business

Salesforce Files

S3 / GCS / Azure Blob

SFTP and email gateway

REST and webhook API

Compliance attestations

SOC 2 Type II

ISO 27001

GDPR Article 25

HIPAA Safe Harbor

DPDP Act India

FedRAMP Moderate equivalent

Output and audit

Redacted PDF/A

Synthetic-data replacement

Hash substitution

Structured manifest JSON

Immutable audit log

SIEM webhook

"Compliance review used to be the bottleneck on every client deliverable. Redacto processes our outbound document flow with ninety-nine-point-four percent recall and gives us an audit trail granular enough that GDPR queries take minutes instead of days. We have moved compliance from a blocker to a tail-end check."

— Head of Information Security, Global Investment Bank

Deployment

Deploy how you need it

Managed SaaS

Hosted by MindMap on SOC 2 Type II and ISO 27001 certified infrastructure with regional data residency, multi-tenant isolation, and a documented sub-processor list. Zero infrastructure burden on your side, and live within twenty-four hours of contract. Best for organisations that need rapid deployment and a defensible compliance posture without internal hosting investment.

Private cloud

Deployed in your AWS, Azure, or GCP tenant. You retain full data residency, key management, and tenancy control; MindMap operates the stack. Integrated with your identity provider, your SIEM, and your existing observability. Typical setup is two weeks from contract.

On-prem and air-gapped

Full deployment inside your data centre with no outbound internet access. The OCR engines, redaction engine, LLM context model, audit store, and management console all run on your hardware. Suitable for central banks, defence agencies, and regulators with absolute data-sovereignty mandates. Documented upgrade and patch path with no internet dependency.

Redacto — Ready to Deploy

Get a demo and see how it fits your stack.
