Redaction that survives a regulator's audit, not just a screenshot review
Every regulated enterprise has the same problem: thousands of documents a day need to leave one trust boundary for another, and the redaction tool they use is either too crude to be safe or too slow to be practical. Redacto closes that gap with multi-layer detection — regex, machine-learned NER, and an LLM context pass — plus an immutable audit trail granular enough to satisfy the most pedantic GDPR review.
Redacto — in the browser
| ENTITY | COUNT | ACTION |
|---|---|---|
| Name | 4 | Masked |
| MRN | 1 | Masked |
| SSN | 1 | Masked |
| Date of birth | 2 | Masked |
| Address | 3 | Masked |
| Phone / email | 3 | Masked |
What Redacto does
Any format, one pipeline
Native and scanned PDF, Microsoft Word, Excel, PowerPoint, HTML, EML and MSG email with attachments, and scanned images all flow through a single ingestion pipeline. Your integrators need no file-type-specific handling: Redacto handles format detection, OCR where needed, and content extraction internally. Output preserves the original formatting, with redactions burned in or marked as overlays per policy.
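As an illustration of what format detection in a single ingestion pipeline can look like, here is a minimal sketch that sniffs a file's leading bytes. The function name `sniff_format` and the returned labels are hypothetical and are not Redacto's API; the byte signatures themselves are the standard ones for each container format.

```python
def sniff_format(data: bytes) -> str:
    """Guess a document family from its leading bytes (illustrative only)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        # ZIP container: modern Word, Excel, and PowerPoint files.
        return "ooxml"
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        # OLE2 compound file: legacy Office documents and Outlook MSG.
        return "ole2"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    head = data.lstrip()[:15].lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    # Assumption: anything unrecognised goes to a fallback extractor.
    return "unknown"
```

A real pipeline would follow this with a second look inside the container (for example, to tell a Word file from an Excel file) and would decide per page whether OCR is needed.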
Forty-eight PII entity types and growing
Names, postal addresses, government identifiers across one hundred ninety-six jurisdictions — Aadhaar, PAN, SSN, NI, BSN, CPF — plus financial account numbers, medical record numbers, healthcare identifiers, biometric references, and behavioural identifiers. Custom entity definitions are no-code and ship alongside the base detectors without merge conflicts.
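Many government identifiers carry check digits, which lets a detector discard look-alike numbers before they ever reach a reviewer. The sketch below applies the public CPF check-digit rule to regex candidates. It is an illustration of the general technique, not Redacto's detector, and the names `CPF_RE`, `is_valid_cpf`, and `find_cpfs` are hypothetical.

```python
import re

# Candidate pattern: 11 digits with optional dots and dash (e.g. 529.982.247-25).
CPF_RE = re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b")


def _check_digit(digits: list[int]) -> int:
    # Weights run from len(digits) + 1 down to 2, per the CPF rule.
    weights = range(len(digits) + 1, 1, -1)
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cpf(candidate: str) -> bool:
    digits = [int(c) for c in candidate if c.isdigit()]
    # Repeated-digit strings pass the arithmetic but are not valid CPFs.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    return (
        _check_digit(digits[:9]) == digits[9]
        and _check_digit(digits[:10]) == digits[10]
    )


def find_cpfs(text: str) -> list[str]:
    """Return only the regex candidates whose check digits are consistent."""
    return [m.group(0) for m in CPF_RE.finditer(text) if is_valid_cpf(m.group(0))]
```

For example, `find_cpfs("CPF 529.982.247-25, ref 123.456.789-00")` keeps the first number and drops the second, whose check digits do not match.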
LLM-grounded context detection
Pattern matching catches the obvious PII. The harder cases — a phone number embedded in a paragraph that could be a reference number, a postal address fragmented across a table row, a date that is contextually a birthdate but textually ambiguous — are routed through an LLM context pass that understands the surrounding text. Recall jumps from the high nineties to the very high nineties with this layer.
Immutable audit trail
Every redaction is logged with the source location, the entity type, the rule or model that detected it, the confidence score, the policy version active at time of processing, and the operator identity if a human reviewed it. The log is append-only, cryptographically signed, and exportable to your SIEM. When the regulator asks 'why was this redacted', you have the answer in seconds.
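One common way to make a log append-only and tamper-evident is to chain each entry's signature to the previous one. The sketch below does this with an HMAC from the Python standard library. It is an assumption about how such a trail could be built, not a description of Redacto's implementation: the class `AuditLog`, the field names, and the choice of a shared-secret HMAC (a production system might well use asymmetric signatures) are all illustrative.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous signature" for the first entry


def _sign(key: bytes, prev_sig: str, event: dict) -> str:
    # Canonical JSON so the same event always serialises to the same bytes.
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev_sig.encode() + body, hashlib.sha256).hexdigest()


class AuditLog:
    def __init__(self, key: bytes) -> None:
        self._key = key
        self._entries: list[dict] = []

    def append(self, **event) -> dict:
        prev = self._entries[-1]["sig"] if self._entries else GENESIS
        entry = {"event": event, "prev": prev, "sig": _sign(self._key, prev, event)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every signature; any edit or reordering breaks the chain."""
        prev = GENESIS
        for entry in self._entries:
            expected = _sign(self._key, prev, entry["event"])
            if entry["prev"] != prev or not hmac.compare_digest(entry["sig"], expected):
                return False
            prev = entry["sig"]
        return True
```

An entry might be recorded as `log.append(location="p3:120-131", entity_type="SSN", detector="regex", confidence=0.99, policy_version="v12", reviewer=None)`, mirroring the fields listed above.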
API-first for any workflow
Synchronous REST API for low-latency interactive use cases, asynchronous batch API for high-volume document processing, webhook callbacks for event-driven workflows, and pre-built connectors for SharePoint, OneDrive, S3, GCS, Azure Blob, and SFTP. Authentication via OIDC or API keys with role-based access scoped per project.
Policy as code
Custom redaction policies authored in a no-code rule editor and version-controlled as YAML. Policy diff and review workflows let compliance approve changes before they reach production. Different policies per document class, jurisdiction, or downstream consumer. The same engine that handles your global default policy handles the bespoke policy for your one Swiss subsidiary.
From start to value in 4 steps
Ingest from anywhere
Documents enter via API call, S3 / GCS / Azure Blob trigger, email inbox, SharePoint connector, or DMS integration. Authentication and access control inherited from your enterprise identity provider.
Multi-layer detection
First pass: deterministic regex and dictionary detectors for high-confidence patterns. Second pass: machine-learned NER models tuned for your jurisdiction and document classes. Third pass: LLM context check on lower-confidence candidates to confirm or reject and to catch contextually-grounded entities the first two passes missed.
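The three passes can be sketched as a small pipeline in which the NER model and the LLM check are injected as plain callables. Everything here is illustrative: the `Detection` type, the single SSN-style regex, the fixed regex confidence, and the `llm_floor` threshold are assumptions, and the sketch covers only the confirm-or-reject role of the third pass, not its ability to surface entities the earlier passes missed.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Detection:
    start: int
    end: int
    entity: str
    confidence: float
    source: str  # which pass produced it


SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_pass(text: str) -> list[Detection]:
    # First pass: deterministic pattern with an assumed high confidence.
    return [Detection(m.start(), m.end(), "SSN", 0.99, "regex") for m in SSN_RE.finditer(text)]


def detect(
    text: str,
    ner_pass: Callable[[str], Iterable[Detection]],
    llm_check: Callable[[str, Detection], Optional[Detection]],
    llm_floor: float = 0.9,
) -> list[Detection]:
    candidates = regex_pass(text) + list(ner_pass(text))  # passes one and two
    confirmed = []
    for det in candidates:
        if det.confidence >= llm_floor:
            confirmed.append(det)
            continue
        # Third pass: only lower-confidence candidates are sent for a context check.
        verdict = llm_check(text, det)
        if verdict is not None:
            confirmed.append(verdict)
    return sorted(confirmed, key=lambda d: d.start)
```

Sending only the lower-confidence candidates to the most expensive layer is the design choice worth noting: the cheap deterministic hits never incur the cost of a model call.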
Confidence-driven review
Documents whose detections all clear the high-confidence threshold are auto-redacted and delivered. Documents with any detection in the review range are routed to a human review queue with the suspect entities highlighted in context. The reviewer accepts or rejects each one with a single click, and the decision feeds back into the model training set.
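The routing rule reduces to a comparison against two thresholds. In this sketch the threshold values, the function name `route_document`, and the decision to ignore detections below the review range are all assumptions made for illustration; the text above does not specify them.

```python
def route_document(
    confidences: list[float],
    auto_threshold: float = 0.95,  # hypothetical high-confidence threshold
    review_floor: float = 0.50,    # hypothetical bottom of the review range
) -> str:
    """Return "auto" or "review" for one document's detection confidences."""
    # Assumption: detections below the review range are treated as noise.
    kept = [c for c in confidences if c >= review_floor]
    in_review_range = [c for c in kept if c < auto_threshold]
    return "review" if in_review_range else "auto"
```

A single detection inside the review range is enough to send the whole document to the queue, which matches the "any detection" wording above.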
Redact and deliver
The original document is returned with redactions applied per policy: black-box overlay, replacement with synthetic data, or hash substitution. A structured JSON redaction manifest is returned alongside it, listing every redaction's location, type, and provenance. The original document is retained or destroyed per your retention policy.
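For plain text, applying redactions and emitting a manifest can be sketched as below. This is a simplified illustration, not Redacto's output format: the function `apply_redactions`, the manifest keys, and the truncated salted SHA-256 used for hash substitution are assumptions, synthetic-data replacement is left out, and detections are assumed not to overlap.

```python
import hashlib
import json


def apply_redactions(text: str, detections: list[dict], mode: str = "mask", salt: bytes = b"") -> tuple[str, str]:
    """Return (redacted text, JSON manifest). Each detection needs start, end, entity, source."""
    parts, manifest, cursor = [], [], 0
    for det in sorted(detections, key=lambda d: d["start"]):
        original = text[det["start"]:det["end"]]
        if mode == "mask":
            replacement = "█" * len(original)  # text stand-in for a black-box overlay
        elif mode == "hash":
            replacement = hashlib.sha256(salt + original.encode()).hexdigest()[:12]
        else:
            raise ValueError(f"unsupported mode: {mode}")
        parts.append(text[cursor:det["start"]])
        parts.append(replacement)
        # The manifest records where and why, never the redacted value itself.
        manifest.append({
            "start": det["start"],
            "end": det["end"],
            "entity": det["entity"],
            "action": mode,
            "source": det["source"],
        })
        cursor = det["end"]
    parts.append(text[cursor:])
    return "".join(parts), json.dumps(manifest)
```

Because hash substitution is deterministic for a given salt, the same value redacts to the same token across documents, which keeps records linkable without exposing the underlying identifier.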
Built on proven enterprise tech
Detection stack
Ingestion connectors
Compliance attestations
Output and audit
"Compliance review used to be the bottleneck on every client deliverable. Redacto processes our outbound document flow with ninety-nine-point-four percent recall and gives us an audit trail granular enough that GDPR queries take minutes instead of days. We have moved compliance from a blocker to a tail-end check."— Head of Information Security, Global Investment Bank
Deploy how you need it
Redacto — Ready to Deploy
Get a demo and see how it fits your stack.