Document Intelligence

PII Anonymization. Contract Redaction. Signature Detection.

Process documents with AI without exposing personal data. Roundtrip pseudonymization for LLM input, rule-based redaction for controlled sharing, automated signature detection for contract management. GDPR-compliant by architecture.

The Problem: Personal Data in Every Document

Enterprises want to process documents with AI – analyze contracts, classify invoices, query policies. But every document contains personal data: names, salaries, social security numbers, addresses, bank details, signatures.

Sending this data to a language model – even a self-hosted one – without protection conflicts with the GDPR principle of data minimization (Art. 5(1)(c)). Works council agreements (Betriebsvereinbarungen) restrict the processing of employee data. Trade secrets in contracts must not reach third parties.

Current approaches fall short: Manual redaction in Adobe Acrobat is time-consuming, error-prone, and often only cosmetic – the text remains accessible beneath the black bars. Alternatively, avoiding AI processing for sensitive documents eliminates most of the productivity gain.

Three Capabilities

PII Anonymization for LLM Input

Roundtrip pseudonymization: personal data is replaced with consistent pseudonyms before LLM input. The model's output is then re-identified: pseudonyms are mapped back to the real values, so real data appears only in the result, never in the model. The mapping table never leaves the pre-processing layer.

Contract Redaction

Rule-based redaction for different recipients. The same contract is redacted differently for works councils, due diligence, or external advisors – governed by versioned redaction rules in the Decision Layer. Physical redaction, not just visual overlay.

Signature Detection

Automated detection of signature fields and present signatures in documents. Bulk verification of contract archives, onboarding quality checks, audit preparation. Anomalies are escalated to humans – never autonomously accepted.

PII Anonymization: Roundtrip Pseudonymization for LLM Input

Most PII tools on the market perform one-way redaction – they remove data. For processing with language models, that is insufficient. When an agent needs to analyze a contract, it requires context: "Employee X has salary Y at location Z." Without this context, the model cannot produce a meaningful assessment.

The Gosign approach is roundtrip pseudonymization: data is pseudonymized before it reaches the model, processed by the model, and re-identified in the result. The model sees only pseudonyms. The result contains the real data.

┌─────────────┐     ┌──────────────────┐     ┌──────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Document   │     │  PII Detection   │     │  Pseudonym-  │     │  Language Model  │     │  Re-Mapping │
│  (Original) │────▶│  & Classifi-     │────▶│  ization     │────▶│  processes only  │────▶│  Pseudonyms │
│             │     │  cation          │     │              │     │  pseudonyms      │     │  → real data│
└─────────────┘     └──────────────────┘     └──────────────┘     └──────────────────┘     └─────────────┘
                           │                        │                                            │
                           ▼                        ▼                                            ▼
                    ┌──────────────┐         ┌──────────────┐                              ┌──────────────┐
                    │  Decision    │         │  Mapping     │                              │  Result      │
                    │  Layer:      │         │  Table       │◀─────────────────────────────│  with real   │
                    │  What gets   │         │  (stays      │   Reverse mapping           │  data        │
                    │  anonymized  │         │  local)      │                              └──────────────┘
                    └──────────────┘         └──────────────┘
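
The flow in the diagram can be sketched in a few lines. This is a minimal illustration, not the Gosign implementation: the `Pseudonymizer` class, the `[PERSON_1]` token format, and the `fake_model` stand-in are all assumptions made for the example, and the list of detected entities is supplied by hand instead of coming from a real detection step.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Pseudonymizer:
    """Holds the mapping table. In the described design it stays local."""
    forward: dict[str, str] = field(default_factory=dict)   # real value -> pseudonym
    reverse: dict[str, str] = field(default_factory=dict)   # pseudonym -> real value
    counters: dict[str, int] = field(default_factory=dict)  # running number per category

    def pseudonym_for(self, value: str, category: str) -> str:
        # Consistent mapping: the same real value always gets the same pseudonym.
        if value not in self.forward:
            n = self.counters.get(category, 0) + 1
            self.counters[category] = n
            token = f"[{category}_{n}]"  # hypothetical token format
            self.forward[value] = token
            self.reverse[token] = value
        return self.forward[value]

    def pseudonymize(self, text: str, entities: list[tuple[str, str]]) -> str:
        # Simplification: plain substring replacement, longest values first.
        for value, category in sorted(entities, key=lambda e: -len(e[0])):
            text = text.replace(value, self.pseudonym_for(value, category))
        return text

    def reidentify(self, text: str) -> str:
        # Reverse mapping: swap every known token back to its real value.
        return re.sub(r"\[[A-Z]+_\d+\]",
                      lambda m: self.reverse.get(m.group(0), m.group(0)),
                      text)


def fake_model(prompt: str) -> str:
    # Stand-in for the language model call; it only ever receives pseudonyms.
    return "Summary: " + prompt


p = Pseudonymizer()
entities = [("John Smith", "PERSON"), ("Berlin", "LOCATION"), ("EUR 85,000", "SALARY")]
original = "John Smith earns EUR 85,000 at the Berlin office. John Smith joined in 2019."
masked = p.pseudonymize(original, entities)   # what the model receives
result = p.reidentify(fake_model(masked))     # what the user receives
```

Because the mapping is consistent, the model can still reason that both sentences refer to the same person, which one-way redaction would not allow.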

Decision Steps in the PII Process

| Micro-Decision | Who Decides | Why |
| --- | --- | --- |
| Define PII categories | Human + Ruleset | GDPR requirements, works council agreement, client-specific rules |
| Detect PII in document | AI (NER + patterns) | Named Entity Recognition + rule-based patterns |
| Review false positives | AI; human when uncertain | Confidence routing – "Baker" as surname or occupation? |
| Assign pseudonyms | Automatic | Consistent mapping, "Person_A" instead of "John Smith" |
| Send pseudonymized document to model | Automatic | No decision, pure forwarding |
| Re-identify output | Automatic | Apply mapping table in reverse |
| Audit: what was anonymized | Automatic | GDPR evidence in audit trail |
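
The detection and review steps in the table can be sketched as follows. This is an illustration under stated assumptions: the two regular expressions are deliberately loose, the `Finding` type and `route` function are hypothetical names, and the NER result is hard-coded in place of a real model. Only the 80% review threshold comes from the Decision Layer table further down.

```python
import re
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.80  # below this confidence, a human reviews the finding

# Illustrative patterns only; production-grade validation is stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,3})?\b"),
}


@dataclass(frozen=True)
class Finding:
    text: str
    category: str
    confidence: float
    source: str  # "pattern" or "ner"


def pattern_findings(text: str) -> list[Finding]:
    # Rule-based hits are treated as certain (confidence 1.0) in this sketch.
    return [Finding(m.group(0), category, 1.0, "pattern")
            for category, rx in PATTERNS.items()
            for m in rx.finditer(text)]


def route(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (pseudonymize automatically, send to human review)."""
    auto = [f for f in findings if f.confidence >= REVIEW_THRESHOLD]
    review = [f for f in findings if f.confidence < REVIEW_THRESHOLD]
    return auto, review


text = ("Ms. Baker was hired as head baker. "
        "Contact: anna.baker@example.com, IBAN DE89 3704 0044 0532 0130 00.")
# Hypothetical NER output: the model is unsure whether "Baker" is a surname.
ner_output = [Finding("Baker", "PERSON", 0.62, "ner")]

auto, review = route(pattern_findings(text) + ner_output)
```

Here the email address and IBAN are pseudonymized without review, while the ambiguous "Baker" lands in the review queue.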

The mapping table (pseudonym → real data) never leaves the pre-processing layer. It is deleted after processing is complete – or retained for a defined period, depending on configuration. The language model never sees personal data at any point.

Contract Redaction: Rule-Based, Recipient-Dependent, Physical

Contracts regularly need to be shared in redacted form – with auditors, potential buyers during due diligence, with works councils (Betriebsrat), with external advisors. Today, someone does this manually. It takes hours per contract, is error-prone, and the redaction is often only cosmetic: the text remains accessible beneath the black bars. A frequently underestimated data leak.

The Gosign approach: the Document Agent recognizes contract structure – parties, amounts, terms, clauses, signatures. The Decision Layer defines recipient-dependent redaction rules:

| Contract Element | Works Council | Due Diligence | External Advisor | Auditor |
| --- | --- | --- | --- | --- |
| Contracting parties (names) | ✓ Visible | ✗ Redacted | ✗ Redacted | ✓ Visible |
| Contract values / amounts | ✓ Visible | ✓ Visible | ✗ Redacted | ✓ Visible |
| Salaries / compensation | ✓ Visible | Aggregated | ✗ Redacted | ✓ Visible |
| Contract clauses | ✓ Visible | ✓ Visible | Clause types only | ✓ Visible |
| Trade secrets | ✗ Redacted | ✓ Visible | ✗ Redacted | ✓ Visible |
| Signatures | ✗ Redacted | ✗ Redacted | ✗ Redacted | ✓ Visible |

Redaction rules are versioned in the Decision Layer. When requirements change – new recipient group, updated works council agreement (Betriebsvereinbarung), changed compliance rule – a new rule version is created. The previous version remains traceable.
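
One way to picture versioned, recipient-dependent rules is a lookup table keyed by contract element and recipient, with every published version kept. The sketch below encodes the matrix above. The names `RuleSet` and `RuleRegistry` are hypothetical, and the fail-closed default (anything not listed is redacted) is an assumption of this example, not a documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    VISIBLE = "visible"
    REDACTED = "redacted"
    AGGREGATED = "aggregated"
    TYPES_ONLY = "clause types only"


RECIPIENTS = ("works_council", "due_diligence", "external_advisor", "auditor")
V, R, A, T = Action.VISIBLE, Action.REDACTED, Action.AGGREGATED, Action.TYPES_ONLY

# One row per contract element, columns in RECIPIENTS order.
MATRIX_V1 = {
    "parties":       (V, R, R, V),
    "amounts":       (V, V, R, V),
    "salaries":      (V, A, R, V),
    "clauses":       (V, V, T, V),
    "trade_secrets": (R, V, R, V),
    "signatures":    (R, R, R, V),
}


@dataclass(frozen=True)
class RuleSet:
    version: int
    rules: dict  # (element, recipient) -> Action

    def action_for(self, element: str, recipient: str) -> Action:
        # Assumption: fail closed, so unknown combinations are redacted.
        return self.rules.get((element, recipient), Action.REDACTED)


class RuleRegistry:
    """Append-only list of rule versions; earlier versions stay retrievable."""

    def __init__(self) -> None:
        self._versions: list[RuleSet] = []

    def publish(self, matrix: dict) -> RuleSet:
        rules = {(element, recipient): action
                 for element, row in matrix.items()
                 for recipient, action in zip(RECIPIENTS, row)}
        ruleset = RuleSet(len(self._versions) + 1, rules)
        self._versions.append(ruleset)
        return ruleset

    def get(self, version: Optional[int] = None) -> RuleSet:
        # No argument returns the latest version; versions are 1-based.
        return self._versions[-1] if version is None else self._versions[version - 1]


registry = RuleRegistry()
registry.publish(MATRIX_V1)
# Hypothetical rule change: signatures become visible to the works council.
registry.publish({**MATRIX_V1, "signatures": (V, R, R, V)})
```

A document redacted last year can then be explained by looking up the rule version that was active at the time, for example `registry.get(1)`.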

Physical redaction: The PDF is re-rendered from scratch. The original data is no longer present in the document – not as text, not as metadata, not as an invisible layer. Nothing can be copied from beneath black bars, and no PDF editor can uncover the content. The redaction is not cosmetic: the removed content is simply not in the file.
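
The difference between an overlay and a rebuild can be shown without any PDF library. The following sketch uses a toy document model (`Element`, `Document`, both hypothetical) to illustrate the principle: redacted values are never copied into the new document, and a follow-up check scans every part of the output for leftovers. A real implementation would apply the same idea to rendered PDF content, metadata, and layers.

```python
from dataclasses import dataclass, field

PLACEHOLDER = "[REDACTED]"


@dataclass
class Element:
    kind: str  # e.g. "party", "amount", "clause"
    text: str


@dataclass
class Document:
    elements: list
    metadata: dict = field(default_factory=dict)


def rebuild_redacted(doc: Document, redact_kinds: set,
                     keep_metadata: tuple = ("title",)) -> Document:
    """Build a new document; redacted text is never copied into it."""
    new_elements = [
        Element(e.kind, PLACEHOLDER if e.kind in redact_kinds else e.text)
        for e in doc.elements
    ]
    # Metadata is an allow-list: anything not explicitly kept is dropped.
    new_metadata = {k: v for k, v in doc.metadata.items() if k in keep_metadata}
    return Document(new_elements, new_metadata)


def leaked_values(original: Document, redacted: Document, redact_kinds: set) -> list:
    """Return redacted values that still occur anywhere in the output."""
    haystack = "\n".join([e.text for e in redacted.elements]
                         + [str(v) for v in redacted.metadata.values()])
    secrets = [e.text for e in original.elements if e.kind in redact_kinds]
    return [s for s in secrets if s in haystack]


contract = Document(
    elements=[Element("party", "Acme GmbH"),
              Element("clause", "Term: 24 months"),
              Element("amount", "EUR 1,200,000")],
    metadata={"title": "Supply agreement", "author": "Acme GmbH legal"},
)
shared = rebuild_redacted(contract, {"party", "amount"})
```

Note that the party name also sat in the `author` metadata field. An overlay would have left it there. The allow-list drops it.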

Signature Detection: Find, Verify, Document

Contract management, audit preparation, compliance reviews – all require regular verification: Is this document signed? Where is the signature? Is a countersignature missing? With 5,000 contracts in the archive, manual checking is not feasible.

Signature Detection – Finding Signatures

The Document Agent detects signature fields and present signatures in scanned documents and PDFs. Computer vision, not a language model – specialized ML models for image analysis. The output is structured: page, position, confidence that a signature is present.

Bulk archive verification: "Which of the 5,000 contracts are missing a countersignature?" – Results in minutes instead of weeks.

Onboarding quality check: "Are all mandatory documents for the new employee signed?" – Automated checklist, missing signatures escalated as workflow tasks.

Audit preparation: "Show all documents from Q3 2025 without a signature." – Structured export list for the auditor.
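
Given structured detection output (page, position, confidence), the bulk archive question reduces to a filter. The sketch below is a rough illustration: the `SignatureHit` record, the `role` label, the required roles, and the 0.5 confidence cut-off are all assumptions, and the hits are hand-written in place of real computer vision output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignatureHit:
    contract_id: str
    role: str         # hypothetical label for the signature field
    page: int
    bbox: tuple       # (x0, y0, x1, y1) position on the page
    confidence: float  # confidence that a signature is present


REQUIRED_ROLES = ("signature", "countersignature")  # assumed requirement
MIN_CONFIDENCE = 0.5                                # assumed cut-off


def missing_roles(contract_ids, hits, min_confidence=MIN_CONFIDENCE) -> dict:
    """Map each incomplete contract to the signature roles it lacks."""
    found = {cid: set() for cid in contract_ids}
    for hit in hits:
        if hit.contract_id in found and hit.confidence >= min_confidence:
            found[hit.contract_id].add(hit.role)
    return {
        cid: [role for role in REQUIRED_ROLES if role not in roles]
        for cid, roles in found.items()
        if not set(REQUIRED_ROLES) <= roles
    }


hits = [
    SignatureHit("C-1", "signature", 4, (72, 640, 250, 690), 0.97),
    SignatureHit("C-1", "countersignature", 4, (320, 640, 500, 690), 0.93),
    SignatureHit("C-2", "signature", 7, (72, 600, 250, 650), 0.95),
    SignatureHit("C-2", "countersignature", 7, (320, 600, 500, 650), 0.20),
]
gaps = missing_roles(["C-1", "C-2", "C-3"], hits)
```

The resulting dictionary is the kind of structured list that could be exported for an auditor or turned into workflow tasks.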

┌─────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Document   │     │  Signature       │     │  Comparison      │
│  with       │────▶│  Detection       │────▶│  against         │
│  signature  │     │  (position,      │     │  reference       │
│             │     │   confidence)    │     │  signature       │
└─────────────┘     └──────────────────┘     └──────────────────┘
                                                      │
                                          ┌───────────┼───────────┐
                                          ▼           ▼           ▼
                                   ┌────────────┐ ┌────────┐ ┌────────────┐
                                   │  High      │ │ Medium │ │  Low       │
                                   │  match     │ │ match  │ │  match     │
                                   └────────────┘ └────────┘ └────────────┘
                                        │              │           │
                                        ▼              ▼           ▼
                                   Automatically  Escalation   Blocked
                                   accepted,      to human     Human
                                   documented     with side-   review
                                                  by-side      mandatory
                                                  comparison
                                                  view

Important: Signature comparison is an anomaly detector, not a forgery detector. Signatures vary naturally – depending on the day, pen, and surface. The system identifies anomalies and escalates them to a human. It never claims "this signature is forged" or "this signature is authentic." That would be irresponsible.
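
The three outcomes in the diagram amount to routing on a match score. A minimal sketch follows. The numeric thresholds are placeholders chosen for the example, not product values, and the `Outcome` names are hypothetical. The code deliberately returns a routing decision, never a verdict about authenticity.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ACCEPTED = "automatically accepted, documented"
    ESCALATED = "escalated to human with side-by-side comparison"
    BLOCKED = "blocked, human review mandatory"
    NEEDS_REFERENCE = "no reference available, human captures one"


HIGH_MATCH = 0.90  # assumed threshold
LOW_MATCH = 0.60   # assumed threshold


def route_match(score: Optional[float]) -> Outcome:
    """Route a comparison score in [0, 1]; None means no reference signature."""
    if score is None:
        return Outcome.NEEDS_REFERENCE
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= HIGH_MATCH:
        return Outcome.ACCEPTED
    if score >= LOW_MATCH:
        return Outcome.ESCALATED
    return Outcome.BLOCKED
```

Only the high-match band bypasses a human. Every other path, including a missing reference, ends with a person.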

The Decision Layer: Who Decides What Gets Anonymized, Redacted, or Escalated?

The Decision Layer decomposes every document process into individual decision steps. For each step, it defines: human, ruleset, or AI.

| Process | Micro-Decision | Who Decides | Why |
| --- | --- | --- | --- |
| PII | Which data fields are PII? | Ruleset | GDPR Art. 4, works council agreement |
| PII | Is "Baker" a name or an occupation? | AI; human at <80% confidence | NER ambiguity – avoid false positives |
| PII | Choose pseudonymization method | Ruleset | Consistent pseudonyms vs. random values |
| Redaction | Which recipient group? | Human | Domain decision, not automatable |
| Redaction | Which fields are redacted? | Ruleset | Recipient-dependent redaction matrix |
| Redaction | Unknown clause type detected | Human | New clause types must be classified |
| Signature | Signature present? | AI | Computer vision with confidence score |
| Signature | Does signature match reference? | AI + human on anomaly | High match: accepted. Anomaly: escalated |
| Signature | No reference available | Human | New reference signature must be captured |
| All | Document audit trail | Automatic | Every decision immutably recorded |
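
The table above can be read as data: each step names its decider, and AI steps may carry a confidence floor below which a human takes over. The sketch below encodes a few of the rows. The `Step` type, the step identifiers, and the rule that a missing confidence value falls back to a human are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decider(Enum):
    HUMAN = "human"
    RULESET = "ruleset"
    AI = "ai"
    AUTOMATIC = "automatic"


@dataclass(frozen=True)
class Step:
    process: str
    name: str
    decider: Decider
    human_below: Optional[float] = None  # AI confidence floor, if any


STEPS = {
    s.name: s for s in [
        Step("PII", "classify_fields", Decider.RULESET),
        Step("PII", "resolve_ambiguous_entity", Decider.AI, human_below=0.80),
        Step("Redaction", "choose_recipient_group", Decider.HUMAN),
        Step("Redaction", "select_fields", Decider.RULESET),
        Step("Signature", "signature_present", Decider.AI),
        Step("All", "write_audit_trail", Decider.AUTOMATIC),
    ]
}


def who_decides(step_name: str, confidence: Optional[float] = None) -> Decider:
    """Resolve the decider for one step, applying the confidence floor."""
    step = STEPS[step_name]
    if step.decider is Decider.AI and step.human_below is not None:
        # Assumption: no confidence value means the AI result is not trusted.
        if confidence is None or confidence < step.human_below:
            return Decider.HUMAN
    return step.decider
```

For example, `who_decides("resolve_ambiguous_entity", 0.62)` hands the "Baker" case to a human, while the same step at 0.95 stays with the AI.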

Integration

Document Intelligence is a capability of the existing Document Agent – not separate software. Integration uses the same standardized interfaces:

  • SAP DMS, SAP ArchiveLink – contracts and receipts from SAP archives
  • SharePoint, OneDrive – document management via Microsoft Graph
  • Email inboxes (IMAP/Exchange) – process attachments automatically
  • File system watchers – monitor local directories
  • REST API – for client-specific document management systems

Document Intelligence capabilities are configured per tenant: which PII categories are detected, which redaction rules apply, which reference signatures are stored. All versioned, all in the Decision Layer.
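
A per-tenant configuration of this kind could be modelled as an immutable record where every change yields a new version. The field names below (`pii_categories`, `redaction_ruleset_version`, `reference_signature_ids`) are hypothetical and only mirror the three items named in the paragraph above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    version: int
    pii_categories: frozenset
    redaction_ruleset_version: int
    reference_signature_ids: frozenset = frozenset()

    def revise(self, **changes) -> "TenantConfig":
        # Never mutate in place: return a copy with the version bumped by one.
        return replace(self, version=self.version + 1, **changes)


v1 = TenantConfig(
    tenant_id="acme",
    version=1,
    pii_categories=frozenset({"NAME", "IBAN", "ADDRESS"}),
    redaction_ruleset_version=1,
)
# Hypothetical change: the tenant adds salaries as a PII category.
v2 = v1.revise(pii_categories=v1.pii_categories | {"SALARY"})
```

Keeping `v1` unchanged means an earlier processing run can still be traced back to the exact configuration it used.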

Business Impact

GDPR-compliant LLM processing: Documents containing personal data can be processed with language models without exposing that data to the model.

Contract redaction in minutes instead of hours: Rule-based, recipient-dependent, physically secure. A contract that takes 2 hours to redact manually is processed in minutes.

Proactive signature gap detection: Missing signatures are found before the auditor asks – not after.

Audit evidence for data protection: The audit trail documents every anonymization, every redaction, every signature check. During a GDPR inquiry or tax audit, it can be shown which data was processed, when, and how.

No new tool: Document Intelligence is part of the existing agent architecture. No additional vendor, no additional license, no additional training.
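
How the audit trail mentioned above is stored is not specified here. One common way to make a log tamper-evident is a hash chain, where each entry includes the hash of its predecessor. The sketch below shows that technique as an assumption, not as the product's mechanism. The event fields are made up for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditTrail:
    def __init__(self) -> None:
        self._entries: list = []

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {"prev": prev, "event": dict(event), "hash": _digest(prev, event)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


trail = AuditTrail()
trail.append({"action": "pseudonymize", "document": "contract-17", "entities": 12})
trail.append({"action": "redact", "document": "contract-17", "ruleset_version": 2})
```

A hash chain detects changes but does not prevent them. It is usually combined with write-once storage or an externally anchored hash.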

Frequently Asked Questions

What is the difference between anonymization and pseudonymization?

Anonymization irreversibly removes personal data – the link to the individual is permanently destroyed. Pseudonymization replaces the data with pseudonyms, while the link to the individual can still be restored through a separately kept mapping table. For LLM processing, we use pseudonymization with subsequent re-identification: the model sees only pseudonyms, and the result contains the real data again.

Does PII detection work with scanned documents?

Yes. Scanned documents are first converted to machine-readable text via OCR. The text then undergoes the same PII detection as digital documents. Detection accuracy depends on scan quality – for standard scans (300 DPI), OCR accuracy exceeds 99%.

Is the contract redaction truly secure?

Yes. Unlike manual redaction in PDF editors, the document is physically re-rendered. The redacted content is no longer present in the document – neither as text, nor as metadata, nor as invisible layers. This can be checked on the output file itself by extracting its text, metadata, and layers.

Can signature comparison detect forgeries?

Signature comparison detects anomalies – deviations from a reference signature. When anomalies are found, the system automatically escalates to a human. It never claims a signature is forged or authentic. That decision is made by a human. This is the only responsible approach.

Which documents should be processed securely?

PII anonymization, contract redaction, or signature detection – we start with one specific document type.