DPA for AI Agents: What Standard Contracts Miss
Why standard DPAs fall short for enterprise AI infrastructure. With a requirements checklist for HR and compliance teams.
A Data Processing Agreement (DPA) for AI infrastructure must address ten areas that standard SaaS DPAs do not cover: prompt logging policies, environment separation (dev/staging/production), model provider chains, in-flight vs. at-rest data processing, RAG embedding data protection, third-country access to production data, professional privilege compliance, PII tokenization, Decision Layer audit trails, and verifiability of technical measures.
Why Standard DPAs Fail for AI Infrastructure
A standard Data Processing Agreement governs how a service provider handles personal data on behalf of a controller. It defines purpose, data categories, technical and organizational measures, sub-processors, and retention periods. For a CRM system or HR software, that is sufficient.
Enterprise AI infrastructure creates data flows that no standard DPA covers. A language model processes prompts - free-text inputs that can contain anything: personnel data, client information, trade secrets, even special category data under Article 9 GDPR when an employee asks a health-related question. The data flows through a chain of systems: from the frontend through an orchestration layer to the model provider, back into a database, potentially through a RAG pipeline with embedding vectors. At every station, different questions about storage, access, and deletion arise.
When you ask your AI vendor for their DPA and receive a standard document, that should raise a flag. Not because the vendor is untrustworthy, but because the specific risks of AI infrastructure require different provisions than a conventional SaaS application.
Ten Gaps Between Standard DPAs and AI Reality
1. Prompt Content as a New Data Category
Standard DPAs list data categories: name, email, employee number. With AI agents, a new category emerges that cannot be predefined - the prompt. A prompt can be a harmless question about holiday policy or a complete client letter containing personal data. The DPA must establish that the responsibility for content classification lies with the organization, not the AI provider, while the provider must demonstrate technical measures that protect all input content regardless of sensitivity.
2. Logging Policy: What May Appear in Logs?
With conventional software, logging covers technical necessities. With AI infrastructure, logging takes on a different dimension: Are prompt contents logged? Are model responses stored? Are uploaded documents written into error logs?
A robust DPA for AI infrastructure must explicitly establish that production environments have request/response body logging disabled, that only technical metadata is recorded (status codes, latencies, request IDs), that debug logging is deactivated in production, and that stack traces contain no prompt or document content. This sounds obvious - it is not. Ask your vendor about their production logging policy. If they do not have one, that is a warning sign.
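A metadata-only logging policy can be enforced in code, not just in contract. The sketch below (Python, with hypothetical field names like `prompt` and `request_id`; adapt to whatever your gateway actually attaches to log records) redacts content-bearing attributes before a record ever reaches a handler:

```python
import logging

class MetadataOnlyFilter(logging.Filter):
    """Redact content-bearing fields from log records in production.

    Assumes application code attaches content under attributes such as
    'prompt', 'response', or 'body' (illustrative names) and metadata
    under 'status', 'latency_ms', 'request_id'. Metadata passes through
    untouched; content is replaced before any handler can persist it.
    """
    REDACTED_FIELDS = ("body", "prompt", "response", "document")

    def filter(self, record: logging.LogRecord) -> bool:
        for name in self.REDACTED_FIELDS:
            if hasattr(record, name):
                setattr(record, name, "[REDACTED]")
        return True  # keep the record, minus its content

handler = logging.StreamHandler()
handler.addFilter(MetadataOnlyFilter())
logger = logging.getLogger("gateway")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Metadata-only record: passes through unchanged.
logger.info("request complete", extra={"request_id": "req-1", "status": 200})
# A record that accidentally carries prompt content is redacted, not stored.
logger.info("request complete", extra={"prompt": "confidential client letter"})
```

The point of attaching the filter at the handler level is that it also catches records emitted by third-party libraries, not only your own code - one answer to the "verifiably disabled" question in the checklist below.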
3. Environment Separation: Dev, Staging, Production
Every enterprise setup has development, testing, and production environments. With AI infrastructure, separation is security-critical because production data can otherwise leak into development as test data. An AI-specific DPA must establish that dev and staging environments use exclusively synthetic or anonymized data, that production access is restricted to defined roles within the EU/EEA (or your jurisdiction), and that support cases are handled only with test data approved by the organization.
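The "synthetic data only" rule is easiest to uphold when the codebase fails closed. A minimal sketch, assuming the deployment pipeline marks each environment via an `APP_ENV` variable (a hypothetical name; substitute your own convention):

```python
import os

# Production never loads fixture data; only dev and staging may seed.
ALLOWED_SEED_ENVS = {"dev", "staging"}

def load_fixtures(fixture_path: str) -> str:
    """Refuse to load test fixtures outside dev/staging.

    APP_ENV is an assumed environment variable. An unset value is
    treated as production, so a misconfigured host fails closed.
    """
    env = os.environ.get("APP_ENV", "production")
    if env not in ALLOWED_SEED_ENVS:
        raise RuntimeError(
            f"Fixture loading blocked in environment {env!r}: "
            "only synthetic data is permitted, and only in dev/staging."
        )
    return f"loaded synthetic fixtures from {fixture_path} into {env}"
```

A guard like this does not replace the DPA clause, but it gives the vendor something concrete to show when you ask how the clause is enforced.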
4. Model Provider Chain Instead of Classic Sub-processors
With conventional software, a provider has sub-processors: a hosting provider, perhaps an email service. With AI infrastructure, a three-tier chain emerges: the AI infrastructure provider operates the platform, the model providers (Azure OpenAI, Google Vertex AI, Anthropic) process the prompts, and the platform providers (Supabase, Vercel) host database and frontend.
The critical point: Who is the contractual partner of the model providers? Do the models run in the AI provider’s tenant or in the organization’s tenant? This distinction determines whether the model provider is a sub-processor of the provider or whether the organization manages vendor relationships directly. A strong DPA clarifies this delineation explicitly.
| Component | In Client Tenant | In Provider Tenant |
|---|---|---|
| Azure Entra ID (SSO) | ✓ Client | - |
| Azure OpenAI / Vertex AI | ✓ Client | - |
| Supabase (Database) | ✓ Client | - |
| Vercel (Hosting) | ✓ Client | - |
| Plane (Ticket System) | - | ✓ Provider |
| GitHub (Code Hosting) | - | ✓ Provider |
| Google Workspace (Communication) | - | ✓ Provider |
Components in the client tenant are managed by the client in their own vendor registry. Components in the provider tenant are subject to the sub-processor list in the DPA.
5. In-flight vs. At-rest: Where Does Data Reside?
With conventional software, the question is simple: data resides in a database. With AI infrastructure, two processing modes exist. In-flight processing: the prompt is sent to the model provider, processed, and the response returned - no persistent storage at the provider. At-rest storage: chats, uploads, and embeddings are stored in a database (e.g., Supabase) - persistent and associated with the user.
The DPA must establish separate provisions for both modes. For in-flight: Is content retention at the provider disabled? Is data used for training purposes? For at-rest: Where is the database located? Who has access? How does deletion work?
6. RAG and Embeddings: Vector Data as a New Challenge
Retrieval Augmented Generation (RAG) makes enterprise documents searchable by AI. Documents are converted into embedding vectors and stored in a vector database. These vectors are a new data category: they contain no readable text but can, under certain circumstances, allow inferences about original content. The DPA must treat embeddings as personal data when generated from documents containing personal information. Access control, deletion, and tenant separation must also apply to the vector database.
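Two of these requirements - tenant separation and cascading deletion - can be illustrated with a toy in-memory stand-in for a vector database (real deployments would use e.g. pgvector; the record fields here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class VectorRecord:
    tenant_id: str
    document_id: str
    embedding: list  # opaque floats, but treated as potentially personal data

@dataclass
class TenantVectorStore:
    """Sketch of DPA-relevant vector-store behavior: queries never cross
    tenant boundaries, and deleting a source document removes its vectors."""
    records: list = field(default_factory=list)

    def add(self, record: VectorRecord) -> None:
        self.records.append(record)

    def query(self, tenant_id: str) -> list:
        # Tenant separation: a query can only ever see its own vectors.
        return [r for r in self.records if r.tenant_id == tenant_id]

    def delete_document(self, tenant_id: str, document_id: str) -> int:
        # Deletion cascades from the source document to all its embeddings.
        before = len(self.records)
        self.records = [
            r for r in self.records
            if not (r.tenant_id == tenant_id and r.document_id == document_id)
        ]
        return before - len(self.records)
```

When reviewing a vendor's DPA, ask which of these two behaviors is enforced in the database layer (row-level security, cascade constraints) and which only in application code.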
7. Third-Country Access to Production Data
Many AI providers work with distributed teams. When a developer from a third country has access to the production environment, this constitutes a data transfer under GDPR - even if no data is physically transmitted. An AI-specific DPA must define that production access is restricted to the EU/EEA (or an adequate jurisdiction), that dev/staging access from third countries is permissible (because only synthetic data is present), that RBAC provides separate admin groups for production and dev/staging, and that an exception procedure exists requiring written authorization from the controller.
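The RBAC requirement maps to a small amount of logic: separate admin groups per environment, plus an explicit escape hatch for the documented exception procedure. A sketch with assumed group names (`prod-admins-eea`, `dev-admins` are illustrative, not a standard):

```python
# Assumed group names for illustration; map to your IdP's actual groups.
PRODUCTION_ADMIN_GROUPS = {"prod-admins-eea"}  # membership restricted to EU/EEA staff
NONPROD_ADMIN_GROUPS = {"dev-admins"}          # third-country access acceptable

ADMIN_GROUPS_BY_ENV = {
    "production": PRODUCTION_ADMIN_GROUPS,
    "staging": NONPROD_ADMIN_GROUPS,
    "dev": NONPROD_ADMIN_GROUPS,
}

def may_access(env: str, user_groups: set, written_exception: bool = False) -> bool:
    """Return True if the user may administer the given environment.

    written_exception models the DPA's exception procedure: documented
    written authorization from the controller can open production to a
    user outside the standard EU/EEA admin group, and nothing else can.
    """
    allowed = ADMIN_GROUPS_BY_ENV.get(env, set())
    if user_groups & allowed:
        return True
    return env == "production" and written_exception
```

The useful property to verify with the vendor: production access is decided by group membership alone, so an auditor can enumerate who can reach production by reading the IdP, not the codebase.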
(UK: Post-Brexit, the UK operates its own data protection framework (UK GDPR). The adequacy decision from the EU covers most scenarios, but organizations should verify whether their specific AI provider setup requires additional provisions.)
(US: Organizations subject to US state privacy laws (CCPA/CPRA, Colorado Privacy Act, etc.) should map these DPA requirements against their applicable framework. The concept of “processor” vs. “service provider” varies by jurisdiction.)
8. Professional Privilege in AI Platforms
For regulated industries - law firms, tax advisors, auditors, healthcare organizations, and enterprises with processes subject to professional secrecy - AI processing requires explicit DPA provisions. In Germany, §203 StGB (professional secrecy) requires written confidentiality commitments for all personnel with access. The EU-wide equivalent is the professional privilege doctrine combined with sector-specific regulations. (UK: Legal Professional Privilege and medical confidentiality apply.) Standard confidentiality clauses are not sufficient for these contexts.
9. PII Tokenization as an Optional Module
Input and output filters that detect personal data and pseudonymize it before forwarding to the model provider add an additional layer of protection. This PII tokenization is not necessary in every setup, but the DPA should provide for it as an optional module. Important: if reversible re-identification is possible, the DPA must govern who has access to the mapping table, how mapping keys are stored and encrypted, and that re-identification only occurs for a defined purpose after documented authorization.
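A minimal sketch of reversible tokenization under the assumptions above (regex detection of one PII type, random tokens, an access-gated mapping table). A production filter would use proper NER and encrypt the mapping at rest; this only illustrates the contract-relevant control points:

```python
import re
import secrets

class PIITokenizer:
    """Toy reversible PII filter: detects email addresses, swaps them
    for random tokens, and refuses re-identification without the
    documented authorization the DPA requires."""

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def __init__(self):
        # token -> original; in production this mapping table would be
        # encrypted and its keys access-controlled per the DPA.
        self._mapping = {}

    def tokenize(self, text: str) -> str:
        def replace(match):
            token = f"<PII_{secrets.token_hex(4)}>"
            self._mapping[token] = match.group(0)
            return token
        return self.EMAIL.sub(replace, text)

    def detokenize(self, text: str, authorized: bool = False) -> str:
        # Re-identification only for a defined purpose after documented
        # authorization - modeled here as an explicit flag.
        if not authorized:
            raise PermissionError("re-identification requires documented authorization")
        for token, original in self._mapping.items():
            text = text.replace(token, original)
        return text
```

The model provider only ever sees the tokenized text; the mapping table never leaves the organization's control, which is what makes the DPA provisions on key storage and access enforceable.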
10. Audit Trail and Decision Layer
AI agents make or prepare decisions. Every one of these decisions must be traceable - not only for data protection, but also for internal audit, external auditors, and employee representative bodies. The Decision Layer decomposes every process into individual decision steps and defines for each step: human, rule engine, or AI. The DPA must anchor the audit trail as a contractual component: What data is logged per decision? How long is it retained? Who has access?
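The shape of such an audit trail can be sketched in a few lines. The fields below are assumptions about what a minimal per-step record needs; note that it deliberately records *who decided what, when* - never the prompt or response content, which stays out of the trail just as it stays out of the logs:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class Actor(Enum):
    HUMAN = "human"
    RULE = "rule_engine"
    AI = "ai"

@dataclass(frozen=True)
class DecisionStep:
    """One auditable step. Content-free by design: the trail links a
    decision to an actor type and a request ID, not to prompt text."""
    process: str
    step: str
    actor: Actor
    request_id: str
    timestamp: str

audit_log: list = []

def record_step(process: str, step: str, actor: Actor, request_id: str) -> DecisionStep:
    entry = DecisionStep(process, step, actor, request_id,
                         datetime.now(timezone.utc).isoformat())
    audit_log.append(entry)
    return entry
```

With per-step actor attribution, the DPA questions become answerable mechanically: retention applies to `audit_log` entries, and access control applies to whoever may read them.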
The Checklist: 25 Questions for Your AI Vendor
The following checklist translates the ten gaps into concrete verification questions.
A - Data Categories and Processing Purposes
- Are prompt contents and model responses listed as distinct data categories in the DPA?
- Is it established that content classification responsibility lies with the organization, not the provider?
- Are embeddings/vectors classified as potentially personal data?
- Does the DPA address special categories under Article 9 GDPR that may arise through user inputs?
B - Logging and Monitoring
- Is request/response body logging disabled in the production environment?
- What metadata is recorded (status codes, latencies, request IDs)?
- Is debug logging verifiably disabled in production?
- Are stack traces and error messages configured to exclude content data from logs?
- Is verification of logging settings part of the release process?
C - Environment Separation and Access
- Do separate environments exist (dev, staging, production) with distinct data policies?
- Do dev/staging environments contain exclusively synthetic or anonymized data?
- Is production access restricted to authorized roles within the EU/EEA?
- Does a documented exception procedure exist for support cases involving data?
D - Model Providers and Sub-processors
- Is the delineation clear: Which providers are sub-processors of the provider, and which operate in the organization’s tenant?
- Is content retention at model providers disabled?
- Is the exclusion of training data usage contractually documented?
- Where are the model endpoints located (EU region, US, other)?
E - Data Storage and Deletion
- Is it established where persistent content data is stored (database, region, provider)?
- What backup retention applies, and how is deleted data handled within backups?
- Can individual users delete their own data within the application?
F - Regulated Industries
- Does the DPA contain provisions for professional privilege compliance (§203 StGB or jurisdiction-specific equivalent)?
- Are confidentiality commitments in place for all personnel with access?
- Is PII tokenization available as an optional module?
G - Governance and Verifiability
- Is an audit trail for agent decisions anchored as a contractual component?
- Can technical and organizational measures be evidenced on request (configuration documentation, redacted log excerpts)?
From Checklist to Architecture
These 25 questions are not a legal tool. They are an architecture compass. Every question your AI vendor cannot answer reveals a gap in their infrastructure - not just in their contract.
At Gosign, we have built these requirements into the architecture: environment separation with production lockdown, logging without content data, model providers in the client tenant, Decision Layer with complete audit trail. Not because a client demanded it, but because enterprise AI without these foundations is not auditable.
If you want to assess how your current AI infrastructure scores against these 25 points - or if you are evaluating a new platform - we are happy to discuss.