
AI Hosting: EU SaaS, German Data Center, or Self-Hosted?

Three hosting strategies for enterprise AI. Decision matrix by data sensitivity, cost, and control.

Gosign 10 min read

“Where Does It Run?” — The Decisive Question

Before you select a model, before you build agents, before you roll out an interface, there is one question: where do your AI models run? This decision determines the data protection guarantees you can offer, the regulatory requirements you can meet, how high your ongoing costs are, and how dependent you become on third-party providers.

There are three fundamental strategies — and a fourth that has become the standard in practice: the hybrid architecture that combines all three.

Tier 1: EU SaaS — Cloud APIs with EU Data Residency

The simplest and fastest option: you use the model providers’ APIs directly. Claude via the Anthropic API (EU region), GPT-5.2 via Azure OpenAI (EU data center), Gemini via Google Cloud Platform (EU region). Data leaves your network but is processed in EU data centers.

Advantages

Fastest start: No infrastructure to build, no GPU servers to provision, no ML-Ops expertise required. Set up an API key, sign a data processing agreement, go live in hours.

Automatic updates: Model updates, security patches, and performance improvements are rolled out by the provider. No maintenance effort on your side.

Scalability: No capacity management. During load spikes, the cloud provider scales automatically. No over-provisioning, no under-capacity.

Model variety: Access to all model variants from the provider — flagship, balanced, and budget — through the same API.

Risks and Limitations

Data leaves the corporate network. Even with EU data residency, your requests are processed on infrastructure you do not control. The provider has technical access to data during processing.

CLOUD Act. US-based providers — including Anthropic, OpenAI, and Google — are subject to the US CLOUD Act. Under certain conditions, US authorities can request access to data even when it is stored in EU data centers. For most corporate data, this risk is assessable and acceptable. For trade secrets, classified data, or critical infrastructure information, it is not.

Vendor dependency. With a single-provider strategy, you are dependent on one provider’s pricing, API changes, and availability. A model-agnostic architecture (see AI Models Comparison 2026) reduces this risk.

DPA required. GDPR-compliant use requires a data processing agreement (DPA) with the provider. All three major providers offer standard DPAs — review them with your legal department.

Suited For

  • Standard tasks with non-sensitive data: summaries, translations, general question answering
  • Proof of concepts and pilot projects
  • Tasks with variable volume where dedicated GPU infrastructure would be uneconomical
  • Organizations without ML-Ops expertise that want to go live quickly

Tier 2: European IaaS — GPU Hosting at European Providers

The middle ground: you rent GPU servers from a European Infrastructure-as-a-Service provider — such as Hetzner, IONOS, or a specialized GPU cloud provider. On these servers, you operate open-source models like gpt-oss, Llama 4, or Mistral Medium 3.1 yourself.

Specific Hardware Requirements and Costs

| Model | GPU Requirement | Estimated Cost/Month |
| --- | --- | --- |
| gpt-oss-120b | 1x A100/H100 (80 GB) | approx. EUR 1,200 |
| gpt-oss-20b | CPU/16 GB RAM (or small GPU) | approx. EUR 200–400 |
| Llama 4 Scout | 1x A100 (80 GB) | approx. EUR 1,200 |
| Llama 4 Maverick | 4x A100 (80 GB) | approx. EUR 3,500 |
| Mistral Medium 3.1 | 4x A100 (80 GB) | approx. EUR 3,500 |
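Whether a fixed-cost GPU server beats pay-per-token cloud APIs comes down to volume. A minimal break-even sketch, using the EUR 1,200/month figure from the table above and an assumed blended cloud price of EUR 3 per million tokens (an illustrative number, not a provider quote):

```python
# Break-even sketch: at what monthly token volume does a fixed-cost GPU
# server undercut per-token cloud API pricing? Prices are illustrative
# assumptions, not quotes from any provider.

def breakeven_tokens(gpu_eur_month: float, api_eur_per_mtok: float) -> float:
    """Monthly token volume (in millions) above which the GPU server is cheaper."""
    return gpu_eur_month / api_eur_per_mtok

# EUR 1,200/month for 1x A100/H100 (table above) vs. an assumed
# EUR 3 per million tokens on a cloud API:
mtok = breakeven_tokens(1200, 3.0)
print(f"Break-even at ~{mtok:.0f}M tokens/month")  # ~400M tokens/month
```

Below that volume, the cloud API wins on cost; above it, the rented GPU does. The crossover shifts with the API price and GPU utilization, so rerun the arithmetic with your own numbers.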

Advantages

Data stays in Europe. The server sits in a European data center, operated by a European provider. No CLOUD Act, no transatlantic data transfer. For GDPR compliance, this is the most secure cloud option.

No vendor lock-in. You operate open-source models under Apache 2.0 or the Meta Llama License. If you want to switch hosting providers, you migrate the model — no license questions, no contract negotiations.

Full model control. You decide which model in which version runs. You can fine-tune, quantize, or replace models with newer versions — without waiting for the provider.

Predictable costs. GPU servers have fixed monthly costs. No variable token charges, no surprises during load spikes. For organizations with high, constant volume, this is often more economical than cloud APIs.

Requirements

ML-Ops competency. You need someone to deploy, monitor, update, and troubleshoot the model. This can be an internal ML engineer or an external service provider — but it is not zero effort.

Capacity planning. A GPU server has a defined capacity. If you have 500 concurrent requests, a single GPU will not suffice. You must understand load profiles and plan capacity accordingly.
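A rough sizing sketch for the 500-concurrent-requests example. The throughput figures are purely illustrative assumptions; real numbers depend on model, quantization, batch size, and serving stack:

```python
import math

def gpus_needed(concurrent_requests: int, tokens_per_req_s: float,
                gpu_throughput_tok_s: float) -> int:
    """Rough GPU count for a target concurrency (assumed throughput figures)."""
    demand = concurrent_requests * tokens_per_req_s  # aggregate tokens/sec needed
    return math.ceil(demand / gpu_throughput_tok_s)

# 500 concurrent requests at ~20 tok/s each, vs. an assumed ~2,500 tok/s
# aggregate throughput per GPU under batched serving:
print(gpus_needed(500, 20, 2500))  # 4
```

This is a back-of-the-envelope ceiling calculation, not a load test; benchmark your actual model and serving stack before ordering hardware.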

No automatic updates. When a new model is released, you deploy it yourself. When a security issue arises, you patch it yourself.

Suited For

  • Confidential corporate data (sensitivity level 2—3)
  • Organizations that must eliminate CLOUD Act risks
  • Use cases with constant, high volume (cost advantage over cloud APIs)
  • Organizations with existing DevOps/ML-Ops competency

Tier 3: On-Premises — AI on Your Own Hardware

The maximum-control option: you operate GPU servers in your own data center or in a colocation rack. No data leaves your network — under any circumstances.

Advantages

Maximum data sovereignty. No external access, no external provider, no external dependency. The hardware is yours, the model is yours, the data never leaves your network.

Regulatory certainty. For critical infrastructure operators, government agencies, defense, and organizations with classified data, on-premises is often the only option that meets compliance requirements.

No recurring license or API costs. After the initial investment, only electricity, cooling, and maintenance remain. Over long-term operation at high volume, on-premises can be the most economical option.
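The "most economical at high volume" claim rests on amortization. A sketch of the arithmetic, with assumed capex and opex figures in the ranges named above:

```python
def monthly_tco(capex_eur: float, amortize_months: int,
                opex_eur_month: float) -> float:
    """Effective monthly cost: hardware amortized over its lifetime plus opex."""
    return capex_eur / amortize_months + opex_eur_month

# Assumed: EUR 35,000 single-H100 server amortized over 36 months,
# plus EUR 400/month for power, cooling, and maintenance.
print(round(monthly_tco(35000, 36, 400)))  # ~1372 EUR/month
```

With these assumptions, the effective monthly cost lands near the rental price of a comparable IaaS GPU; on-premises pulls ahead mainly over longer amortization periods and when the hardware stays highly utilized.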

Challenges

High initial investment. A production GPU server with an NVIDIA H100 (80 GB) costs EUR 25,000—40,000. For more capable setups (multi-GPU, redundancy), costs range from EUR 60,000 to EUR 120,000 or more.

ML-Ops team required. On-premises means you are responsible for everything: hardware maintenance, model deployment, monitoring, updates, security. This requires a dedicated team or an experienced service provider.

Scaling is not trivial. When demand increases, you cannot add another GPU at the push of a button. Hardware procurement takes weeks to months.

Suited For

  • Critical infrastructure operators and government agencies
  • Classified data and highest confidentiality levels
  • Organizations with their own data center and ML-Ops competency
  • Long-term investment willingness at very high volume

The Decision Tree

The following decision logic helps with the assignment:

Does your data contain PII or trade secrets?
├── NO → EU SaaS (Tier 1)
└── YES → Critical infrastructure or classified data?
    ├── YES → On-Premises (Tier 3)
    └── NO → European IaaS (Tier 2) or Hybrid
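The same logic can be mirrored in code, for example as a starting point for automated routing rules. A minimal sketch with hypothetical names:

```python
def recommend_tier(has_pii_or_secrets: bool, critical_or_classified: bool) -> str:
    """Hypothetical helper mirroring the decision tree above."""
    if not has_pii_or_secrets:
        return "Tier 1: EU SaaS"
    if critical_or_classified:
        return "Tier 3: On-Premises"
    return "Tier 2: European IaaS (or Hybrid)"

print(recommend_tier(False, False))  # Tier 1: EU SaaS
print(recommend_tier(True, True))    # Tier 3: On-Premises
print(recommend_tier(True, False))   # Tier 2: European IaaS (or Hybrid)
```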

In practice, the answer is rarely a single tier. Most organizations have data of varying sensitivity — and therefore need an architecture that covers all tiers.

Hybrid as the Standard: The Routing Architecture

The hybrid strategy combines all three tiers in a single architecture. A routing layer automatically decides which request goes through which channel — based on data sensitivity, not on individual employee decisions.

How the Routing Works

Data sensitivity level 1—2 (public, internal): Requests go through cloud APIs. Fast, affordable, scalable. Example: summarizing a public whitepaper, translating a press release, drafting a general email.

Data sensitivity level 3 (confidential): Requests are routed to self-hosted models in the European data center. No data egress, no CLOUD Act. Example: analyzing internal contracts, processing HR data, evaluating confidential financial data.

Data sensitivity level 4 (strictly confidential / regulated): Requests run exclusively on on-premises infrastructure. Example: classified documents, critical infrastructure systems, data under special confidentiality obligations.

Prerequisite: Data Classification

For routing to work, the organization must classify its data. This sounds onerous but is already in place at many organizations — for example, as part of existing Information Security Management Systems (ISMS) or national security classification frameworks. The routing rules map this existing classification to the AI infrastructure.

Technical Implementation

The routing layer sits between the Enterprise AI Portal (the interface employees use) and the model endpoints. It consists of three components:

  1. Classifier: Automatically detects the data sensitivity of a request — based on keywords, source system, or explicit user marking.
  2. Routing Engine: Assigns the request to the appropriate model endpoint — cloud API, European IaaS, or on-premises.
  3. Audit Log: Records every routing decision — which request, which sensitivity level, which endpoint. Traceable and exportable.
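A compact sketch of these three components. The keyword patterns and endpoint names are hypothetical; a production classifier would also consider the source system and explicit user markings, as described above:

```python
import re
from dataclasses import dataclass, field

# Sensitivity level -> model endpoint (hypothetical endpoint names).
ENDPOINTS = {1: "cloud-api", 2: "cloud-api", 3: "eu-iaas", 4: "on-prem"}

# Hypothetical keyword rules, checked from most to least sensitive.
LEVEL_PATTERNS = [
    (4, re.compile(r"\b(classified|VS-NfD)\b", re.I)),
    (3, re.compile(r"\b(salary|contract|HR)\b", re.I)),
]

@dataclass
class Router:
    audit_log: list = field(default_factory=list)

    def classify(self, text: str) -> int:
        """Classifier: detect data sensitivity (keyword-based sketch)."""
        for level, pattern in LEVEL_PATTERNS:
            if pattern.search(text):
                return level
        return 2  # default: internal

    def route(self, text: str) -> str:
        """Routing engine: assign the request to the matching endpoint."""
        level = self.classify(text)
        endpoint = ENDPOINTS[level]
        # Audit log: record every routing decision, traceable and exportable.
        self.audit_log.append({"level": level, "endpoint": endpoint})
        return endpoint

r = Router()
print(r.route("Summarize this HR contract"))     # eu-iaas
print(r.route("Translate this press release"))   # cloud-api
```

The key design point is that the routing decision is centralized and logged, rather than left to each employee's judgment.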

Cost Effect

The hybrid architecture optimizes not only data security but also costs. Cloud APIs are inexpensive per request but variable. Self-hosted models have fixed costs that amortize at high volume. The combination leverages both: affordable cloud APIs for the bulk of non-sensitive requests, fixed-cost self-hosted models for confidential volume.

In practice, we see the following distribution at organizations with 1,000+ employees: 60—70% of requests go through cloud APIs (tier 1—2), 25—35% through European IaaS (tier 3), and 5—10% through on-premises (tier 4). Total costs are 30—40% below a pure cloud-API strategy while delivering higher data sovereignty.
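The blended-cost effect can be sketched with a simple weighted average. The per-request costs below are assumptions chosen for illustration; actual savings depend entirely on them and on how well the self-hosted capacity is utilized:

```python
# Illustrative blended-cost calculation for a hybrid traffic split.
# Per-request costs are assumptions, not measured figures.

def blended_cost(shares: dict, cost_per_req: dict) -> float:
    """Average cost per request for a given traffic split."""
    return sum(shares[k] * cost_per_req[k] for k in shares)

# Assumed per-request costs: pay-per-token cloud API vs. amortized
# fixed-cost self-hosting at high utilization.
costs = {"cloud": 0.010, "eu-iaas": 0.004, "on-prem": 0.006}
hybrid = blended_cost({"cloud": 0.65, "eu-iaas": 0.30, "on-prem": 0.05}, costs)
pure_cloud = costs["cloud"]
print(f"Hybrid saves {(1 - hybrid / pure_cloud):.0%} vs. pure cloud")
```

With these particular assumptions the blended cost comes out around 20% below pure cloud; cheaper self-hosted per-request costs at higher utilization push the figure further down.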

Summary: The Three Tiers at a Glance

| Criterion | EU SaaS (Tier 1) | European IaaS (Tier 2) | On-Premises (Tier 3) |
| --- | --- | --- | --- |
| Data Sovereignty | EU region, DPA | Europe, no CLOUD Act | Maximum |
| Initial Cost | None | Low (rental) | High (EUR 60–120K+) |
| Ongoing Cost | Variable (per token) | Fixed (GPU rental) | Fixed (power, maintenance) |
| ML-Ops Effort | None | Medium | High |
| Scalability | Automatic | Manual | Manual, slow |
| Suited For | Level 1–2 data | Level 2–3 data | Level 3–4 data |

The right strategy is almost always a combination. Gosign implements the routing layer that connects all three tiers — so your employees use a single interface and the system automatically selects the right path.

Further reading: AI Infrastructure | Decision Layer & Shadow AI


📘 Enterprise AI Infrastructure Blueprint 2026 – Article Series

← Previous: AI Models 2026: Which Model for Which Use Case? | Overview | Next: Enterprise AI Portal: Four Open-Source Interfaces Compared

All articles in this series: Enterprise AI Infrastructure Blueprint 2026


Want to know which hosting strategy is right for your data landscape? Gosign analyzes your data classification and designs the appropriate hybrid architecture.

Book a consultation — We clarify in 30 minutes which hosting tiers you need.

AI Hosting Self-Hosted GDPR Cloud Act GPU Server Enterprise AI

Frequently Asked Questions

Is using cloud AI APIs GDPR-compliant?

Yes, if the provider guarantees EU data residency and a data processing agreement is in place. Claude (Anthropic), GPT (Azure OpenAI) and Gemini (Google) offer EU regions. For sensitivity level 3–4 data, we recommend self-hosting.

What does self-hosting AI models cost?

gpt-oss-120b runs on a single GPU (80 GB) – approximately €1,200/month at a German hosting provider. Larger models like Llama 4 Maverick require 4+ GPUs, approximately €3,500/month.

What is the hybrid strategy?

The hybrid architecture automatically routes requests by data sensitivity: public data via cloud APIs (fast, affordable), confidential data via self-hosted models (no data egress). A routing layer decides automatically.
