
AI Hosting: EU SaaS, German Data Center, or Self-Hosted?

Three hosting strategies for enterprise AI. Decision matrix by data sensitivity, cost, and control.

Gosign 10 min read

“Where Does It Run?” — The Decisive Question

Before you select a model, before you build agents, before you roll out an interface, there is one question: where do your AI models run? This decision determines the data protection guarantees you can offer, the regulatory requirements you can meet, how high your ongoing costs are, and how dependent you become on third-party providers.

There are three fundamental strategies — and a fourth that has become the standard in practice: the hybrid architecture that combines all three.

Tier 1: EU SaaS — Cloud APIs with EU Data Residency

The simplest and fastest option: you use the model providers’ APIs directly. Claude via the Anthropic API (EU region), GPT-5.2 via Azure OpenAI (EU data center), Gemini via Google Cloud Platform (EU region). Data leaves your network but is processed in EU data centers.

Advantages

Fastest start: No infrastructure to build, no GPU servers to provision, no ML-Ops expertise required. Set up an API key, sign a data processing agreement, go live in hours.

Automatic updates: Model updates, security patches, and performance improvements are rolled out by the provider. No maintenance effort on your side.

Scalability: No capacity management. During load spikes, the cloud provider scales automatically. No over-provisioning, no under-capacity.

Model variety: Access to all model variants from the provider — flagship, balanced, and budget — through the same API.

Risks and Limitations

Data leaves the corporate network. Even with EU data residency, your requests are processed on infrastructure you do not control. The provider has technical access to data during processing.

CLOUD Act. US-based providers — including Anthropic, OpenAI, and Google — are subject to the US CLOUD Act. Under certain conditions, US authorities can request access to data even when it is stored in EU data centers. For most corporate data, this risk is assessable and acceptable. For trade secrets, classified data, or critical infrastructure information, it is not.

Vendor dependency. With a single-provider strategy, you are dependent on one provider’s pricing, API changes, and availability. A model-agnostic architecture (see AI Models Comparison 2026) reduces this risk.

DPA required. GDPR-compliant use requires a data processing agreement (DPA) with the provider. All three major providers offer standard DPAs — review them with your legal department.

Suited For

  • Standard tasks with non-sensitive data: summaries, translations, general question answering
  • Proof of concepts and pilot projects
  • Tasks with variable volume where dedicated GPU infrastructure would be uneconomical
  • Organizations without ML-Ops expertise that want to go live quickly

Tier 2: European IaaS — GPU Hosting at European Providers

The middle ground: you rent GPU servers from a European Infrastructure-as-a-Service provider — such as Hetzner, IONOS, or a specialized GPU cloud provider. On these servers, you operate open-source models like gpt-oss, Llama 4, or Mistral Medium 3.1 yourself.

Specific Hardware Requirements and Costs

| Model | GPU Requirement | Estimated Cost/Month |
| --- | --- | --- |
| gpt-oss-120b | 1x A100/H100 (80 GB) | approx. EUR 1,200 |
| gpt-oss-20b | CPU/16 GB RAM (or small GPU) | approx. EUR 200–400 |
| Llama 4 Scout | 1x A100 (80 GB) | approx. EUR 1,200 |
| Llama 4 Maverick | 4x A100 (80 GB) | approx. EUR 3,500 |
| Mistral Medium 3.1 | 4x A100 (80 GB) | approx. EUR 3,500 |
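Whether a fixed-cost GPU server beats pay-per-token cloud APIs comes down to volume. A minimal break-even sketch, using the EUR 1,200/month figure from the table above and an assumed blended cloud price of EUR 3 per million tokens (an illustrative number, not a provider quote):

```python
# Break-even sketch: at what monthly token volume does a fixed-cost GPU
# server undercut per-token cloud API pricing? Prices are illustrative
# assumptions, not quotes from any provider.

def breakeven_tokens(gpu_eur_month: float, api_eur_per_mtok: float) -> float:
    """Monthly token volume (in millions) above which the GPU server is cheaper."""
    return gpu_eur_month / api_eur_per_mtok

# EUR 1,200/month for 1x A100/H100 (table above) vs. an assumed
# EUR 3 per million tokens on a cloud API:
mtok = breakeven_tokens(1200, 3.0)
print(f"Break-even at ~{mtok:.0f}M tokens/month")  # ~400M tokens/month
```

Below that volume, the cloud API wins on cost; above it, the rented GPU does. The crossover shifts with the API price and GPU utilization, so rerun the arithmetic with your own numbers.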

Advantages

Data stays in Europe. The server sits in a European data center, operated by a European provider. No CLOUD Act, no transatlantic data transfer. For GDPR compliance, this is the most secure cloud option.

No vendor lock-in. You operate open-source models under Apache 2.0 or the Meta Llama License. If you want to switch hosting providers, you migrate the model — no license questions, no contract negotiations.

Full model control. You decide which model in which version runs. You can fine-tune, quantize, or replace models with newer versions — without waiting for the provider.

Predictable costs. GPU servers have fixed monthly costs. No variable token charges, no surprises during load spikes. For organizations with high, constant volume, this is often more economical than cloud APIs.

Requirements

ML-Ops competency. You need someone to deploy, monitor, update, and troubleshoot the model. This can be an internal ML engineer or an external service provider — but it is not zero effort.

Capacity planning. A GPU server has a defined capacity. If you have 500 concurrent requests, a single GPU will not suffice. You must understand load profiles and plan capacity accordingly.
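A rough sizing sketch for the 500-concurrent-requests example. The throughput figures are purely illustrative assumptions; real numbers depend on model, quantization, batch size, and serving stack:

```python
import math

def gpus_needed(concurrent_requests: int, tokens_per_req_s: float,
                gpu_throughput_tok_s: float) -> int:
    """Rough GPU count for a target concurrency (assumed throughput figures)."""
    demand = concurrent_requests * tokens_per_req_s  # aggregate tokens/sec needed
    return math.ceil(demand / gpu_throughput_tok_s)

# 500 concurrent requests at ~20 tok/s each, vs. an assumed ~2,500 tok/s
# aggregate throughput per GPU under batched serving:
print(gpus_needed(500, 20, 2500))  # 4
```

This is a back-of-the-envelope ceiling calculation, not a load test; benchmark your actual model and serving stack before ordering hardware.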

No automatic updates. When a new model is released, you deploy it yourself. When a security issue arises, you patch it yourself.

Suited For

  • Confidential corporate data (sensitivity level 2—3)
  • Organizations that must eliminate CLOUD Act risks
  • Use cases with constant, high volume (cost advantage over cloud APIs)
  • Organizations with existing DevOps/ML-Ops competency

Tier 3: On-Premises — AI on Your Own Hardware

The maximum-control option: you operate GPU servers in your own data center or in a colocation rack. No data leaves your network — under any circumstances.

Advantages

Maximum data sovereignty. No external access, no external provider, no external dependency. The hardware is yours, the model is yours, the data never leaves your network.

Regulatory certainty. For critical infrastructure operators, government agencies, defense, and organizations with classified data, on-premises is often the only option that meets compliance requirements.

No recurring license or API costs. After the initial investment, only electricity, cooling, and maintenance remain. Over long-term operation at high volume, on-premises can be the most economical option.
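The "most economical at high volume" claim rests on amortization. A sketch of the arithmetic, with assumed capex and opex figures in the ranges named above:

```python
def monthly_tco(capex_eur: float, amortize_months: int,
                opex_eur_month: float) -> float:
    """Effective monthly cost: hardware amortized over its lifetime plus opex."""
    return capex_eur / amortize_months + opex_eur_month

# Assumed: EUR 35,000 single-H100 server amortized over 36 months,
# plus EUR 400/month for power, cooling, and maintenance.
print(round(monthly_tco(35000, 36, 400)))  # ~1372 EUR/month
```

With these assumptions, the effective monthly cost lands near the rental price of a comparable IaaS GPU; on-premises pulls ahead mainly over longer amortization periods and when the hardware stays highly utilized.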

Challenges

High initial investment. A production GPU server with an NVIDIA H100 (80 GB) costs EUR 25,000—40,000. For more capable setups (multi-GPU, redundancy), costs range from EUR 60,000 to EUR 120,000 or more.

ML-Ops team required. On-premises means you are responsible for everything: hardware maintenance, model deployment, monitoring, updates, security. This requires a dedicated team or an experienced service provider.

Scaling is not trivial. When demand increases, you cannot add another GPU at the push of a button. Hardware procurement takes weeks to months.

Suited For

  • Critical infrastructure operators and government agencies
  • Classified data and highest confidentiality levels
  • Organizations with their own data center and ML-Ops competency
  • Long-term investment willingness at very high volume

The Decision Tree

The following decision logic helps with the assignment:

Does your data contain PII or trade secrets?
├── NO → EU SaaS (Tier 1)
└── YES → Critical infrastructure or classified data?
    ├── YES → On-Premises (Tier 3)
    └── NO → European IaaS (Tier 2) or Hybrid
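The same logic can be mirrored in code, for example as a starting point for automated routing rules. A minimal sketch with hypothetical names:

```python
def recommend_tier(has_pii_or_secrets: bool, critical_or_classified: bool) -> str:
    """Hypothetical helper mirroring the decision tree above."""
    if not has_pii_or_secrets:
        return "Tier 1: EU SaaS"
    if critical_or_classified:
        return "Tier 3: On-Premises"
    return "Tier 2: European IaaS (or Hybrid)"

print(recommend_tier(False, False))  # Tier 1: EU SaaS
print(recommend_tier(True, True))    # Tier 3: On-Premises
print(recommend_tier(True, False))   # Tier 2: European IaaS (or Hybrid)
```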

In practice, the answer is rarely a single tier. Most organizations have data of varying sensitivity — and therefore need an architecture that covers all tiers.

Hybrid as the Standard: The Routing Architecture

The hybrid strategy combines all three tiers in a single architecture. A routing layer automatically decides which request goes through which channel — based on data sensitivity, not on individual employee decisions.

How the Routing Works

Data sensitivity level 1—2 (public, internal): Requests go through cloud APIs. Fast, affordable, scalable. Example: summarizing a public whitepaper, translating a press release, drafting a general email.

Data sensitivity level 3 (confidential): Requests are routed to self-hosted models in the European data center. No data egress, no CLOUD Act. Example: analyzing internal contracts, processing HR data, evaluating confidential financial data.

Data sensitivity level 4 (strictly confidential / regulated): Requests run exclusively on on-premises infrastructure. Example: classified documents, critical infrastructure systems, data under special confidentiality obligations.

Prerequisite: Data Classification

For routing to work, the organization must classify its data. This sounds onerous but is already in place at many organizations — for example, as part of existing Information Security Management Systems (ISMS) or national security classification frameworks. The routing rules map this existing classification to the AI infrastructure.

Technical Implementation

The routing layer sits between the Enterprise AI Portal (the interface employees use) and the model endpoints. It consists of three components:

  1. Classifier: Automatically detects the data sensitivity of a request — based on keywords, source system, or explicit user marking.
  2. Routing Engine: Assigns the request to the appropriate model endpoint — cloud API, European IaaS, or on-premises.
  3. Audit Log: Records every routing decision — which request, which sensitivity level, which endpoint. Traceable and exportable.
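A compact sketch of these three components. The keyword patterns and endpoint names are hypothetical; a production classifier would also consider the source system and explicit user markings, as described above:

```python
import re
from dataclasses import dataclass, field

# Sensitivity level -> model endpoint (hypothetical endpoint names).
ENDPOINTS = {1: "cloud-api", 2: "cloud-api", 3: "eu-iaas", 4: "on-prem"}

# Hypothetical keyword rules, checked from most to least sensitive.
LEVEL_PATTERNS = [
    (4, re.compile(r"\b(classified|VS-NfD)\b", re.I)),
    (3, re.compile(r"\b(salary|contract|HR)\b", re.I)),
]

@dataclass
class Router:
    audit_log: list = field(default_factory=list)

    def classify(self, text: str) -> int:
        """Classifier: detect data sensitivity (keyword-based sketch)."""
        for level, pattern in LEVEL_PATTERNS:
            if pattern.search(text):
                return level
        return 2  # default: internal

    def route(self, text: str) -> str:
        """Routing engine: assign the request to the matching endpoint."""
        level = self.classify(text)
        endpoint = ENDPOINTS[level]
        # Audit log: record every routing decision, traceable and exportable.
        self.audit_log.append({"level": level, "endpoint": endpoint})
        return endpoint

r = Router()
print(r.route("Summarize this HR contract"))     # eu-iaas
print(r.route("Translate this press release"))   # cloud-api
```

The key design point is that the routing decision is centralized and logged, rather than left to each employee's judgment.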

Cost Effect

The hybrid architecture optimizes not only data security but also costs. Cloud APIs are inexpensive per request but variable. Self-hosted models have fixed costs that amortize at high volume. The combination leverages both: affordable cloud APIs for the bulk of non-sensitive requests, fixed-cost self-hosted models for confidential volume.

In practice, we see the following distribution at organizations with 1,000+ employees: 60—70% of requests go through cloud APIs (tier 1—2), 25—35% through European IaaS (tier 3), and 5—10% through on-premises (tier 4). Total costs are 30—40% below a pure cloud-API strategy while delivering higher data sovereignty.
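The blended-cost effect can be sketched with a simple weighted average. The per-request costs below are assumptions chosen for illustration; actual savings depend entirely on them and on how well the self-hosted capacity is utilized:

```python
# Illustrative blended-cost calculation for a hybrid traffic split.
# Per-request costs are assumptions, not measured figures.

def blended_cost(shares: dict, cost_per_req: dict) -> float:
    """Average cost per request for a given traffic split."""
    return sum(shares[k] * cost_per_req[k] for k in shares)

# Assumed per-request costs: pay-per-token cloud API vs. amortized
# fixed-cost self-hosting at high utilization.
costs = {"cloud": 0.010, "eu-iaas": 0.004, "on-prem": 0.006}
hybrid = blended_cost({"cloud": 0.65, "eu-iaas": 0.30, "on-prem": 0.05}, costs)
pure_cloud = costs["cloud"]
print(f"Hybrid saves {(1 - hybrid / pure_cloud):.0%} vs. pure cloud")
```

With these particular assumptions the blended cost comes out around 20% below pure cloud; cheaper self-hosted per-request costs at higher utilization push the figure further down.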

Summary: The Three Tiers at a Glance

| Criterion | EU SaaS (Tier 1) | European IaaS (Tier 2) | On-Premises (Tier 3) |
| --- | --- | --- | --- |
| Data Sovereignty | EU region, DPA | Europe, no CLOUD Act | Maximum |
| Initial Cost | None | Low (rental) | High (EUR 60–120K+) |
| Ongoing Cost | Variable (per token) | Fixed (GPU rental) | Fixed (power, maintenance) |
| ML-Ops Effort | None | Medium | High |
| Scalability | Automatic | Manual | Manual, slow |
| Suited For | Level 1–2 data | Level 2–3 data | Level 3–4 data |

The right strategy is almost always a combination. Gosign implements the routing layer that connects all three tiers — so your employees use a single interface and the system automatically selects the right path.

Further reading: AI Infrastructure | Decision Layer & Shadow AI


📘 Enterprise AI Infrastructure Blueprint 2026 – Article Series

← Previous: AI Models 2026: Which Model for Which Use Case? | Overview | Next: Enterprise AI Portal: Four Open-Source Interfaces Compared

All articles in this series: Enterprise AI Infrastructure Blueprint 2026


Want to know which hosting strategy is right for your data landscape? Gosign analyzes your data classification and designs the appropriate hybrid architecture.

Book a consultation — We clarify in 30 minutes which hosting tiers you need.

AI Hosting Self-Hosted GDPR Cloud Act GPU Server Enterprise AI

Frequently Asked Questions

Is using cloud AI APIs GDPR-compliant?

Yes, if the provider guarantees EU data residency and a data processing agreement is in place. Claude (Anthropic), GPT (Azure OpenAI) and Gemini (Google) offer EU regions. For sensitivity level 3–4 data, we recommend self-hosting.

What does self-hosting AI models cost?

gpt-oss-120b runs on a single GPU (80 GB) – approximately €1,200/month at a German hosting provider. Larger models like Llama 4 Maverick require 4+ GPUs, approximately €3,500/month.

What is the hybrid strategy?

The hybrid architecture automatically routes requests by data sensitivity: public data via cloud APIs (fast, affordable), confidential data via self-hosted models (no data egress). A routing layer decides automatically.
