
AI Infrastructure

The production platform for AI agents - model-agnostic, in your infrastructure. Models, hosting, pipeline, stack.

Airbus · Volkswagen · Shell · Sony · Evonik · Philips · KPMG

Why Infrastructure Is the Bottleneck

Most organisations piloting AI agents do not fail because of the model. The models work. They fail because of infrastructure: no governance framework, no audit trail, no tenant isolation, no deployment concept, no integration into existing systems.

A pilot on a laptop is not a production architecture. This page describes the concrete technologies and configurations that turn an LLM experiment into an operational system.

How the individual infrastructure components work together architecturally is described in the 7-Layer Reference Architecture.

Four Infrastructure Components

1. LLM Hosting

The model layer. Where language understanding happens.

Cloud LLMs:

  • Azure OpenAI (GPT models) - EU regions, Microsoft DPA
  • Amazon Bedrock (Claude, Llama, Mistral) - EU regions, AWS DPA
  • Google Vertex AI (Gemini) - EU regions, Google DPA
  • Anthropic API (Claude) - with EU data processing

Self-Hosted LLMs:

  • Llama (Meta) - open source, on your own hardware
  • Mistral - open source, EU-based company
  • DeepSeek - open source, cost-efficient
  • gpt-oss (OpenAI) - open weight, Apache 2.0, fully self-hostable. 120B on a single H100, 20B on 16 GB consumer hardware.

Inference Frameworks for Self-Hosted:

  • Ollama - Easy entry, local development, edge deployment
  • vLLM - Production-grade, GPU-optimised, high throughput
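As a concrete sketch, a model served locally by Ollama can be queried over its default HTTP API. The model name (`llama3`) is a placeholder assumption; `query_ollama` needs a running Ollama server and is shown for illustration only.

```python
import json
import urllib.request

# Ollama listens on this endpoint by default after `ollama serve`.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def query_ollama(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the response text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The same pattern scales up: vLLM exposes an OpenAI-compatible HTTP API, so the calling code changes only in endpoint and payload shape, not in structure.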

Hybrid:

  • Self-hosted for sensitive data (HR, finance)
  • Cloud LLMs for less critical workloads (document classification)
  • Automatic routing based on data classification
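The routing idea above can be sketched as a pure function. The classification levels and endpoint names here are hypothetical placeholders, not part of any specific product:

```python
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"  # e.g. HR, finance

# Hypothetical routing table: sensitive data stays on self-hosted models,
# less critical workloads may use a cloud LLM.
ROUTES = {
    Classification.PUBLIC: "cloud/gpt-4o",
    Classification.INTERNAL: "cloud/claude",
    Classification.CONFIDENTIAL: "self-hosted/llama3",
}

def route_model(classification: Classification) -> str:
    """Return the model endpoint for a document's data classification,
    defaulting to the most restrictive route."""
    return ROUTES.get(classification, ROUTES[Classification.CONFIDENTIAL])
```

Defaulting to the self-hosted route means an unclassified document can never leak to a cloud endpoint by accident.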

Model choice is a trade-off between performance, cost, data protection, and latency. We advise on selection and implement model-agnostically - switching models does not change the business logic. Further reading: LLM Models Comparison 2026, LLM Self-Hosting for Enterprise

Our AI engineers are Microsoft-certified for Azure AI Services. Deployment options include Microsoft Azure, GCP, and fully self-hosted infrastructure - the architecture decision stays with the client, not the vendor.

2. RAG Pipeline

Retrieval Augmented Generation - how agents access enterprise knowledge.

RAG Pipeline: Documents are split into chunks, stored as embeddings in a vector store, semantically retrieved on query, and passed to the LLM as context
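That pipeline can be sketched end to end in a few lines. This toy version uses fixed-size word chunks and a bag-of-words similarity purely for illustration; a real pipeline would use semantic chunking and a trained embedding model:

```python
import math
from collections import Counter

def chunk(text: str, max_words: int = 50) -> list[str]:
    """Split a document into word-bounded chunks (real pipelines chunk semantically)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; production systems use an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query and return the best matches."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

The retrieved chunks are then passed to the LLM as context alongside the user's question.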

Quality characteristics:

  • Semantic chunking (by content, not by page number)
  • Metadata enrichment (document type, version, scope of validity)
  • Hybrid search (vector search + keyword search for precision)
  • Source citation in every response (document, page, paragraph)
  • Regular re-indexing on document changes
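Hybrid search needs a way to merge the vector and keyword result lists. One common rank-based approach (an illustrative choice, not necessarily the one used in this stack) is Reciprocal Rank Fusion:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists (e.g. vector search + keyword search)
    using Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it needs no tuning to reconcile the incomparable score scales of vector similarity and keyword relevance.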

3. Orchestration

The control layer. How agents, systems, and people work together.

  • Trigger.dev or Camunda: Open-source workflow engine. Visual workflows, API integration, webhooks. Self-hosted, no vendor lock-in.
  • API Gateway: Unified entry point. Rate limiting, authentication, logging, monitoring.
  • Queue system: Asynchronous processing for batch operations (month-end close, bulk imports).
  • Event system: Real-time reaction to incoming documents, status changes, escalations.

Orchestration is the difference between "an agent can do something" and "an agent reliably does something in production". More: Agent Orchestration Platforms

4. Deployment

Where the infrastructure runs. All options are available with EU-only deployment.

Azure (EU)

  • Azure Kubernetes Service (AKS) for container orchestration
  • Azure SQL / PostgreSQL for data and Audit Trail
  • Azure OpenAI for LLM hosting
  • Regions: West Europe, North Europe, Germany West Central

AWS (EU)

  • Amazon EKS for container orchestration
  • Amazon RDS / Aurora PostgreSQL for data and Audit Trail
  • Amazon Bedrock for LLM hosting (Claude, Llama, Mistral)
  • Regions: eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-3 (Paris)

GCP (EU)

  • Google Kubernetes Engine (GKE) for container orchestration
  • Cloud SQL / AlloyDB for data and Audit Trail
  • Vertex AI for LLM hosting
  • Regions: europe-west1, europe-west3, europe-west4

Vercel EU + Supabase EU

  • Vercel for frontend and edge functions in EU data centres
  • Supabase for database (PostgreSQL), auth, and storage
  • Lightweight EU deployment option without own Kubernetes infrastructure
  • Managed services with EU data residency

Self-Hosted

  • Kubernetes on your own hardware for container orchestration
  • PostgreSQL for data and Audit Trail
  • Ollama or vLLM for LLM inference
  • Full control over data, hardware, and updates

Hybrid

  • Combination by data classification
  • Sensitive workloads self-hosted, standard workloads cloud
  • Unified orchestration across all environments

Technology Stack

Component       | Technology            | Why
----------------|-----------------------|-------------------------------------------
Workflow engine | Trigger.dev, Camunda  | Open source, self-hosted, no vendor lock-in
Database        | PostgreSQL + pgvector | Enterprise-ready, RLS-capable, vector search integrated
Backend         | Python, TypeScript    | Proven for ML workloads and API development
Frontend        | React / Next.js       | For dashboard, chat UI, Auditor Portal
Containers      | Docker, Kubernetes    | Standard for cloud and self-hosted
API             | REST, GraphQL         | Integration with existing systems
Auth            | Supabase Auth / OIDC  | SSO-capable, integrates with enterprise identity providers
Monitoring      | Prometheus, Grafana   | Open source, self-hosted, real-time dashboards
Inference       | Ollama, vLLM          | Self-hosted LLM inference, GPU-optimised

Governance Is Built In

The infrastructure includes Governance by Design:

  • Audit Trail at infrastructure level (not just application level)
  • Row-Level Security at database level - details in Data Residency
  • Encryption at rest and in transit
  • RBAC across all components
  • Cert-Ready Controls as technical data objects
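As an illustrative PostgreSQL sketch of the Row-Level Security point (table and setting names are hypothetical), tenant isolation can be enforced directly at the database level, so even a buggy application query cannot cross tenant boundaries:

```sql
-- Hypothetical audit_trail table: each tenant only ever sees its own rows.
ALTER TABLE audit_trail ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON audit_trail
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- The application sets the tenant once per session:
-- SET app.current_tenant = '<tenant-uuid>';
```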

Governance in the 7-Layer Architecture →

Full Source Code Access

The infrastructure runs on your systems - Azure, GCP, AWS or self-hosted. No SaaS, no hosting at Gosign. Full access to source code, configurations, and rule sets. Open-source stack where possible. Proprietary components only for the LLMs themselves - and there model-agnostic.

After 12-18 months, you operate the infrastructure independently.

Frequently Asked Questions about AI Infrastructure

Do we have to choose between cloud and self-hosted?

No. The architecture supports hybrid deployment. You can process sensitive data self-hosted and use cloud services for less critical workloads. The layers above the infrastructure remain identical.

Which cloud providers are supported?

Azure (EU), AWS (EU), GCP (EU), Vercel EU + Supabase EU, Self-Hosted, or Hybrid. The architecture is cloud-agnostic - switching providers only changes the Infrastructure Layer, not the business logic.

Which LLMs are supported?

GPT (OpenAI), Claude, Gemini, Llama, Mistral, DeepSeek, gpt-oss, and more. Open-source or commercial models. Self-hosted via Ollama or vLLM - including OpenAI's own open-weight models running entirely in your infrastructure.

Do we need GPU hardware for self-hosted models?

For open-source models like Llama, Mistral, or gpt-oss, GPU hardware is required. gpt-oss-120B runs on a single H100, gpt-oss-20B on 16 GB consumer hardware. Sizing depends on the model and usage load. We advise on hardware selection.
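A rough back-of-the-envelope sizing, assuming roughly 4-bit weights (gpt-oss ships MXFP4-quantised) and ~20% overhead for KV cache and activations; real requirements vary with context length, batch size, and runtime:

```python
def vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight memory plus ~20% for KV cache and activations."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bit ≈ 1 GB
    return round(weight_gb * overhead, 1)

print(vram_gb(120, 4))  # 72.0 → fits a single 80 GB H100
print(vram_gb(20, 4))   # 12.0 → fits 16 GB consumer hardware
```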

How does this page relate to the Reference Architecture?

The Reference Architecture describes the architectural pattern - which layers exist and why. This page describes the concrete implementation - which technologies, which cloud regions, which hardware. Architecture is the what, infrastructure is the how.

Deep Dives

Architecture

7-Layer Reference Architecture

How the infrastructure components work together architecturally - Presentation, Orchestration, Agent, Decision Layer, Model, Integration, Infrastructure.

View Reference Architecture >

Knowledge Resource

Blueprint 2026

Eleven articles on the infrastructure decisions that matter in 2026: AI models, hosting, RAG, orchestration, costs, EU AI Act.

Read the Series >

Governance

Data Residency

Row-Level Security, tenant isolation, encryption, EU data processing - where your data lives and who controls it.

Data Residency >

Agents

AI Agents

Document Agents, Workflow Agents, Knowledge Agents - three agent types for enterprise processes.

Explore AI Agents >

Which infrastructure fits your requirements?

Azure EU, AWS EU, GCP EU, Vercel EU + Supabase EU, Self-Hosted, or Hybrid. We configure to your requirements.

Schedule a conversation