AI Infrastructure

The platform on which AI agents run in production. In your infrastructure.

Why Infrastructure Is the Bottleneck

Most companies piloting AI agents do not fail because of the model. The models work. They fail because of infrastructure: no governance framework, no audit trail, no tenant isolation, no deployment strategy, no integration with existing systems.

A pilot on a laptop is not a production architecture. Gosign infrastructure is the layer that turns an LLM experiment into an operational system.

Four Infrastructure Components

1. LLM Hosting

Cloud LLMs: Azure OpenAI, Google Vertex AI -- EU regions with DPAs. Self-Hosted: Llama, Mistral, DeepSeek, gpt-oss on own hardware. Hybrid: sensitive data self-hosted, standard workloads cloud. Model choice is a balance between performance, cost, data protection, and latency.
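
As an illustration of the hybrid model, routing can be as simple as a lookup keyed on data classification. This is a minimal sketch; the endpoint URLs, classification labels, and function names are placeholders, not the actual platform API:

```python
# Hypothetical sketch: route requests between self-hosted and cloud LLM
# backends based on data classification. URLs are placeholders.

BACKENDS = {
    # Self-hosted model server (e.g. vLLM or Ollama on own hardware)
    "self_hosted": "http://llm.internal:8000/v1",
    # Managed endpoint in an EU region (e.g. Azure OpenAI, Vertex AI)
    "cloud": "https://eu.llm.cloud.example/v1",
}

def route(classification: str) -> str:
    """Sensitive data never leaves self-hosted hardware;
    standard workloads go to the cloud endpoint."""
    if classification in ("sensitive", "confidential"):
        return BACKENDS["self_hosted"]
    return BACKENDS["cloud"]
```

In practice a router would also weigh performance, cost, and latency, as noted above; classification is simply the hardest constraint.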

2. RAG Pipeline

Retrieval-Augmented Generation (RAG) -- how agents access enterprise knowledge. Semantic chunking, metadata enrichment, hybrid search (vector + keyword), source citation in every answer, regular re-indexing. Vector database: pgvector, Qdrant, or Weaviate.
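
To make "chunking with metadata enrichment" concrete, here is a deliberately simple sketch: fixed-size overlapping chunks, each tagged with its source and offset so answers can cite their origin. Real semantic chunking splits on document structure rather than character counts; all names here are illustrative:

```python
def chunk_document(text: str, source: str, size: int = 200, overlap: int = 40):
    """Split text into overlapping chunks, each enriched with metadata
    so every retrieved passage can be cited back to its source."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append({
            "text": text[start:end],
            "source": source,   # used for source citation in answers
            "offset": start,    # position, useful for re-indexing
        })
        if end == len(text):
            break
        start = end - overlap   # overlap preserves context across boundaries
    return chunks
```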

3. Orchestration

n8n or Camunda: open-source workflow engine, self-hosted, no vendor lock-in. API Gateway: unified entry point with rate limiting, authentication, logging. Queue system for batch operations. Event system for real-time reaction to incoming documents and escalations.
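
The gateway's rate limiting is typically a token bucket per tenant or API key. A minimal sketch of the mechanism (class and parameter names are illustrative, not part of any named product):

```python
import time

class TokenBucket:
    """Allow `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A real gateway would keep one bucket per tenant and return HTTP 429 when `allow()` is false.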

4. Deployment

Azure (EU): AKS, Azure SQL, Azure OpenAI, Blob Storage. GCP (EU): GKE, Cloud SQL, Vertex AI, Cloud Storage. Self-Hosted: Docker/Kubernetes, PostgreSQL with pgvector. Hybrid: combination by data classification. All layers above remain identical.

RAG Pipeline -- How It Works

RAG Pipeline: Documents → Chunking → Embedding → Vector Store → Retrieval → LLM → Response
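
The flow above can be sketched end to end in plain Python. A toy bag-of-words embedding stands in for a real embedding model, and an in-memory list stands in for pgvector/Qdrant/Weaviate; every name is illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. Real pipelines use an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

store = []  # stand-in for the vector store

def index(doc_id: str, text: str) -> None:
    store.append({"id": doc_id, "text": text, "vec": embed(text)})

def retrieve(query: str, k: int = 2):
    ranked = sorted(store, key=lambda d: cosine(embed(query), d["vec"]),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Retrieval -> LLM step: pass sources so the answer can cite them."""
    context = "\n".join(f"[{h['id']}] {h['text']}" for h in retrieve(query))
    return (f"Answer using only these sources and cite them:\n"
            f"{context}\nQuestion: {query}")
```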

Governance Is Built In

The infrastructure includes Governance by Design:

Audit trail at infrastructure level (not just application level)

Row-Level Security at database level

Encryption at rest and in transit

RBAC across all components

Cert-Ready Controls as technical data objects
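
As a minimal illustration of an infrastructure-level audit trail, the sketch below hash-chains entries so any later tampering is detectable. This is a toy assumption-laden example, not the actual Gosign implementation:

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only log; each entry embeds the previous entry's hash,
    so rewriting history invalidates the chain."""

    def __init__(self):
        self.entries = []

    def record(self, actor: str, action: str, resource: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {"actor": actor, "action": action,
                 "resource": resource, "ts": time.time(), "prev": prev}
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.entries.append(entry)

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or digest != e["hash"]:
                return False
            prev = e["hash"]
        return True
```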

Our AI engineers hold Microsoft Azure AI certifications. Deployment options include Azure, GCP, and fully self-hosted infrastructure -- the architecture decision stays with the client, not the vendor.

Governance in detail →

Technology Stack

Component | Technology | Why
Workflow engine | n8n, Camunda | Open source, self-hosted, no vendor lock-in
Database | PostgreSQL + pgvector | Enterprise-ready, RLS-capable, vector search integrated
Backend | Python, TypeScript | Proven for ML workloads and API development
Frontend | React / Next.js | For dashboard, chat UI, Auditor Portal
Containers | Docker, Kubernetes | Standard for cloud and self-hosted
API | REST, GraphQL | Integration with existing systems
Auth | Supabase Auth / OIDC | SSO-capable, enterprise identity providers
Monitoring | Prometheus, Grafana | Open source, self-hosted

Ownership

The entire infrastructure belongs to the client. No SaaS, no hosting at Gosign, no ongoing license fees for the platform. Open-source stack where possible. Proprietary components only for the LLMs themselves -- and even there the architecture is model-agnostic.

After 12--18 months, you operate the infrastructure independently.

Frequently Asked Questions about AI Infrastructure

Do I have to choose between cloud and self-hosted?

No. The architecture supports hybrid deployment. You can process sensitive data self-hosted and use cloud services for less critical workloads.

Which cloud providers are supported?

Azure and GCP with EU regions. The architecture is cloud-agnostic -- switching providers only changes the Infrastructure Layer, not the business logic.

Which LLMs are supported?

ChatGPT, Claude, Gemini, Llama, Mistral, DeepSeek, gpt-oss and more. Open source or commercial models. Self-hosted via Ollama/vLLM -- including OpenAI's own open-weight models, fully runnable in your infrastructure.
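
Both Ollama and vLLM expose an OpenAI-compatible HTTP API, which is what makes self-hosted models drop-in replacements. A sketch using only the standard library; the base URL and model name are placeholders you would replace with your own deployment:

```python
import json
import urllib.request

def build_request(model: str, prompt: str) -> dict:
    """Payload in the OpenAI chat-completions format, which
    Ollama and vLLM both accept."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

def chat(base_url: str, model: str, prompt: str) -> str:
    """POST to a self-hosted /v1/chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Example (requires a running server, e.g. Ollama on localhost:11434):
# answer = chat("http://localhost:11434", "gpt-oss:20b", "Summarize this contract.")
```

Because the request format is identical, switching between a cloud model and a self-hosted one is a change of `base_url` and `model`, not of business logic.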

Do I need GPU hardware for self-hosted models?

For open-source models like Llama, Mistral, or gpt-oss, GPU hardware is required. gpt-oss 120B runs on a single H100; gpt-oss 20B runs within 16 GB of memory on consumer hardware. Sizing depends on the model and usage load. We advise on hardware selection.

Which infrastructure fits your requirements?

Talk to us about your architecture.

Discuss Architecture