AI Infrastructure
The production platform for AI agents - model-agnostic, in your infrastructure. Models, hosting, pipeline, stack.
Why Infrastructure Is the Bottleneck
Most organisations piloting AI agents do not fail because of the model. The models work. They fail because of infrastructure: no governance framework, no audit trail, no tenant isolation, no deployment concept, no integration into existing systems.
A pilot on a laptop is not a production architecture. This page describes the concrete technologies and configurations that turn an LLM experiment into an operational system.
How the individual infrastructure components work together architecturally is described in the 7-Layer Reference Architecture.
Four Infrastructure Components
1. LLM Hosting
The model layer. Where language understanding happens.
Cloud LLMs:
- Azure OpenAI (GPT models) - EU regions, Microsoft DPA
- Amazon Bedrock (Claude, Llama, Mistral) - EU regions, AWS DPA
- Google Vertex AI (Gemini) - EU regions, Google DPA
- Anthropic API (Claude) - with EU data processing
Self-Hosted LLMs:
- Llama (Meta) - open source, on your own hardware
- Mistral - open source, EU-based company
- DeepSeek - open source, cost-efficient
- gpt-oss (OpenAI) - open weight, Apache 2.0, fully self-hostable. 120B on a single H100, 20B on 16 GB consumer hardware.
Inference Frameworks for Self-Hosted:
- Ollama - Easy entry, local development, edge deployment
- vLLM - Production-grade, GPU-optimised, high throughput
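Both Ollama and vLLM expose an OpenAI-compatible HTTP endpoint, which is what makes the stack model-agnostic: application code builds the same request regardless of the inference framework behind it. A minimal sketch (endpoint, port, and model name are illustrative - sending the request requires a running server):

```python
# Build an OpenAI-compatible chat request for a local inference server
# (Ollama defaults to port 11434, vLLM to 8000 - both serve /v1/chat/completions).
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct (but do not send) a chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = chat_request("http://localhost:11434", "llama3.1", "Summarise this contract.")
print(req.full_url)
```

Swapping the framework (or the model) changes only `base_url` and `model` - the business logic above this call stays untouched.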
Hybrid:
- Self-hosted for sensitive data (HR, finance)
- Cloud LLMs for less critical workloads (document classification)
- Automatic routing based on data classification
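The routing rule above can be sketched in a few lines - classification names, model names, and endpoints are illustrative, not a fixed product API:

```python
# Classification-based model routing: sensitive workloads stay on the
# self-hosted endpoint, everything else goes to a cloud LLM.
from dataclasses import dataclass

SENSITIVE_CLASSIFICATIONS = {"hr", "finance", "legal"}

@dataclass
class ModelTarget:
    name: str
    endpoint: str

SELF_HOSTED = ModelTarget("llama-3-70b", "https://llm.internal.example/v1")
CLOUD = ModelTarget("gpt-4o", "https://eu.cloud.example/v1")

def route(data_classification: str) -> ModelTarget:
    """Return the model target for a workload's data classification."""
    if data_classification.lower() in SENSITIVE_CLASSIFICATIONS:
        return SELF_HOSTED
    return CLOUD

print(route("HR").name)        # routed self-hosted
print(route("invoices").name)  # routed to cloud
```

The point of the pattern: the routing decision lives in one place, so tightening a classification later moves workloads without touching agent logic.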
Model choice is a trade-off between performance, cost, data protection, and latency. We advise on selection and implement model-agnostically - switching models does not change the business logic. Further reading: LLM Models Comparison 2026, LLM Self-Hosting for Enterprise
Our AI engineers are Microsoft-certified for Azure AI Services. Deployment options include Microsoft Azure, GCP, and fully self-hosted infrastructure - the architecture decision stays with the client, not the vendor.
2. RAG Pipeline
Retrieval-Augmented Generation (RAG) - how agents access enterprise knowledge.
Quality characteristics:
- Semantic chunking (by content, not by page number)
- Metadata enrichment (document type, version, scope of validity)
- Hybrid search (vector search + keyword search for precision)
- Source citation in every response (document, page, paragraph)
- Regular re-indexing on document changes
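The hybrid-search characteristic can be illustrated with a minimal scoring sketch: each chunk gets a weighted sum of a vector-similarity score and a keyword-overlap score. A production pipeline would use pgvector plus a full-text index; embeddings, weights, and metadata here are illustrative.

```python
# Hybrid retrieval sketch: blend vector similarity with keyword overlap.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the chunk."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_rank(query, query_vec, chunks, alpha=0.7):
    """chunks: (text, embedding, metadata) tuples. Returns best-first."""
    scored = [
        (alpha * cosine(query_vec, emb) + (1 - alpha) * keyword_score(query, text),
         text, meta)
        for text, emb, meta in chunks
    ]
    return sorted(scored, key=lambda s: s[0], reverse=True)

chunks = [
    ("Travel expenses are reimbursed within 30 days.", [0.9, 0.1],
     {"doc": "policy.pdf", "page": 12}),
    ("The cafeteria opens at 8 am.", [0.1, 0.9],
     {"doc": "handbook.pdf", "page": 3}),
]
top = hybrid_rank("travel expenses", [0.8, 0.2], chunks)[0]
```

Note that the metadata tuple carries through to the result - that is what enables source citation (document, page) in every response.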
3. Orchestration
The control layer. How agents, systems, and people work together.
- Trigger.dev or Camunda: open-source workflow engines. Visual workflows, API integration, webhooks. Self-hosted, no vendor lock-in.
- API Gateway: Unified entry point. Rate limiting, authentication, logging, monitoring.
- Queue system: Asynchronous processing for batch operations (month-end close, bulk imports).
- Event system: Real-time reaction to incoming documents, status changes, escalations.
Orchestration is the difference between "an agent can do something" and "an agent reliably does something in production". More: Agent Orchestration Platforms
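The event-system component boils down to one pattern: handlers register per event type and run when an event arrives. In production a workflow engine or message broker sits between producer and handlers; this sketch (with illustrative event and handler names) shows only the dispatch mechanism.

```python
# Minimal event dispatcher: register handlers per event type, then emit.
from collections import defaultdict
from typing import Callable

_handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def on(event_type: str):
    """Decorator: register a handler for an event type."""
    def register(fn: Callable[[dict], None]):
        _handlers[event_type].append(fn)
        return fn
    return register

def emit(event_type: str, payload: dict) -> None:
    """Deliver the payload to every handler registered for this event type."""
    for fn in _handlers[event_type]:
        fn(payload)

processed = []

@on("document.received")
def classify_document(payload: dict) -> None:
    processed.append(("classify", payload["id"]))

@on("document.received")
def audit_log(payload: dict) -> None:
    processed.append(("audit", payload["id"]))

emit("document.received", {"id": "INV-0042"})
```

Adding a new reaction to an incoming document is then a new handler registration, not a change to the emitting system.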
4. Deployment
Where the infrastructure runs. All options are available with EU-only data residency.
Azure (EU)
- Azure Kubernetes Service (AKS) for container orchestration
- Azure SQL / PostgreSQL for data and Audit Trail
- Azure OpenAI for LLM hosting
- Regions: West Europe, North Europe, Germany West Central
AWS (EU)
- Amazon EKS for container orchestration
- Amazon RDS / Aurora PostgreSQL for data and Audit Trail
- Amazon Bedrock for LLM hosting (Claude, Llama, Mistral)
- Regions: eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-3 (Paris)
GCP (EU)
- Google Kubernetes Engine (GKE) for container orchestration
- Cloud SQL / AlloyDB for data and Audit Trail
- Vertex AI for LLM hosting
- Regions: europe-west1, europe-west3, europe-west4
Vercel EU + Supabase EU
- Vercel for frontend and edge functions in EU data centres
- Supabase for database (PostgreSQL), auth, and storage
- Lightweight EU deployment option without running your own Kubernetes infrastructure
- Managed services with EU data residency
Self-Hosted
- Docker / Kubernetes on your own hardware
- PostgreSQL with pgvector for data and vector search
- Open-source LLMs on your own GPUs
- Complete Cloud Act independence
Hybrid
- Combination by data classification
- Sensitive workloads self-hosted, standard workloads cloud
- Unified orchestration across all environments
Technology Stack
| Component | Technology | Why |
|---|---|---|
| Workflow engine | Trigger.dev, Camunda | Open source, self-hosted, no vendor lock-in |
| Database | PostgreSQL + pgvector | Enterprise-ready, RLS-capable, vector search integrated |
| Backend | Python, TypeScript | Proven for ML workloads and API development |
| Frontend | React / Next.js | For dashboard, chat UI, Auditor Portal |
| Containers | Docker, Kubernetes | Standard for cloud and self-hosted |
| API | REST, GraphQL | Integration with existing systems |
| Auth | Supabase Auth / OIDC | SSO-capable, integrates with enterprise identity providers |
| Monitoring | Prometheus, Grafana | Open source, self-hosted, real-time dashboards |
| Inference | Ollama, vLLM | Self-hosted LLM inference, GPU-optimised |
Governance Is Built In
The infrastructure includes Governance by Design:
- Audit Trail at infrastructure level (not just application level)
- Row-Level Security at database level - details in Data Residency
- Encryption at rest and in transit
- RBAC across all components
- Cert-Ready Controls as technical data objects
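An audit trail at infrastructure level typically means append-only records whose integrity can be verified. One common mechanism is hash chaining: each entry's hash covers the previous entry's hash, so tampering with any record breaks the chain. A sketch of the mechanism only - a production trail would live at the database level, with actors and actions from your real systems:

```python
# Append-only audit trail with SHA-256 hash chaining.
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash before the first entry

def _entry_hash(prev_hash: str, record: dict) -> str:
    material = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(material.encode()).hexdigest()

class AuditTrail:
    def __init__(self):
        self.entries: list[dict] = []

    def append(self, actor: str, action: str, resource: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"actor": actor, "action": action, "resource": resource}
        self.entries.append({**record, "prev": prev,
                             "hash": _entry_hash(prev, record)})

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered."""
        prev = GENESIS
        for e in self.entries:
            record = {k: e[k] for k in ("actor", "action", "resource")}
            if e["prev"] != prev or e["hash"] != _entry_hash(prev, record):
                return False
            prev = e["hash"]
        return True

trail = AuditTrail()
trail.append("agent:invoice-bot", "classify", "doc/INV-0042")
trail.append("user:jdoe", "approve", "doc/INV-0042")
```

With this structure, an auditor can verify the whole trail offline; no trust in the application layer is required.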
Full Source Code Access
The infrastructure runs on your systems - Azure, GCP, AWS, or self-hosted. No SaaS, no hosting at Gosign. You get full access to source code, configurations, and rule sets, with an open-source stack wherever possible. Proprietary components are limited to the LLMs themselves - and even those are integrated model-agnostically.
After 12-18 months, you operate the infrastructure independently.
Deep Dive in the Agent Briefing
Our article series for decision-makers implementing AI agents in the enterprise.
Frequently Asked Questions about AI Infrastructure
Do we have to choose between cloud and self-hosted?
No. The architecture supports hybrid deployment. You can process sensitive data self-hosted and use cloud services for less critical workloads. The layers above the infrastructure remain identical.
Which cloud providers are supported?
Azure (EU), AWS (EU), GCP (EU), Vercel EU + Supabase EU, Self-Hosted, or Hybrid. The architecture is cloud-agnostic - switching providers only changes the Infrastructure Layer, not the business logic.
Which LLMs are supported?
OpenAI GPT, Claude, Gemini, Llama, Mistral, DeepSeek, gpt-oss, and more. Open-source or commercial models. Self-hosted via Ollama or vLLM - including OpenAI's own open-weight models running entirely in your infrastructure.
Do we need GPU hardware for self-hosted models?
For open-source models like Llama, Mistral, or gpt-oss, GPU hardware is required. gpt-oss-120B runs on a single H100, gpt-oss-20B on 16 GB consumer hardware. Sizing depends on the model and usage load. We advise on hardware selection.
How does this page relate to the Reference Architecture?
The Reference Architecture describes the architectural pattern - which layers exist and why. This page describes the concrete implementation - which technologies, which cloud regions, which hardware. Architecture is the what, infrastructure is the how.
Deep Dives
Architecture
7-Layer Reference Architecture
How the infrastructure components work together architecturally - Presentation, Orchestration, Agent, Decision Layer, Model, Integration, Infrastructure.
View Reference Architecture >
Knowledge Resource
Blueprint 2026
Eleven articles on the infrastructure decisions that matter in 2026: AI models, hosting, RAG, orchestration, costs, EU AI Act.
Read the Series >
Governance
Data Residency
Row-Level Security, tenant isolation, encryption, EU data processing - where your data lives and who controls it.
Data Residency >
Agents
AI Agents
Document Agents, Workflow Agents, Knowledge Agents - three agent types for enterprise processes.
Explore AI Agents >
Which infrastructure fits your requirements?
Azure EU, AWS EU, GCP EU, Vercel EU + Supabase EU, Self-Hosted, or Hybrid. We configure to your requirements.
Schedule a conversation