Build, Buy, Hybrid - EU AI Act-compliant infrastructure before August 2026
AI infrastructure is growing faster than the governance structures that control it.
According to HashiCorp (2024), 82% of enterprises operate multi-cloud environments - but only 31% have a centralized governance strategy. The result: Shadow AI. Business units use external LLM APIs without approval. Data science teams deploy models on uncontrolled endpoints.
According to the Stanford HAI AI Index Report (2025), investment in AI infrastructure is growing 29% annually, while governance budgets grow only 8%. This gap creates technical debt.
| Layer | Responsibility | Who |
|---|---|---|
| Architecture Governance | Approved patterns, models, APIs | CTO + Enterprise Architecture |
| Operations Governance | SLAs, monitoring, incident response, cost mgmt. | Infrastructure + DevOps |
| Compliance Governance | EU AI Act, GDPR, audit trail, data residency | CTO + CISO + Legal |
| Cost Governance | Budgeting, chargeback, waste detection | FinOps + CTO |
| Security Governance | Zero Trust, encryption, access management | CISO + Platform |
Before the first AI agent goes to production:
According to Flexera (2024), enterprises waste an average of 28% of their cloud spending. For AI workloads with GPU instances, the waste rate is even higher.
Every AI infrastructure component requires a fundamental decision: build in-house, buy, or combine.
| Criterion | Build | Buy | Hybrid |
|---|---|---|---|
| Control | Full | Limited | Differentiated |
| Data Residency | Guaranteed | Contract-dependent | Controllable |
| Time-to-Value | 3-6 months | 1-4 weeks | 4-8 weeks |
| Operating costs | Fixed + personnel | Variable (pay-per-use) | Mixed |
| Vendor lock-in | None | High | Medium |
| Workload | Recommendation | Rationale |
|---|---|---|
| LLM inference (standard) | Buy | Cost-efficient at variable volume |
| LLM inference (sensitive) | Build | Data must not leave the EU |
| Agent Orchestration | Hybrid | Framework self-hosted, LLM calls routed |
| Document Intelligence | Build | Documents contain PII |
| Vector Database | Hybrid | Managed for non-sensitive, self-hosted for PII |
| Monitoring | Buy | Specialized tools with EU region |
GPU hardware: NVIDIA H100: USD 25,000-40,000 per card. Production cluster: 4-8 cards minimum.
Personnel: MLOps engineers, platform engineers, security specialists. 40% of enterprises lack these skills (Gartner 2024).
Maintenance: Model updates, security patches, infrastructure upgrades. Ongoing.
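
To make the hardware figures above concrete: multiplying the quoted per-card price range by the quoted cluster size gives the capital outlay for the GPUs alone. The sketch below does only that arithmetic; servers, networking, power, personnel and maintenance are deliberately left out.

```python
# Hardware-only capex range for a self-hosted H100 cluster, using the figures
# quoted above. Servers, networking, power, personnel and maintenance excluded.
CARD_PRICE_USD = (25_000, 40_000)  # (low, high) price per NVIDIA H100 card
CLUSTER_CARDS = (4, 8)             # production cluster size range


def gpu_capex_range(price=CARD_PRICE_USD, cards=CLUSTER_CARDS):
    """Return (lowest, highest) spend on GPU cards for the given ranges."""
    return price[0] * cards[0], price[1] * cards[1]


if __name__ == "__main__":
    low, high = gpu_capex_range()
    print(f"GPU cards alone: USD {low:,} - {high:,}")
```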
Data Residency: Where are prompts processed? Are they used for training?
Vendor lock-in: Proprietary APIs, embedding formats. Migration takes 3-6 months.
Availability: 12 hours downtime per quarter on average (Stanford HAI 2025).
For most enterprise scenarios, a hybrid approach is recommended: a self-hosted Model Gateway as the central control layer, routing by data sensitivity, a fallback strategy for provider outages, and cost optimization through intelligent model routing.
What lawyers read as compliance obligations are infrastructure requirements for the CTO.
Starting August 2026, six mandatory requirements apply (subject to the Digital Omnibus package, which may postpone them to December 2027):
| Requirement | Art. | Infrastructure measure |
|---|---|---|
| Risk management | 9 | Confidence routing, circuit breaker, canary deployments |
| Data governance | 10 | Data lineage, immutable storage, data catalog |
| Record-keeping | 12 | Structured logging, retention 10y+, tamper-proof |
| Transparency | 13 | Observability stack, decision explanation API, model cards |
| Human oversight | 14 | HITL gateway (architectural), kill switch < 1s, auditor portal |
| Accuracy/robustness | 15 | Benchmark pipeline, adversarial testing, multi-region redundancy |
Confidence routing: Every agent output receives a confidence score. Below threshold: escalation. Circuit breaker: Automatic deactivation upon anomalies. Canary deployments: New model versions rolled out gradually, automatic rollback upon degradation.
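
Confidence routing and the circuit breaker can be combined in a few lines. The following is a minimal sketch, assuming each agent output carries a numeric confidence score and treating low confidence as the only anomaly signal; the threshold (0.8), window size and trip rate are hypothetical values, and canary deployments are not covered. Once tripped, the breaker stays open until someone calls `reset()`, which keeps reactivation a human decision.

```python
from collections import deque


class CircuitBreaker:
    """Deactivates an agent when too many recent outputs look anomalous."""

    def __init__(self, window=20, max_anomaly_rate=0.25):
        self.window = window                      # hypothetical: outputs considered
        self.max_anomaly_rate = max_anomaly_rate  # hypothetical trip threshold
        self.recent = deque(maxlen=window)        # keeps only the last `window` flags
        self.is_open = False                      # open = agent deactivated

    def record(self, anomalous):
        self.recent.append(anomalous)
        # Evaluate only on a full window so one early outlier cannot trip it
        if len(self.recent) == self.window:
            if sum(self.recent) / self.window > self.max_anomaly_rate:
                self.is_open = True

    def reset(self):
        """Manual reactivation, e.g. after human review of the incident."""
        self.recent.clear()
        self.is_open = False


def route_output(confidence, breaker, threshold=0.8):
    """Return 'auto', 'escalate' (to a human) or 'blocked' for one agent output."""
    if breaker.is_open:
        return "blocked"
    below = confidence < threshold
    breaker.record(below)  # low confidence doubles as the anomaly signal here
    return "escalate" if below else "auto"
```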
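Structured logging: Every API call, every agent decision, every HITL intervention. Retention: automatically generated logs for at least six months (Art. 19); aligning log retention with the 10-year documentation period (Art. 18) is the conservative choice. Tamper-proof: Append-only logs in immutable storage.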
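
Append-only storage prevents overwrites at the storage layer; a hash chain adds tamper evidence that can be checked independently of the storage system. The sketch below is one possible approach, not a description of a specific product: each entry stores the SHA-256 hash of its predecessor, so altering, reordering or deleting an earlier record breaks verification. It complements, and does not replace, immutable storage. Truncating the newest entries is only detectable if the latest hash is also anchored elsewhere. Field names such as `prev_hash` are illustrative.

```python
import copy
import hashlib
import json
import time


class HashChainedLog:
    """Append-only log in which every entry commits to the hash of its predecessor."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        # Canonical JSON (sorted keys) so verification recomputes the same bytes
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        # Deep copy so later changes to the caller's dict cannot alter the log
        body = {"ts": time.time(), "event": copy.deepcopy(event), "prev_hash": prev_hash}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return False if an entry was altered, reordered or removed from the middle."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {key: entry[key] for key in ("ts", "event", "prev_hash")}
            if entry["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```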
HITL gateway: Architecturally enforced human approval. No bypass. Kill switch: Immediate deactivation, latency < 1 second. Auditor portal: Read-only dashboard for compliance auditors.
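
A minimal sketch of the kill switch and HITL gateway, assuming every agent action passes through a single gateway object. `KillSwitch`, `ActionGateway` and the `approver` callback are hypothetical names. In a real deployment, "no bypass" is enforced architecturally, for example by giving only the gateway the credentials for downstream systems; a Python class alone cannot guarantee that. Likewise, the flag check stops the next action immediately, but meeting the < 1 second target for calls already in flight also requires cancellation or network-level controls.

```python
import threading


class KillSwitch:
    """Global and per-agent deactivation flags, checked before every action."""

    def __init__(self):
        self._all = threading.Event()
        self._killed = set()
        self._lock = threading.Lock()

    def kill_all(self):
        self._all.set()

    def kill(self, agent_id):
        with self._lock:
            self._killed.add(agent_id)

    def is_killed(self, agent_id):
        if self._all.is_set():
            return True
        with self._lock:
            return agent_id in self._killed


class ActionGateway:
    """The single path through which agents execute actions (no bypass method)."""

    def __init__(self, kill_switch, approver):
        self.kill_switch = kill_switch
        self.approver = approver  # hypothetical: callable(agent_id, action) -> bool

    def execute(self, agent_id, action, requires_approval, run):
        if self.kill_switch.is_killed(agent_id):
            return "killed"
        if requires_approval and not self.approver(agent_id, action):
            return "rejected"
        # Check again: the switch may have been pulled while a human was deciding
        if self.kill_switch.is_killed(agent_id):
            return "killed"
        return run()
```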
Sanctions: Up to EUR 15 million or 3% of global annual turnover, whichever is higher.
40% of security incidents in cloud environments are caused by misconfiguration - not by attacks (ENISA 2024).
| Pillar | Requirement | Implementation |
|---|---|---|
| Data Residency | All processing in EU data centers | Self-hosted models or EU region at provider |
| Encryption | At rest, in transit, in use | AES-256, TLS 1.3, mTLS, Confidential Computing |
| Zero Trust | No implicit trust | Identity-based access, least privilege, micro-segmentation |
| Supply Chain | Model and software provenance verified | Model provenance, SBOM, container scanning, signed artifacts |
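
For the supply-chain pillar, the simplest building block is refusing to load a model or container artifact whose digest does not match a pinned value. A minimal sketch, assuming the expected SHA-256 digest comes from a trusted source such as a signed manifest or model registry; verifying the signature itself is not shown.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large model weights never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """True only if the file matches the pinned digest; callers must refuse to load otherwise."""
    return hmac.compare_digest(sha256_file(path), expected_sha256.strip().lower())
```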
| Component | EU requirement | Implementation |
|---|---|---|
| LLM inference | Prompts must not leave the EU | Self-hosted or EU region at provider |
| Vector Database | Embeddings contain encoded knowledge | EU region or self-hosted |
| Logging | Logs contain PII | EU storage with WORM policy |
| Backups | Same rules as production data | EU region, encrypted |
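
The residency rules in the table can be checked automatically against a component inventory. A minimal sketch, assuming a hypothetical inventory that maps each component to its deployment region; the allowlist uses example EU region identifiers from AWS, Google Cloud and Azure and must be adapted. A region label is necessary but not sufficient: whether a provider processes or stores data elsewhere remains contract-dependent.

```python
# Example EU region identifiers (AWS, Google Cloud, Azure); adapt to your providers.
EU_REGIONS = {
    "eu-central-1",        # AWS Frankfurt
    "eu-west-1",           # AWS Ireland
    "europe-west3",        # Google Cloud Frankfurt
    "westeurope",          # Azure Netherlands
    "germanywestcentral",  # Azure Frankfurt
}


def residency_violations(inventory):
    """Return the sorted names of components deployed outside the EU allowlist.

    `inventory` is a hypothetical mapping of component name -> region identifier.
    """
    return sorted(name for name, region in inventory.items() if region not in EU_REGIONS)
```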
| Scenario | GDPR risk | Measure |
|---|---|---|
| PII in prompts | Art. 6 - Legal basis | PII stripping before API call |
| Customer data in RAG | Art. 5 - Purpose limitation | Access control at document level |
| Logs with user data | Art. 17 - Right to erasure | Pseudonymization + retention policy |
| Embeddings with PII | Art. 22 - Automated decisions | Transparency documentation |
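
PII stripping before an API call, combined with pseudonymization, could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the two regular expressions (e-mail addresses and IBANs) are deliberately simple stand-ins for a proper PII detector, and the HMAC key would come from a secret store in practice. The token-to-value mapping stays inside EU infrastructure so that responses can be re-personalized locally. Pseudonymized data remains personal data under the GDPR.

```python
import hashlib
import hmac
import re

# Deliberately simple patterns; production systems need a dedicated PII detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?:\s?[A-Z0-9]{4}){2,7}(?:\s?[A-Z0-9]{1,3})?\b"),
}


def pseudonym(value, key):
    """Stable keyed token: the same value and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def strip_pii(text, key):
    """Replace detected PII with tokens before the prompt leaves the gateway.

    Returns the redacted text and a token -> original mapping that must stay
    inside EU infrastructure.
    """
    mapping = {}

    def replacer(kind):
        def _replace(match):
            token = f"<{kind}_{pseudonym(match.group(0), key)}>"
            mapping[token] = match.group(0)
            return token
        return _replace

    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(replacer(kind), text)
    return text, mapping


def restore(text, mapping):
    """Re-insert original values into a model response, locally."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```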
Agent frameworks are built for experiments, not for production. Enterprise agents need: defined permissions, audit trails, rollback, cost control.
| Component | Function | Technology |
|---|---|---|
| Orchestrator | Workflow, task routing, parallelization | Temporal, Prefect, Custom |
| Permission Layer | Agent permissions for tools/APIs | OPA, Cedar |
| State Management | Context, memory, task progress | Redis, PostgreSQL |
| Observability | Traces, token usage, latency | OpenTelemetry, Langfuse |
Result (Gosign projects): MTTR for agent failures reduced by 70%; uncontrolled API costs reduced by 40-60%.
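
The permission layer's core rule is deny-by-default: an agent may call a tool only if a policy explicitly allows it, and every decision is recorded. In production, a policy engine such as OPA or Cedar would evaluate the rules; the plain-Python sketch below only illustrates that logic and adds a per-agent token budget for cost control. Agent IDs, tool names and budget figures are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical policy: tools each agent may call. Anything not listed is denied.
POLICY = {
    "invoice-agent": {"erp.read_invoice", "erp.create_draft"},
    "support-agent": {"crm.read_ticket", "kb.search"},
}


@dataclass
class TokenBudget:
    limit: int
    used: int = 0

    def charge(self, tokens):
        """Reserve tokens if the budget allows it; never overspend."""
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True


def authorize(agent_id, tool, estimated_tokens, budgets, audit_log):
    """Deny-by-default check of tool permission and token budget, always audited."""
    if tool not in POLICY.get(agent_id, set()):
        decision = "deny:permission"
    elif agent_id not in budgets or not budgets[agent_id].charge(estimated_tokens):
        decision = "deny:budget"
    else:
        decision = "allow"
    audit_log.append((agent_id, tool, decision))
    return decision == "allow"
```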
80% of enterprise data is unstructured (IDC 2024). A Document Intelligence pipeline classifies, extracts and vectorizes documents automatically.
| Stage | Function | Technology |
|---|---|---|
| Ingestion | Ingest PDF, Word, scans, email | Tika, Unstructured.io |
| OCR | Convert scans to text | Tesseract, PaddleOCR |
| Classification | Identify document type | Fine-tuned classifier |
| Extraction | Extract structured data | LLM + schema validation |
| Embedding | Vectorize documents | Sentence Transformers |
| Storage | Vectors + metadata | pgvector, Qdrant |
Result: 92-97% classification accuracy; manual processing reduced by 60-80%.
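
The "LLM + schema validation" step of the extraction stage can be sketched as follows. The LLM call itself is omitted: the function receives the model's raw JSON answer as a string. The invoice schema is a hypothetical example; many teams use JSON Schema or Pydantic instead of hand-written checks. Documents that fail validation would be routed to manual review rather than stored.

```python
import json

# Hypothetical schema for one document type (field name -> expected type).
INVOICE_SCHEMA = {
    "invoice_number": str,
    "issue_date": str,
    "total_amount": float,
    "currency": str,
}


def validate_extraction(raw_llm_output, schema):
    """Parse the LLM's JSON answer and check it against `schema`.

    Returns (record, errors). Any error means the document goes to manual
    review instead of being stored automatically.
    """
    try:
        record = json.loads(raw_llm_output)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["expected a JSON object"]
    errors = []
    for name, expected in schema.items():
        value = record.get(name)
        if name not in record:
            errors.append(f"missing field: {name}")
        elif expected is float and isinstance(value, int) and not isinstance(value, bool):
            record[name] = float(value)  # accept integer amounts such as 1190
        elif not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    return (None if errors else record), errors
```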
The Model Gateway is the central layer between applications and LLM providers. It handles routing, PII detection, rate limiting, caching, fallback and logging.
| Request type | Routing | Rationale |
|---|---|---|
| Contains PII | Self-hosted model | Data stays in the EU |
| Standard classification | Cheapest model | Cost optimization |
| Complex analysis | Most capable model | Quality prioritized |
| Provider A down | Provider B | Availability |
Result: LLM costs reduced by 30-50% through routing and caching; compliance through centralized PII screening.
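
The routing table translates into a small dispatch loop. The sketch below is illustrative only: provider names, the callable client interface and the e-mail-based PII check are assumptions, and it does not use the API of LiteLLM, Portkey or any other gateway product. Each request type maps to an ordered list of providers (first = preferred, later entries = fallbacks); any request with detected PII is forced onto the self-hosted route, and identical requests are answered from an exact-match cache. The PII route deliberately lists only the self-hosted model, since a fallback to an external provider would defeat the routing rule.

```python
import hashlib
import re


class ProviderDown(Exception):
    """Raised by a provider client when the provider is unavailable."""


# Crude PII signal for illustration (e-mail addresses only).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class ModelGateway:
    def __init__(self, providers, routes):
        self.providers = providers  # name -> callable(prompt) -> str (hypothetical clients)
        self.routes = routes        # request type -> ordered provider names, first = preferred
        self.cache = {}             # exact-match response cache
        self.log = []               # (request type, provider or "cache") per request

    def complete(self, prompt, request_type):
        # Detected PII overrides the requested type: only the self-hosted route is allowed
        route = self.routes["pii"] if EMAIL.search(prompt) else self.routes[request_type]
        key = hashlib.sha256(repr((tuple(route), prompt)).encode("utf-8")).hexdigest()
        if key in self.cache:
            self.log.append((request_type, "cache"))
            return self.cache[key]
        for name in route:  # try providers in order; later entries are fallbacks
            try:
                answer = self.providers[name](prompt)
            except ProviderDown:
                self.log.append((request_type, f"{name}:down"))
                continue
            self.cache[key] = answer
            self.log.append((request_type, name))
            return answer
        raise RuntimeError(f"no provider available for request type {request_type!r}")
```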
AI systems fail silently. An LLM that returns poor responses does not throw an error.
| Layer | What is measured | Tools |
|---|---|---|
| Infrastructure | CPU, GPU, memory, network | Prometheus, Grafana |
| Application | Latency, error rate, throughput | OpenTelemetry, Jaeger |
| Model | Confidence, tokens, hallucination | Langfuse, WhyLabs |
| Business | Zero-touch rate, escalation | Custom dashboards |
| Cost | API costs per team/project | Infracost, Custom |
| Compliance | Audit completeness, HITL rate | Custom + SIEM |
Result: Quality issues detected 4x faster. Incident impact duration -65% (Gartner 2024).
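
Because poor answers do not raise errors, silent degradation has to be detected from metrics. A minimal sketch, assuming each response yields a confidence score and an escalation flag (the model and business layers in the table): a rolling window raises alerts when mean confidence falls or the escalation rate rises. Window size and thresholds are hypothetical and would be calibrated per use case.

```python
from collections import deque
from statistics import fmean


class QualityMonitor:
    """Rolling-window check for silent quality degradation in model outputs."""

    def __init__(self, window=50, min_mean_confidence=0.75, max_escalation_rate=0.2):
        self.confidences = deque(maxlen=window)
        self.escalations = deque(maxlen=window)
        self.min_mean_confidence = min_mean_confidence
        self.max_escalation_rate = max_escalation_rate

    def observe(self, confidence, escalated):
        """Record one response and return any alerts for the current window."""
        self.confidences.append(confidence)
        self.escalations.append(escalated)
        return self.alerts()

    def alerts(self):
        if len(self.confidences) < self.confidences.maxlen:
            return []  # not enough data for a stable signal yet
        found = []
        mean_conf = fmean(self.confidences)
        if mean_conf < self.min_mean_confidence:
            found.append(f"mean confidence {mean_conf:.2f} below {self.min_mean_confidence}")
        rate = sum(self.escalations) / len(self.escalations)
        if rate > self.max_escalation_rate:
            found.append(f"escalation rate {rate:.0%} above {self.max_escalation_rate:.0%}")
        return found
```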
10 questions for the CTO. Rate each with 0 (no), 1 (partially) or 2 (yes).
| # | Question | 0 | 1 | 2 |
|---|---|---|---|---|
| 1 | Complete inventory of all AI systems and APIs (including Shadow AI). | ☐ | ☐ | ☐ |
| 2 | Approved reference architecture for AI workloads with defined patterns. | ☐ | ☐ | ☐ |
| 3 | All AI data processing verifiably in EU data centers. | ☐ | ☐ | ☐ |
| 4 | Model Gateway with PII screening and centralized logging. | ☐ | ☐ | ☐ |
| 5 | Structured logging for every API call and every agent decision. | ☐ | ☐ | ☐ |
| 6 | Kill switch for individual agents and entire AI system (< 1s). | ☐ | ☐ | ☐ |
| 7 | GPU/API costs tracked per team, project and use case. | ☐ | ☐ | ☐ |
| 8 | Automated benchmark and adversarial evaluation before deployment. | ☐ | ☐ | ☐ |
| 9 | Backup and DR strategy specific to AI infrastructure. | ☐ | ☐ | ☐ |
| 10 | All 6 EU AI Act requirements (Art. 9-15) verifiably met. | ☐ | ☐ | ☐ |
| Score | Rating | Recommendation |
|---|---|---|
| 16-20 | Production-ready | Optimization and scaling. Ready for regulated workloads. |
| 10-15 | Foundation in place | Close gaps: logging, PII screening, kill switch. |
| 5-9 | Catching up needed | Reference architecture, Model Gateway, Shadow AI inventory. |
| 0-4 | Action required | Start immediately. Inventory + reference architecture. |
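
The self-assessment can be scored mechanically. This small helper sums the ten answers, maps the total to the rating bands in the table above and lists the questions answered with 0, which are the natural starting points.

```python
def readiness(scores):
    """Score the ten-question self-assessment (each answer 0, 1 or 2).

    Returns (total, rating, numbers of the questions answered with 0).
    """
    if len(scores) != 10 or any(s not in (0, 1, 2) for s in scores):
        raise ValueError("expected ten answers, each 0, 1 or 2")
    total = sum(scores)
    if total >= 16:
        rating = "Production-ready"
    elif total >= 10:
        rating = "Foundation in place"
    elif total >= 5:
        rating = "Catching up needed"
    else:
        rating = "Action required"
    gaps = [number for number, score in enumerate(scores, start=1) if score == 0]
    return total, rating, gaps
```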
| Budget item | Actual share | Recommended share |
|---|---|---|
| Models & compute | 70% | 35-40% |
| Infrastructure platform | 15% | 25-30% |
| Governance & compliance | 5% | 15-20% |
| Observability & monitoring | 5% | 10-15% |
| Security | 5% | 10-15% |
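
To compare your own allocation with the recommended ranges, the sketch below computes how many percentage points each item lies outside its range. Item names and ranges are taken from the table; the input shares are whatever your budget shows. Applied to the current allocation shown above, it reports 30 points of over-allocation in models & compute and shortfalls of 5-10 points in every other item.

```python
# Recommended share ranges in percent, taken from the table above.
RECOMMENDED = {
    "Models & compute": (35, 40),
    "Infrastructure platform": (25, 30),
    "Governance & compliance": (15, 20),
    "Observability & monitoring": (10, 15),
    "Security": (10, 15),
}


def budget_gaps(actual_shares):
    """Return {item: percentage points outside the recommended range}.

    Positive = over-allocated, negative = under-allocated; items inside
    their range are omitted. Missing items count as 0%.
    """
    gaps = {}
    for item, (low, high) in RECOMMENDED.items():
        share = actual_shares.get(item, 0)
        if share < low:
            gaps[item] = share - low
        elif share > high:
            gaps[item] = share - high
    return gaps
```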
| Month | Focus | Outcome |
|---|---|---|
| 1 | Inventory & architecture | AI inventory, reference architecture, data residency verified, cost baseline |
| 2 | Gateway & governance | Model Gateway live, structured logging, kill switch, observability stack |
| 3 | Compliance & pilot | EU AI Act checklist, benchmark pipeline, adversarial testing, compliance audit |
| Layer | Recommendation | Alternatives |
|---|---|---|
| Model Gateway | LiteLLM, Portkey | Custom (Go/Python) |
| Agent Orchestration | Temporal + Custom | Prefect, Airflow |
| Vector Database | pgvector (PostgreSQL) | Qdrant, Weaviate |
| Observability | OpenTelemetry + Grafana | Datadog, Langfuse |
| Policy Engine | OPA | Cedar, Casbin |
| Secret Management | Vault | AWS KMS, SOPS |
| Container Runtime | Kubernetes | Nomad, ECS |
| CI/CD | GitHub Actions | GitLab CI, Tekton |
We analyze your AI infrastructure and identify the critical gaps.
Compliance, security and cost governance - 30 minutes, free of charge, no obligation.
Bert Gogolin - Managing Director, Gosign GmbH
Contact: www.gosign.de/en/contact
Web: www.gosign.de