Envoy AI Gateway v0.4.x
v0.4.0
✨ New Features
Model Context Protocol (MCP) Gateway
**MCPRoute CRD**: Introduces the `MCPRoute` custom resource for routing MCP requests to backend MCP servers, providing a unified AI API across multiple MCP backends.
Includes streamable HTTP transport, JSON-RPC 2.0 support, and MCP spec-compliant OAuth 2.0 authorization with JWKS validation and Protected Resource Metadata.
Aggregates multiple MCP servers behind a single endpoint with intelligent tool routing, tool filtering (exact match and regex patterns), and collision detection.
Supports both OAuth-based authentication and API key authentication for secure backend MCP server communication with configurable headers.
Implements MCP session handling with encryption, rotatable seeds, and graceful session lifecycle management.
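To make this concrete, here is a rough sketch of an `MCPRoute` that aggregates two MCP servers behind one endpoint with a tool filter. The field names are assumptions modeled on Gateway API conventions; consult the `MCPRoute` API reference for the authoritative schema.

```yaml
# Hypothetical MCPRoute sketch: two MCP backends, one unified endpoint.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: MCPRoute
metadata:
  name: mcp-servers
  namespace: default
spec:
  parentRefs:
    - name: ai-gateway            # the Gateway exposing the MCP endpoint
      kind: Gateway
      group: gateway.networking.k8s.io
  backendRefs:
    - name: github-mcp            # Envoy Gateway Backend for one MCP server
      port: 3000
      toolSelector:               # assumed field: expose only matching tools
        include:
          - list_issues
    - name: docs-mcp              # second MCP server aggregated alongside
      port: 3000
```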
Anthropic Provider Support
Native integration with Anthropic's API at api.anthropic.com, complementing existing GCP Vertex AI Anthropic support.
Support for Claude models on AWS Bedrock using the native Anthropic Messages API format instead of the generic Converse API, enabling full feature parity with direct Anthropic API including prompt caching and extended thinking.
Native `x-api-key` header-based authentication matching Anthropic's API conventions and SDK patterns for direct Anthropic connections (see the sketch below).
Efficient passthrough translation layer that captures token usage and maintains API compatibility while minimizing overhead for both direct and AWS Bedrock Anthropic endpoints.
Auto-configuration from ANTHROPIC_API_KEY environment variable in standalone mode for zero-config deployments.
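As a sketch of what direct Anthropic wiring might look like, the following pairs a backend for api.anthropic.com with an `x-api-key` security policy. The schema name and security-policy field names are assumptions based on the API updates listed below, not the authoritative schema.

```yaml
# Hypothetical sketch: an AIServiceBackend for api.anthropic.com plus a
# BackendSecurityPolicy injecting the x-api-key header from a Secret.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: anthropic
spec:
  schema:
    name: Anthropic               # assumed schema name for the native API
  backendRef:
    name: anthropic-backend       # Envoy Gateway Backend for api.anthropic.com
    kind: Backend
    group: gateway.envoyproxy.io
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: anthropic-api-key
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: anthropic
  type: AnthropicAPIKey           # assumed discriminator, per the API updates
  anthropicAPIKey:
    secretRef:
      name: anthropic-credentials # Secret holding the x-api-key value
```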
Guided Output Support for GCP Vertex AI/Gemini
Constrains model outputs to match specific regular expressions for GCP Vertex AI/Gemini models, enabling structured text generation.
Restricts model outputs to predefined choices for GCP Vertex AI/Gemini models, ensuring responses conform to expected values.
Ensures model outputs are valid JSON conforming to specified schemas for GCP Vertex AI/Gemini models, with OpenAI-compatible API translation.
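Because the JSON-schema mode is translated from the OpenAI-compatible API, clients can use the standard `response_format` field; for example (the gateway address and model name are placeholders):

```shell
# Constrain a Gemini response to a JSON schema via the OpenAI-compatible API.
curl -s "$GATEWAY/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Name two primary colors."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "colors",
        "schema": {
          "type": "object",
          "properties": {
            "colors": {"type": "array", "items": {"type": "string"}}
          },
          "required": ["colors"]
        }
      }
    }
  }'
```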
Provider-Specific Enhancements
**`/v1/images/generations` endpoint**: End-to-end support for OpenAI's image generation API, including request/response translation, Brotli encoding/decoding, and full protocol compatibility (see the request sketch below).
**`/v1/completions` endpoint**: Full pass-through support for OpenAI's legacy completions endpoint with complete tracing and metrics, ensuring backward compatibility.
Native support for Azure OpenAI embeddings API with proper protocol translation and token usage tracking.
Full support for reasoning/thinking tokens in AWS Bedrock responses for both streaming and non-streaming modes, properly exposing extended thinking processes in Claude models.
Support for GCP-specific safety settings configuration, allowing fine-grained control over content filtering and safety thresholds for Gemini models.
Accurate completion_tokens reporting in streaming usage chunks for Gemini models, ensuring proper token accounting during streaming responses.
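For instance, the image endpoint accepts OpenAI's standard request shape unchanged (`$GATEWAY` is a placeholder for the gateway address):

```shell
# Generate an image through the new OpenAI-compatible endpoint.
curl -s "$GATEWAY/v1/images/generations" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "a watercolor lighthouse at dusk",
    "n": 1,
    "size": "1024x1024"
  }'
```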
Cross-Namespace Resource References
**AIServiceBackend references**: `AIGatewayRoute` can now reference `AIServiceBackend` resources in different namespaces, enabling multi-tenant and organizational separation patterns.
Comprehensive ReferenceGrant integration following Gateway API patterns, with automatic validation and clear error messages when grants are missing.
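A cross-namespace reference pairs the new `namespace` field on the backend ref with a standard Gateway API ReferenceGrant in the target namespace, roughly like this (namespace names are illustrative):

```yaml
# Allow AIGatewayRoutes in "tenant-a" to reference AIServiceBackends
# owned by "shared-backends".
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-tenant-a
  namespace: shared-backends      # namespace that owns the backends
spec:
  from:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      namespace: tenant-a
  to:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
```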
Enhanced Upstream Authentication
Support for the AWS SDK's default credential chain, including IRSA (IAM Roles for Service Accounts), EKS Pod Identity, EC2 Instance Profiles, and environment variables, eliminating the need for static credentials or OIDC settings (see the sketch below).
Native Azure OpenAI API key authentication using the api-key header, matching Azure SDK conventions and console practices.
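With the default chain, a policy can omit static credentials entirely. A minimal sketch, assuming the existing AWS auth field names:

```yaml
# Hypothetical sketch: AWS auth resolved via the SDK default credential
# chain (IRSA, EKS Pod Identity, instance profile, or env vars).
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: bedrock-default-chain
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: bedrock
  type: AWSCredentials
  awsCredentials:
    region: us-east-1             # no credentialsFile/OIDC -> default chain
```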
Traffic Management and Configuration
New `headerMutation` fields in both `AIServiceBackend` and `AIGatewayRouteRuleBackendRef` enable header manipulation with smart merge logic for advanced routing scenarios (see the sketch below).
**InferencePool v1 support**: Updated to Gateway API Inference Extension v1.0, providing stable intelligent endpoint selection with enhanced performance and reliability.
Captures and reports cached token statistics from cloud providers (Anthropic, Bedrock, etc.), providing accurate cost attribution for prompt caching features.
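A backend-level mutation might look like the following sketch; the `headerMutation` shape here is assumed to mirror Gateway API header-modifier filters.

```yaml
# Hypothetical sketch: inject one header and strip another for a backend.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai
spec:
  schema:
    name: OpenAI
  backendRef:
    name: openai-backend
    kind: Backend
    group: gateway.envoyproxy.io
  headerMutation:
    set:
      - name: x-team              # added to every upstream request
        value: platform
    remove:
      - x-internal-debug          # stripped before reaching the provider
```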
Standalone Mode and CLI
Official Docker images for the aigw CLI published to GitHub Container Registry, enabling containerized standalone deployments with proper health checks and lifecycle management.
Zero-config standalone mode with automatic configuration from `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`, or `ANTHROPIC_API_KEY` environment variables, generating a complete Envoy configuration with OpenAI SDK compatibility (see the run example below).
Native MCP support in standalone mode via --mcp-config and --mcp-json flags, enabling unified LLM and MCP server configuration in a single aigw run invocation without Kubernetes.
Proper separation of configuration, data, state, and runtime files following XDG Base Directory specification, improving organization and enabling better cleanup and management of aigw state.
Improved Envoy readiness detection and status reporting in standalone mode, providing clear insights into when the gateway is ready to accept traffic with better error messages.
Unified admin server on a single port serving both /metrics and /health endpoints, simplifying monitoring and health check configuration.
`aigw` CLI now fails fast and exits cleanly if the external processor fails to start, preventing silent failures and improving the debugging experience.
Generated client libraries for all AI Gateway CRDs following standard Kubernetes client-go patterns, enabling developers to build controllers, operators, and custom integrations with type safety.
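Putting the standalone pieces together, a zero-config run might look like this; the MCP config file name and admin port are placeholders, while the `aigw run` command and `--mcp-config` flag come from the notes above.

```shell
# Zero-config standalone gateway: credentials come from the environment,
# and MCP servers are supplied via --mcp-config.
export OPENAI_API_KEY=sk-...      # or AZURE_OPENAI_API_KEY / ANTHROPIC_API_KEY
aigw run --mcp-config ./mcp-servers.json

# The unified admin server exposes both endpoints on a single port:
curl -s "http://localhost:${ADMIN_PORT}/health"
curl -s "http://localhost:${ADMIN_PORT}/metrics"
```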
Observability Enhancements
Comprehensive monitoring, logging, and tracing for MCP operations with configurable access logs and metrics enrichment for MCP server interactions and tool routing.
OpenInference-compliant distributed tracing and OpenTelemetry Gen AI metrics for image generation requests with detailed request parameters and timing information.
Support for OTEL-native metrics export (in addition to Prometheus), enabling integration with Elastic Stack, OTEL-TUI, and other OTEL-native observability systems; includes a console exporter for ad-hoc debugging (see the environment sketch below).
Complete OpenInference-compliant tracing for embeddings operations, complementing existing chat completion tracing.
**`/messages` endpoint metrics**: Distinct metrics for Anthropic's `/messages` endpoint, providing accurate attribution separate from `/chat/completions` endpoints.
Metrics now track both the original requested model and any overridden model names, providing accurate attribution in multi-provider and model virtualization scenarios.
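Assuming the exporter follows the standard OpenTelemetry SDK environment variables, as most OTEL-native exporters do, configuration reduces to something like this (the collector endpoint is a placeholder):

```shell
# Assumed: standard OTEL SDK environment variables select the exporter.
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318

# Or dump metrics to stdout for ad-hoc debugging:
export OTEL_METRICS_EXPORTER=console
```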
🔗 API Updates
- **New `MCPRoute` CRD**: Introduces the `MCPRoute` custom resource with comprehensive fields for MCP server configuration, tool filtering, authentication policies (OAuth and API key), and Protected Resource Metadata.
- **Cross-namespace references in `AIGatewayRoute`**: Added a `namespace` field to `AIGatewayRouteRuleBackendRef`, enabling cross-namespace backend references with ReferenceGrant validation.
- **Header mutations at route and backend levels**: Added `headerMutation` fields to both `AIServiceBackend` and `AIGatewayRouteRuleBackendRef` for backend-level and per-route header manipulation with smart merge logic.
- **New `AWSAnthropic` API schema**: Added the `AWSAnthropic` schema for Claude models on AWS Bedrock using the native Anthropic Messages API format, providing full feature parity with the direct Anthropic API.
- **Anthropic API key authentication**: Added `AnthropicAPIKey` to `BackendSecurityPolicy` for `x-api-key` header authentication.
- **Azure API key authentication**: Added `AzureAPIKey` to `BackendSecurityPolicy` for `api-key` header authentication.
- **AWS credential chain support**: `BackendSecurityPolicy` AWS auth now supports the SDK default credential chain when credentials are not explicitly provided.
- **`InferencePool` v1**: Updated to support Gateway API Inference Extension v1.0 (`inference.networking.k8s.io/v1`) instead of v1alpha1.
- **Enforced `Backend` resource requirement**: Added CRD validation to `AIServiceBackend` explicitly requiring Envoy Gateway `Backend` resources (Kubernetes Service is not supported).
📦 Dependency Versions
Updated to Go 1.25.3 for improved performance and security.
Built on Envoy Gateway v1.5+ for proven data plane capabilities and enhanced features; this version is also fully compatible with the upcoming v1.6.
Leveraging Envoy Proxy v1.36's battle-tested networking capabilities.
Support for Gateway API v1.4.0 specifications.
Integration with Gateway API Inference Extension v1.0.2 for stable intelligent endpoint selection.
🔮 What's Next (beyond v0.4)
We're already working on exciting features for future releases:
- Advanced MCP features - Further enhancements to the MCP protocol support
- Additional LLM provider integrations - Expanding the ecosystem of supported LLM providers
- Enhanced Performance - Improving the runtime performance of the gateway