Envoy AI Gateway v0.4.x
v0.4.0
✨ New Features
Model Context Protocol (MCP) Gateway
**MCPRoute CRD**: Introduces the `MCPRoute` custom resource for routing MCP requests to backend MCP servers, providing a unified AI API across multiple MCP backends.
Includes streamable HTTP transport, JSON-RPC 2.0 support, and MCP spec-compliant OAuth 2.0 authorization with JWKS validation and Protected Resource Metadata.
Aggregates multiple MCP servers behind a single endpoint with intelligent tool routing, tool filtering (exact match and regex patterns), and collision detection.
Supports both OAuth-based authentication and API key authentication for secure backend MCP server communication with configurable headers.
Implements MCP session handling with encryption, rotatable seeds, and graceful session lifecycle management.
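To make this concrete, here is a rough sketch of an `MCPRoute` that aggregates two MCP servers behind one endpoint with a tool filter. The field names are assumptions modeled on Gateway API conventions; consult the `MCPRoute` API reference for the authoritative schema.

```yaml
# Hypothetical MCPRoute sketch: two MCP backends, one unified endpoint.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: MCPRoute
metadata:
  name: mcp-servers
  namespace: default
spec:
  parentRefs:
    - name: ai-gateway            # the Gateway exposing the MCP endpoint
      kind: Gateway
      group: gateway.networking.k8s.io
  backendRefs:
    - name: github-mcp            # Envoy Gateway Backend for one MCP server
      port: 3000
      toolSelector:               # assumed field: expose only matching tools
        include:
          - list_issues
    - name: docs-mcp              # second MCP server aggregated alongside
      port: 3000
```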
Anthropic Provider Support
Native integration with Anthropic's API at api.anthropic.com, complementing existing GCP Vertex AI Anthropic support.
Support for Claude models on AWS Bedrock using the native Anthropic Messages API format instead of the generic Converse API, enabling full feature parity with direct Anthropic API including prompt caching and extended thinking.
Native `x-api-key` header-based authentication matching Anthropic's API conventions and SDK patterns for direct Anthropic connections (see the sketch below).
Efficient passthrough translation layer that captures token usage and maintains API compatibility while minimizing overhead for both direct and AWS Bedrock Anthropic endpoints.
Auto-configuration from ANTHROPIC_API_KEY environment variable in standalone mode for zero-config deployments.
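As a sketch of what direct Anthropic wiring might look like, the following pairs a backend for api.anthropic.com with an `x-api-key` security policy. The schema name and security-policy field names are assumptions based on the API updates listed below, not the authoritative schema.

```yaml
# Hypothetical sketch: an AIServiceBackend for api.anthropic.com plus a
# BackendSecurityPolicy injecting the x-api-key header from a Secret.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: anthropic
spec:
  schema:
    name: Anthropic               # assumed schema name for the native API
  backendRef:
    name: anthropic-backend       # Envoy Gateway Backend for api.anthropic.com
    kind: Backend
    group: gateway.envoyproxy.io
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: anthropic-api-key
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: anthropic
  type: AnthropicAPIKey           # assumed discriminator, per the API updates
  anthropicAPIKey:
    secretRef:
      name: anthropic-credentials # Secret holding the x-api-key value
```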
Guided Output Support for GCP Vertex AI/Gemini
Constrains model outputs to match specific regular expressions for GCP Vertex AI/Gemini models, enabling structured text generation.
Restricts model outputs to predefined choices for GCP Vertex AI/Gemini models, ensuring responses conform to expected values.
Ensures model outputs are valid JSON conforming to specified schemas for GCP Vertex AI/Gemini models, with OpenAI-compatible API translation.
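Because the JSON-schema mode is translated from the OpenAI-compatible API, clients can use the standard `response_format` field; for example (the gateway address and model name are placeholders):

```shell
# Constrain a Gemini response to a JSON schema via the OpenAI-compatible API.
curl -s "$GATEWAY/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Name two primary colors."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "colors",
        "schema": {
          "type": "object",
          "properties": {
            "colors": {"type": "array", "items": {"type": "string"}}
          },
          "required": ["colors"]
        }
      }
    }
  }'
```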
Provider-Specific Enhancements
**`/v1/images/generations` endpoint**: End-to-end support for OpenAI's image generation API, including request/response translation, Brotli encoding/decoding, and full protocol compatibility (see the request sketch below).
**`/v1/completions` endpoint**: Full pass-through support for OpenAI's legacy completions endpoint with complete tracing and metrics, ensuring backward compatibility.
Native support for Azure OpenAI embeddings API with proper protocol translation and token usage tracking.
Full support for reasoning/thinking tokens in AWS Bedrock responses for both streaming and non-streaming modes, properly exposing extended thinking processes in Claude models.
Support for GCP-specific safety settings configuration, allowing fine-grained control over content filtering and safety thresholds for Gemini models.
Accurate completion_tokens reporting in streaming usage chunks for Gemini models, ensuring proper token accounting during streaming responses.
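For instance, the image endpoint accepts OpenAI's standard request shape unchanged (`$GATEWAY` is a placeholder for the gateway address):

```shell
# Generate an image through the new OpenAI-compatible endpoint.
curl -s "$GATEWAY/v1/images/generations" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "a watercolor lighthouse at dusk",
    "n": 1,
    "size": "1024x1024"
  }'
```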
Cross-Namespace Resource References
**AIServiceBackend references**: `AIGatewayRoute` can now reference `AIServiceBackend` resources in different namespaces, enabling multi-tenant and organizational separation patterns.
Comprehensive ReferenceGrant integration following Gateway API patterns, with automatic validation and clear error messages when grants are missing.
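A cross-namespace reference pairs the new `namespace` field on the backend ref with a standard Gateway API ReferenceGrant in the target namespace, roughly like this (namespace names are illustrative):

```yaml
# Allow AIGatewayRoutes in "tenant-a" to reference AIServiceBackends
# owned by "shared-backends".
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-tenant-a
  namespace: shared-backends      # namespace that owns the backends
spec:
  from:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      namespace: tenant-a
  to:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
```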
Enhanced Upstream Authentication
Support for the AWS SDK's default credential chain, including IRSA (IAM Roles for Service Accounts), EKS Pod Identity, EC2 Instance Profiles, and environment variables, eliminating the need for static credentials or OIDC settings (see the sketch below).
Native Azure OpenAI API key authentication using the api-key header, matching Azure SDK conventions and console practices.
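With the default chain, a policy can omit static credentials entirely. A minimal sketch, assuming the existing AWS auth field names:

```yaml
# Hypothetical sketch: AWS auth resolved via the SDK default credential
# chain (IRSA, EKS Pod Identity, instance profile, or env vars).
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: bedrock-default-chain
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: bedrock
  type: AWSCredentials
  awsCredentials:
    region: us-east-1             # no credentialsFile/OIDC -> default chain
```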
Traffic Management and Configuration
New `headerMutation` fields in both `AIServiceBackend` and `AIGatewayRouteRuleBackendRef` enable header manipulation with smart merge logic for advanced routing scenarios (see the sketch below).
**InferencePool v1 support**: Updated to Gateway API Inference Extension v1.0, providing stable intelligent endpoint selection with enhanced performance and reliability.
Captures and reports cached token statistics from cloud providers (Anthropic, Bedrock, etc.), providing accurate cost attribution for prompt caching features.
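A backend-level mutation might look like the following sketch; the `headerMutation` shape here is assumed to mirror Gateway API header-modifier filters.

```yaml
# Hypothetical sketch: inject one header and strip another for a backend.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai
spec:
  schema:
    name: OpenAI
  backendRef:
    name: openai-backend
    kind: Backend
    group: gateway.envoyproxy.io
  headerMutation:
    set:
      - name: x-team              # added to every upstream request
        value: platform
    remove:
      - x-internal-debug          # stripped before reaching the provider
```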
Standalone Mode and CLI
Official Docker images for the aigw CLI published to GitHub Container Registry, enabling containerized standalone deployments with proper health checks and lifecycle management.
Zero-config standalone mode with automatic configuration from `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`, or `ANTHROPIC_API_KEY` environment variables, generating a complete Envoy configuration with OpenAI SDK compatibility (see the run example below).
Native MCP support in standalone mode via --mcp-config and --mcp-json flags, enabling unified LLM and MCP server configuration in a single aigw run invocation without Kubernetes.
Proper separation of configuration, data, state, and runtime files following XDG Base Directory specification, improving organization and enabling better cleanup and management of aigw state.
Improved Envoy readiness detection and status reporting in standalone mode, providing clear insights into when the gateway is ready to accept traffic with better error messages.
Unified admin server on a single port serving both /metrics and /health endpoints, simplifying monitoring and health check configuration.
`aigw` CLI now fails fast and exits cleanly if the external processor fails to start, preventing silent failures and improving the debugging experience.
Generated client libraries for all AI Gateway CRDs following standard Kubernetes client-go patterns, enabling developers to build controllers, operators, and custom integrations with type safety.
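Putting the standalone pieces together, a zero-config run might look like this; the MCP config file name and admin port are placeholders, while the `aigw run` command and `--mcp-config` flag come from the notes above.

```shell
# Zero-config standalone gateway: credentials come from the environment,
# and MCP servers are supplied via --mcp-config.
export OPENAI_API_KEY=sk-...      # or AZURE_OPENAI_API_KEY / ANTHROPIC_API_KEY
aigw run --mcp-config ./mcp-servers.json

# The unified admin server exposes both endpoints on a single port:
curl -s "http://localhost:${ADMIN_PORT}/health"
curl -s "http://localhost:${ADMIN_PORT}/metrics"
```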
Observability Enhancements
Comprehensive monitoring, logging, and tracing for MCP operations with configurable access logs and metrics enrichment for MCP server interactions and tool routing.
OpenInference-compliant distributed tracing and OpenTelemetry Gen AI metrics for image generation requests with detailed request parameters and timing information.
Support for OTEL-native metrics export (in addition to Prometheus), enabling integration with Elastic Stack, OTEL-TUI, and other OTEL-native observability systems; includes a console exporter for ad-hoc debugging (see the environment sketch below).
Complete OpenInference-compliant tracing for embeddings operations, complementing existing chat completion tracing.
**`/messages` endpoint metrics**: Distinct metrics for Anthropic's `/messages` endpoint, providing accurate attribution separate from `/chat/completions` endpoints.
Metrics now track both the original requested model and any overridden model names, providing accurate attribution in multi-provider and model virtualization scenarios.
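Assuming the exporter follows the standard OpenTelemetry SDK environment variables, as most OTEL-native exporters do, configuration reduces to something like this (the collector endpoint is a placeholder):

```shell
# Assumed: standard OTEL SDK environment variables select the exporter.
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318

# Or dump metrics to stdout for ad-hoc debugging:
export OTEL_METRICS_EXPORTER=console
```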
🔗 API Updates
- **New `MCPRoute` CRD**: Introduces the `MCPRoute` custom resource with comprehensive fields for MCP server configuration, tool filtering, authentication policies (OAuth and API key), and Protected Resource Metadata.
- **Cross-namespace references in `AIGatewayRoute`**: Added a `namespace` field to `AIGatewayRouteRuleBackendRef`, enabling cross-namespace backend references with ReferenceGrant validation.
- **Header mutations at route and backend levels**: Added `headerMutation` fields to both `AIServiceBackend` and `AIGatewayRouteRuleBackendRef` for backend-level and per-route header manipulation with smart merge logic.
- **New `AWSAnthropic` API schema**: Added the `AWSAnthropic` schema for Claude models on AWS Bedrock using the native Anthropic Messages API format, providing full feature parity with the direct Anthropic API.
- **Anthropic API key authentication**: Added `AnthropicAPIKey` to `BackendSecurityPolicy` for `x-api-key` header authentication.
- **Azure API key authentication**: Added `AzureAPIKey` to `BackendSecurityPolicy` for `api-key` header authentication.
- **AWS credential chain support**: `BackendSecurityPolicy` AWS auth now supports the SDK default credential chain when credentials are not explicitly provided.
- **`InferencePool` v1**: Updated to support Gateway API Inference Extension v1.0 (`inference.networking.k8s.io/v1`) instead of v1alpha1.
- **Enforced `Backend` resource requirement**: Added CRD validation to `AIServiceBackend` explicitly requiring Envoy Gateway `Backend` resources (Kubernetes Service is not supported).
📦 Dependency Versions
Updated to Go 1.25.3 for improved performance and security.
Built on Envoy Gateway v1.5+ for proven data plane capabilities and enhanced features; this version is also fully compatible with the upcoming v1.6.
Leveraging Envoy Proxy v1.36's battle-tested networking capabilities.
Support for Gateway API v1.4.0 specifications.
Integration with Gateway API Inference Extension v1.0.2 for stable intelligent endpoint selection.
🔮 What's Next (beyond v0.4)
We're already working on exciting features for future releases:
- Advanced MCP features - Further enhancements to the MCP protocol support
- Additional LLM provider integrations - Expanding the ecosystem of supported LLM providers
- Enhanced Performance - Improving the runtime performance of the gateway