Announcing the Envoy AI Gateway v0.3 Release

8 min read
Erica Hughberg
Envoy AI Gateway Maintainer - Tetrate
Xunzhuo (Bit) Liu
Envoy AI Gateway Maintainer - Tencent


The Envoy AI Gateway v0.3 release introduces intelligent inference routing through Endpoint Picker (EPP) integration, expands our provider ecosystem with production support for Google Vertex AI and the native Anthropic API, and delivers enterprise-grade observability with OpenInference tracing.

The Big Shifts in v0.3

Envoy AI Gateway v0.3 isn't just another feature release; it's a fundamental shift toward intelligent, production-ready AI infrastructure. This release addresses three critical challenges that have been holding back AI adoption in enterprise environments:

1. From Static to Intelligent Routing

Traditional load balancers treat AI inference endpoints like web servers, but AI workloads are fundamentally different. With Endpoint Picker integration, Envoy AI Gateway now makes intelligent routing decisions based on real-time AI-specific metrics like KV-cache usage, queue depth, and LoRA adapter information.

What this means for you:

| Benefit | Description |
| --- | --- |
| Latency reduction | Optimal endpoint selection based on real-time AI metrics |
| Automatic resource optimization | Intelligent resource allocation across your inference infrastructure |
| Zero manual intervention | Automated endpoint management without operational overhead |

2. Expanded Provider Ecosystem

We've moved beyond experimental integrations to deliver production-grade support for the AI providers that matter most to enterprises.

Google Vertex AI is now supported with complete streaming capabilities for Gemini models. Anthropic on Vertex AI moves from experimental to production-ready with multi-tool support and configurable API versions.

What this means for you:

| Benefit | Description |
| --- | --- |
| Unified OpenAI-compatible API | Single interface across Google, Anthropic, AWS, and more providers |
| Enterprise-grade reliability | Production-ready stability for mission-critical AI workloads |
| Provider flexibility | Switch between providers without architectural changes or vendor lock-in |

3. Enterprise Observability for AI

AI workloads require specialized observability that traditional monitoring tools can't provide. v0.3 delivers comprehensive AI-specific monitoring across four key areas.

What this means for you:

| Observability Feature | Description |
| --- | --- |
| OpenInference tracing | Complete request lifecycle visibility and evaluation system compatibility |
| Configurable metrics labels | Granular monitoring based on request headers for custom filtering |
| Embeddings metrics support | Comprehensive token usage tracking for accurate cost attribution |
| Enhanced GenAI metrics | Improved accuracy with OpenTelemetry semantic conventions |

Notable New Features in v0.3

Endpoint Picker Provider: The Future of AI Load Balancing

A highlight of v0.3 is our integration with the Gateway API Inference Extension, which allows intelligent endpoint selection that understands AI workloads.

# AIGatewayRoute with InferencePool
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: intelligent-routing
spec:
  rules:
    - matches:
        - headers:
            - name: x-ai-eg-model
              value: meta-llama/Llama-3.1-8B-Instruct
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: vllm-llama3-pool
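
The InferencePool referenced above is defined by the Gateway API Inference Extension rather than by Envoy AI Gateway itself. A minimal sketch of such a pool might look like the following; the target port, selector labels, and Endpoint Picker service name (vllm-llama3-epp) are illustrative assumptions, not values from this release:

# Hypothetical InferencePool for the route above (Gateway API Inference Extension).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-pool
spec:
  targetPortNumber: 8000        # assumed vLLM serving port
  selector:
    app: vllm-llama3            # assumed pod label on the model servers
  extensionRef:
    name: vllm-llama3-epp       # assumed Endpoint Picker (EPP) service

The Endpoint Picker referenced by extensionRef is the component that scores endpoints on metrics like KV-cache usage, queue depth, and LoRA adapter state before Envoy forwards the request.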

This isn't just about load balancing; it's about intelligent infrastructure that adapts to your AI workloads in real-time.

Google Vertex AI: Enterprise AI at Scale

Google Vertex AI support moves to production-ready status with:

  • GCP Vertex AI Authentication with Service Account Key or Workload Identity Federation.
  • Complete Gemini Support with OpenAI API compatibility for function calls, multimodal, reasoning and streaming.
  • Complete Anthropic on Vertex AI Support with OpenAI API compatibility for function calls, multimodal, extended thinking and streaming.
  • Native Anthropic API via GCP Vertex AI to unlock use cases like Claude Code.
  • Enterprise-grade reliability for mission-critical deployments.

This brings the power of Google's AI platform into your unified AI infrastructure, managed through a single, consistent API.
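
As a rough sketch of how the authentication piece fits together, the example below shows a hypothetical BackendSecurityPolicy of type GCPCredentials, modeled on the gateway's existing credential policy types. The field names under gcpCredentials are assumptions; consult the getting started guide for the authoritative schema:

# Hypothetical BackendSecurityPolicy for GCP Vertex AI authentication.
# Field names under gcpCredentials are assumptions, not the verbatim schema.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: vertex-ai-credentials
spec:
  type: GCPCredentials
  gcpCredentials:
    projectName: my-gcp-project            # assumed GCP project field
    region: us-central1                    # assumed Vertex AI region field
    serviceAccountKeySecretRef:            # assumed ref to a Secret holding the SA key
      name: gcp-service-account-key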

Comprehensive AI Observability

Traditional observability tools fall short when monitoring AI workloads. v0.3 delivers four significant observability enhancements:

| Enhancement | Feature | Benefit |
| --- | --- | --- |
| OpenInference Tracing Integration | Complete LLM request tracing with timing and token information | Deep visibility into AI request lifecycle |
| | Evaluation system compatibility with tools like Arize Phoenix | Seamless integration with AI evaluation workflows |
| | Full chat completion request/response data capture | Complete audit trail for debugging and analysis |
| Configurable Metrics Labels | Custom labeling based on HTTP request headers | Flexible monitoring and alerting setup |
| | Granular monitoring by user ID, API version, or application context | Enhanced filtering and segmentation |
| | Enhanced filtering and alerting capabilities | More targeted monitoring and alerts |
| Embeddings Metrics Support | Comprehensive token usage tracking for both chat and embeddings APIs | Better cost control and usage insights |
| | Accurate cost attribution across different operation types | Precise cost allocation and budgeting |
| | OpenTelemetry semantic conventions compliance | Standardized observability integration |
| Enhanced GenAI Metrics | Improved error handling and attribute mapping | More reliable performance monitoring |
| | More accurate token latency measurements | Better performance analysis data |
| | Better performance analysis data | Improved optimization insights |
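
Because the GenAI metrics follow OpenTelemetry semantic conventions, they slot into existing monitoring stacks. As one hedged illustration, assuming the gateway's metrics are scraped by Prometheus and exported under the OTel name gen_ai.client.token.usage, a recording rule for per-operation token spend could look like this (the rule and label names are illustrative):

# Hypothetical Prometheus recording rule over OTel GenAI semconv metrics.
groups:
  - name: ai-gateway-token-usage
    rules:
      - record: ai_gateway:token_usage:rate5m   # illustrative rule name
        # gen_ai_client_token_usage_sum is the assumed Prometheus rendering of
        # the OTel histogram gen_ai.client.token.usage; labels are assumptions.
        expr: sum by (gen_ai_operation_name) (rate(gen_ai_client_token_usage_sum[5m]))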

Model Name Virtualization: Abstraction for Flexibility

The new modelNameOverride field enables powerful model abstraction:

backendRefs:
  - name: openai-backend
    modelNameOverride: "gpt-4"
  - name: anthropic-backend
    modelNameOverride: "claude-3"

By abstracting away the model name, application developers can use standardized model names while the gateway handles provider-specific routing. This is useful for A/B testing, gradual migrations, safeguarding against provider lock-in, and multi-provider strategies.
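
For instance, a hypothetical A/B split behind a single virtual model name could combine modelNameOverride with standard Gateway API-style backend weights; the chat-default name and the weights below are illustrative assumptions:

# Hypothetical 90/10 A/B split behind the virtual model name "chat-default".
rules:
  - matches:
      - headers:
          - name: x-ai-eg-model
            value: chat-default          # virtual name your applications use
    backendRefs:
      - name: openai-backend
        weight: 90                       # assumed Gateway API-style weighting
        modelNameOverride: "gpt-4"
      - name: anthropic-backend
        weight: 10                       # 10% canary traffic
        modelNameOverride: "claude-3"

Clients keep sending chat-default; shifting traffic between providers becomes a one-line weight change rather than an application deploy.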

Unified LLM and non-LLM APIs

v0.3 enhances Gateway resource management by allowing both standard HTTPRoute and AIGatewayRoute resources to be attached to the same Gateway object.

This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management.
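
A rough sketch of the pattern, assuming standard Gateway API attachment via parentRefs for both route kinds (the resource names, GatewayClass, and the attachment field on AIGatewayRoute are assumptions):

# Hypothetical Gateway shared by AI and non-AI routes.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
spec:
  gatewayClassName: envoy-ai-gateway     # assumed GatewayClass name
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
# Non-AI traffic continues to use a standard HTTPRoute.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-traffic
spec:
  parentRefs:
    - name: shared-gateway
  rules:
    - backendRefs:
        - name: web-service              # assumed plain HTTP backend
          port: 8080
---
# AI traffic attaches to the same Gateway through an AIGatewayRoute.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: ai-traffic
spec:
  parentRefs:
    - name: shared-gateway               # assumed attachment field
  rules:
    - matches:
        - headers:
            - name: x-ai-eg-model
              value: meta-llama/Llama-3.1-8B-Instruct
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: vllm-llama3-pool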

Community Impact and Momentum

Growing Community

The v0.3 release represents the collaborative effort of our rapidly expanding community:

  • Contributors from Tetrate, Bloomberg, Tencent, Google, and Nutanix
  • Independent developers driving innovation
  • Enterprise adopters providing real-world feedback

This diversity of perspectives has shaped v0.3 into a release that serves both bleeding-edge innovators and enterprise production needs.


Visit on GitHub and Star the Repo to show your support.

Standards Leadership

Our integration with the Gateway API Inference Extension demonstrates our commitment to open standards and vendor-neutral solutions. By building on proven Gateway API patterns, we're ensuring that Envoy AI Gateway remains interoperable and future-proof.

Enabling tracing through the OpenInference integration further demonstrates our community's commitment to industry standards, collaboration, and ecosystem integration.

What This Release Enables

| Benefit | Impact |
| --- | --- |
| Simplified model deployment with intelligent routing | Faster development cycles |
| Performance optimization through real-time metrics | Better model performance |
| Cost control with token-based rate limiting | More predictable operating costs |
| Multi-model support in a single infrastructure | Reduced complexity and maintenance |
| Unified AI infrastructure supporting diverse workloads | Scalable, future-proof architecture |
| Standards-based architecture for long-term sustainability | Vendor-neutral, interoperable solutions |
| Vendor flexibility without architectural changes | Reduced lock-in risk |
| Enterprise observability for production confidence | Production-ready monitoring |
| Reduced operational complexity through automation | Lower operational overhead |
| Improved reliability with intelligent failover | Higher system reliability |
| Better resource utilization across infrastructure | Optimized infrastructure costs |
| Streamlined monitoring with AI-specific telemetry | Simplified troubleshooting |

Get Involved: Join the AI Infrastructure Revolution

The future of AI infrastructure is open, collaborative, and community-driven. Here's how you can be part of it:

| Action | Resource | Description |
| --- | --- | --- |
| 🚀 Try v0.3 Today | Download the release | Get the latest release and start exploring |
| | Follow our getting started guide | Step-by-step setup instructions |
| | Explore the examples | Real-world configuration examples |
| 💬 Join the Community | Weekly Community Meetings | Add to your calendar |
| | Slack Channel #envoy-ai-gateway | Join the conversation on Envoy Slack |
| | GitHub Discussions | Share experiences and ask questions |
| 🛠️ Contribute to the Future | Report Issues | Help us improve by reporting bugs |
| | Request Features | Tell us what you need for future releases |
| | Submit Code | Contribute to the next release |

Acknowledgments: The Power of Open Source

v0.3 wouldn't exist without our incredible community. Special recognition goes to:

  • Enterprise contributors who provided production feedback and requirements
  • Open source maintainers from the Gateway API and CNCF communities
  • Individual developers who contributed code, documentation, and ideas
  • Early adopters who tested pre-releases and reported issues

Get Started Today

Ready to experience the future of AI infrastructure?

Get started with Envoy AI Gateway v0.3 and see how intelligent inference routing, expanded provider support, and enterprise observability can transform your AI deployments.

The future of AI infrastructure is open, intelligent, and community-driven. Join us in building it.

🚀 Get Started with v0.3 →


Envoy AI Gateway v0.3 is available now. For detailed release notes, API changes, and upgrade guidance, visit our release notes page.