Skip to main content
Version: 0.2

Provider Fallback

Envoy AI Gateway supports provider fallback to ensure high availability and reliability for AI/LLM workloads. With fallback, you can configure multiple upstream providers for a single route, so that if the primary provider fails (due to network errors, 5xx responses, or other health check failures), traffic is automatically routed to a healthy fallback provider.

When to Use Fallback

  • To ensure uninterrupted service when a primary AI/LLM provider is unavailable.
  • To provide redundancy across multiple cloud or on-premise model providers.
  • To implement active-active or active-passive failover strategies for critical AI workloads.

How Fallback Works

  • Primary and Fallback Backends: You can specify a prioritized list of backends in your AIGatewayRoute using backendRefs. The first backend is treated as primary, and subsequent backends are considered fallbacks.
  • Health Checks: Fallback is triggered based on passive health checks and retry policies, which can be configured using the BackendTrafficPolicy API.
  • Automatic Failover: When the primary backend becomes unhealthy, Envoy AI Gateway automatically shifts traffic to the next healthy fallback backend.

Example

Below is an example configuration that demonstrates provider fallback from a failing upstream to AWS Bedrock:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
name: provider-fallback
namespace: default
spec:
schema:
name: OpenAI
targetRefs:
- name: provider-fallback
kind: Gateway
group: gateway.networking.k8s.io
rules:
- matches:
- headers:
- type: Exact
name: x-ai-eg-model
value: us.meta.llama3-2-1b-instruct-v1:0
backendRefs:
- name: provider-fallback-always-failing-upstream # Primary backend (expected to fail)
priority: 0
- name: provider-fallback-aws # Fallback backend
priority: 1

The corresponding Backend resources:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
name: provider-fallback-always-failing-upstream
namespace: default
spec:
endpoints:
- fqdn:
hostname: provider-fallback-always-failing-upstream.default.svc.cluster.local
port: 443
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
name: provider-fallback-aws
namespace: default
spec:
endpoints:
- fqdn:
hostname: bedrock-runtime.us-east-1.amazonaws.com
port: 443

Configuring Fallback Behavior

Attach a BackendTrafficPolicy to the generated HTTPRoute to control retry and health check behavior:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: passive-health-check
spec:
targetRefs:
- group: gateway.networking.k8s.io
kind: HTTPRoute
name: provider-fallback # HTTPRoute is created with the same name as AIGatewayRoute
retry:
numRetries: 5
perRetry:
backOff:
baseInterval: 100ms
maxInterval: 10s
timeout: 30s
retryOn:
httpStatusCodes:
- 500
triggers:
- connect-failure
- retriable-status-codes
healthCheck:
passive:
baseEjectionTime: 5s
interval: 2s
maxEjectionPercent: 100
consecutive5XxErrors: 1
consecutiveGatewayErrors: 0
consecutiveLocalOriginFailures: 1
splitExternalLocalOriginErrors: false

References