Provider Fallback
Envoy AI Gateway supports provider fallback to ensure high availability and reliability for AI/LLM workloads. With fallback, you can configure multiple upstream providers for a single route, so that if the primary provider fails (due to network errors, 5xx responses, or other health check failures), traffic is automatically routed to a healthy fallback provider.
When to Use Fallback
- To ensure uninterrupted service when a primary AI/LLM provider is unavailable.
- To provide redundancy across multiple cloud or on-premise model providers.
- To implement active-active or active-passive failover strategies for critical AI workloads.
How Fallback Works
- Primary and Fallback Backends: You can specify a prioritized list of backends in your
AIGatewayRoute
usingbackendRefs
. The first backend is treated as primary, and subsequent backends are considered fallbacks. - Health Checks: Fallback is triggered based on passive health checks and retry policies, which can be configured using the
BackendTrafficPolicy
API. - Automatic Failover: When the primary backend becomes unhealthy, Envoy AI Gateway automatically shifts traffic to the next healthy fallback backend.
Example
Below is an example configuration that demonstrates provider fallback from a failing upstream to AWS Bedrock:
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
name: provider-fallback
namespace: default
spec:
schema:
name: OpenAI
targetRefs:
- name: provider-fallback
kind: Gateway
group: gateway.networking.k8s.io
rules:
- matches:
- headers:
- type: Exact
name: x-ai-eg-model
value: us.meta.llama3-2-1b-instruct-v1:0
backendRefs:
- name: provider-fallback-always-failing-upstream # Primary backend (expected to fail)
priority: 0
- name: provider-fallback-aws # Fallback backend
priority: 1
The corresponding Backend
resources:
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
name: provider-fallback-always-failing-upstream
namespace: default
spec:
endpoints:
- fqdn:
hostname: provider-fallback-always-failing-upstream.default.svc.cluster.local
port: 443
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
name: provider-fallback-aws
namespace: default
spec:
endpoints:
- fqdn:
hostname: bedrock-runtime.us-east-1.amazonaws.com
port: 443
Configuring Fallback Behavior
Attach a BackendTrafficPolicy
to the generated HTTPRoute
to control retry and health check behavior:
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: passive-health-check
spec:
targetRefs:
- group: gateway.networking.k8s.io
kind: HTTPRoute
name: provider-fallback # HTTPRoute is created with the same name as AIGatewayRoute
retry:
numRetries: 5
perRetry:
backOff:
baseInterval: 100ms
maxInterval: 10s
timeout: 30s
retryOn:
httpStatusCodes:
- 500
triggers:
- connect-failure
- retriable-status-codes
healthCheck:
passive:
baseEjectionTime: 5s
interval: 2s
maxEjectionPercent: 100
consecutive5XxErrors: 1
consecutiveGatewayErrors: 0
consecutiveLocalOriginFailures: 1
splitExternalLocalOriginErrors: false