Version: 0.2

Provider Fallback

Envoy AI Gateway supports provider fallback to ensure high availability and reliability for AI/LLM workloads. With fallback, you can configure multiple upstream providers for a single route, so that if the primary provider fails (due to network errors, 5xx responses, or other health check failures), traffic is automatically routed to a healthy fallback provider.

When to Use Fallback

To ensure uninterrupted service when a primary AI/LLM provider is unavailable.
To provide redundancy across multiple cloud or on-premise model providers.
To implement active-active or active-passive failover strategies for critical AI workloads.

How Fallback Works

Primary and Fallback Backends: You can specify a prioritized list of backends in your AIGatewayRoute using backendRefs. The first backend is treated as the primary, and subsequent backends are considered fallbacks. In the example below, the order is made explicit with the priority field: 0 for the primary and 1 for the fallback.
Health Checks: Fallback is triggered by passive health checks and retry policies, both of which are configured through the BackendTrafficPolicy API.
Automatic Failover: When the primary backend becomes unhealthy, Envoy AI Gateway automatically shifts traffic to the next healthy fallback backend.
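
The selection rule described above can be modeled in a few lines of Python. This is only a conceptual sketch, not Envoy AI Gateway's implementation: the Backend class and the pick_backend function are hypothetical names, and the sketch assumes that a lower priority number is preferred, as the comments in the example route suggest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """Hypothetical stand-in for one backendRefs entry."""
    name: str
    priority: int        # assumption: lower number = preferred
    healthy: bool = True


def pick_backend(backends: List[Backend]) -> Optional[Backend]:
    """Return the healthy backend with the lowest priority number, or None."""
    healthy = [b for b in backends if b.healthy]
    return min(healthy, key=lambda b: b.priority, default=None)


backends = [
    Backend("provider-fallback-always-failing-upstream", priority=0),
    Backend("provider-fallback-aws", priority=1),
]

# While the primary is healthy, it receives the traffic.
first_choice = pick_backend(backends)

# Once the primary is marked unhealthy, traffic moves to the fallback.
backends[0].healthy = False
second_choice = pick_backend(backends)
```

In this model, first_choice is the primary and second_choice is the AWS fallback. If every backend were unhealthy, pick_backend would return None.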

Example

Below is an example configuration that demonstrates provider fallback from a failing upstream to AWS Bedrock:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: provider-fallback
  namespace: default
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: provider-fallback
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: us.meta.llama3-2-1b-instruct-v1:0
      backendRefs:
        - name: provider-fallback-always-failing-upstream  # Primary backend (expected to fail)
          priority: 0
        - name: provider-fallback-aws                      # Fallback backend
          priority: 1

The corresponding Backend resources:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: provider-fallback-always-failing-upstream
  namespace: default
spec:
  endpoints:
    - fqdn:
        hostname: provider-fallback-always-failing-upstream.default.svc.cluster.local
        port: 443
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: provider-fallback-aws
  namespace: default
spec:
  endpoints:
    - fqdn:
        hostname: bedrock-runtime.us-east-1.amazonaws.com
        port: 443

Configuring Fallback Behavior

Attach a BackendTrafficPolicy to the generated HTTPRoute to control retry and health check behavior:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: passive-health-check
  namespace: default # same namespace as the targeted HTTPRoute
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: provider-fallback # HTTPRoute is created with the same name as AIGatewayRoute
  retry:
    numRetries: 5
    perRetry:
      backOff:
        baseInterval: 100ms
        maxInterval: 10s
      timeout: 30s
    retryOn:
      httpStatusCodes:
        - 500
      triggers:
        - connect-failure
        - retriable-status-codes
  healthCheck:
    passive:
      baseEjectionTime: 5s
      interval: 2s
      maxEjectionPercent: 100
      consecutive5XxErrors: 1
      consecutiveGatewayErrors: 0
      consecutiveLocalOriginFailures: 1
      splitExternalLocalOriginErrors: false
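
To make the passive settings more concrete, the following Python sketch models how a consecutive-error threshold and an ejection window interact. It is a simplified illustration under stated assumptions, not Envoy's outlier-detection code: the PassiveHealth class and its methods are hypothetical, time is passed in explicitly, and the ejection period is held fixed at baseEjectionTime, whereas Envoy can lengthen it for repeated ejections.

```python
class PassiveHealth:
    """Hypothetical model of passive health checking for one backend."""

    def __init__(self, consecutive_5xx: int = 1, base_ejection_s: float = 5.0):
        self.consecutive_5xx = consecutive_5xx    # mirrors consecutive5XxErrors
        self.base_ejection_s = base_ejection_s    # mirrors baseEjectionTime
        self.error_streak = 0
        self.ejected_until = 0.0

    def record(self, status: int, now: float) -> None:
        """Update the error streak from one observed response status."""
        if 500 <= status < 600:
            self.error_streak += 1
            if self.error_streak >= self.consecutive_5xx:
                # Threshold reached: eject the backend for the base window.
                self.ejected_until = now + self.base_ejection_s
                self.error_streak = 0
        else:
            # Any non-5xx response breaks the streak.
            self.error_streak = 0

    def is_healthy(self, now: float) -> bool:
        """A backend is usable again once its ejection window has passed."""
        return now >= self.ejected_until


primary = PassiveHealth(consecutive_5xx=1, base_ejection_s=5.0)
healthy_before = primary.is_healthy(now=0.0)   # no errors seen yet
primary.record(status=500, now=0.0)            # one 5xx meets the threshold
healthy_during = primary.is_healthy(now=2.0)   # still inside the 5s window
healthy_after = primary.is_healthy(now=5.0)    # window has elapsed
```

With a threshold of 1, as in the policy above, a single 5xx response is enough to take the primary out of rotation in this model, so requests fail over to the fallback almost immediately.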
