
Model Name Virtualization

Envoy AI Gateway provides an advanced model name virtualization capability that allows you to manage and route requests to different AI models seamlessly. This guide covers the key features and configuration for model name virtualization.

Motivation

It is common for multiple AI providers to offer the same or a similar model, such as Llama-3-70b. However, each provider tends to have its own naming convention for that model. For example, Claude 4 Sonnet is hosted on both GCP and AWS Bedrock, but under different model names:

  • GCP: claude-sonnet-4@20250514, etc.
  • AWS Bedrock: anthropic.claude-sonnet-4-20250514-v1:0

Even within the same provider, model names may vary based on deployment configurations or versions. For example, an OpenAI Platform request to gpt-5-nano might result in a response from gpt-5-nano-2025-08-07. Azure OpenAI uses deployment names in the URL path rather than model names in the request body, so you won't know the model in use until you get the response.

From the perspective of downstream GenAI applications, it is beneficial to have a unified model name that abstracts away these differences. Envoy AI Gateway provides this abstraction while still surfacing the authoritative model name that served each response.

Provider-specific Virtualization

AI providers handle model naming and execution differently to balance flexibility, determinism, and optimization. This section categorizes these behaviors into virtualization types, enabling Envoy AI Gateway to abstract differences for unified downstream access while retaining executed model details where possible.

  • Automatic Routing & Resolution: Providers select and optimize models at runtime behind generic identifiers, returning the actual executed version for reproducibility and debugging.
  • Static Model Execution: Uses direct mapping to immutable identifiers or fixed versions for deterministic execution, ensuring requested models run exactly as specified, ideal for consistent embeddings.
  • Deterministic Snapshot Mapping: Resolves models via endpoint/URL with dated snapshots (no response model field), providing consistent cross-platform behavior through timestamped versions.
  • URI-Based Resolution: Ignores request body model field, using URI path for deployment-based routing while returning the actual versioned model name in responses.
  • Third-Party Delegation: Delegates embeddings to specialized third-party services with separate APIs, integrating ecosystem expertise while focusing on core capabilities.

Upstream | API Type | Virtualization Type | Request Model Example | Response Model Example
--- | --- | --- | --- | ---
awsbedrock | Chat Completions | Static Model Execution | anthropic.claude-sonnet-4-20250514-v1:0 | N/A (no model field)
awsbedrock | Embeddings | Static Model Execution | amazon.titan-embed-text-v2:0 | N/A (no model field)
azureopenai | Chat Completions | URI-Based Resolution | {any-value} (ignored) | gpt-5-nano-2025-08-07
azureopenai | Embeddings | URI-Based Resolution | {any-value} (ignored) | text-embedding-ada-002-2
gcpanthropic | Chat Completions | Deterministic Snapshot Mapping | claude-sonnet-4@20250514 | N/A (no model field)
gcpanthropic | Embeddings | Third-Party Delegation | voyage-3.5 | N/A (no model field)
gcpvertexai | Chat Completions | Deterministic Snapshot Mapping | gemini-1.5-pro-002 | N/A (no model field)
gcpvertexai | Embeddings | Static Model Execution | text-embedding-004 | N/A (no model field)
openai | Chat Completions | Automatic Routing & Resolution | gpt-5-nano | gpt-5-nano-2025-08-07
openai | Embeddings | Static Model Execution | text-embedding-3-small | text-embedding-3-small

Azure OpenAI URI-Based Behavior

Azure OpenAI has a unique approach where the model field in the request JSON is completely ignored. The deployment name in the URI path determines which model is actually used, and the response returns the actual versioned model name:

  • URI Format: https://{resource}.openai.azure.com/openai/deployments/{deployment-name}/chat/completions
  • Request Body: {"model": "anything"} ← This value is ignored
  • Response: {"model": "gpt-5-nano-2025-08-07"} ← Returns the actual versioned model name

This means Azure OpenAI doesn't read the model field from the request JSON at all; it relies exclusively on the deployment name in the URI path, but returns the real underlying versioned model identifier in the response.
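
For illustration, here is a minimal Python sketch of this behavior. The resource name (my-resource), deployment name (my-gpt5-deployment), api-version value, and environment variable are all hypothetical placeholders, not values from this guide:

import os
import requests

# The deployment name in the URI path selects the model;
# the "model" field in the request body is ignored by Azure OpenAI.
url = (
    "https://my-resource.openai.azure.com/openai/deployments/"
    "my-gpt5-deployment/chat/completions?api-version=2024-06-01"
)
resp = requests.post(
    url,
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={
        "model": "anything",  # ignored
        "messages": [{"role": "user", "content": "Hello"}],
    },
)

# The response reports the actual versioned model behind the deployment,
# e.g. "gpt-5-nano-2025-08-07".
print(resp.json()["model"])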

AWS Bedrock Static Execution

AWS Bedrock uses static model execution with immutable versioned identifiers. The Converse API does not return a model field in responses. Each model ID like anthropic.claude-sonnet-4-20250514-v1:0 represents a specific, frozen model version with no automatic updates or routing.
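
As a rough sketch of what this looks like with boto3 (assuming valid AWS credentials and that the model is enabled in your account and region), a Converse API call specifies the immutable model ID up front and gets no model field back:

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The immutable, versioned model ID is specified up front; there is no runtime routing.
resp = client.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)

# The Converse response contains "output", "stopReason", "usage", and "metrics",
# but no "model" field: the requested model ID is the executed model.
print(resp["output"]["message"]["content"][0]["text"])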

GCP Anthropic Embeddings Delegation

GCP Anthropic delegates embeddings to Voyage AI's specialized embedding models rather than providing Claude-native embeddings. Available models include:

  • voyage-3.5 - Balanced performance embedding model
  • voyage-3-large - Best quality general-purpose embeddings
  • voyage-code-3 - Optimized for code retrieval
  • voyage-law-2 - Legal domain-specific embeddings

This delegation approach allows Anthropic to focus on language model capabilities while leveraging domain expertise for embeddings.

OpenAI Virtualization

OpenAI performs automatic model routing where generic identifiers like gpt-5-nano may route to different model versions like gpt-5-nano-2025-08-07 based on availability and optimization.
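
A minimal sketch with the official openai Python SDK (assuming OPENAI_API_KEY is set; the resolved version shown in the comment is illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-5-nano",  # generic identifier
    messages=[{"role": "user", "content": "Hello"}],
)

# The response reports the concrete version that actually served the request,
# e.g. "gpt-5-nano-2025-08-07".
print(resp.model)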

GCP Provider Behavior

Both GCP Anthropic and GCP Vertex AI use endpoint-based model resolution where the model is specified in the URI path rather than returned in the response body. This provides deterministic behavior with timestamped snapshots.

Virtualization with modelNameOverride API

In the top-level AIGatewayRoute configuration, you can specify a modelNameOverride inside the AIGatewayRouteBackendRef on each route rule to override the model name that is sent to the upstream AI provider. This feature is primarily designed for scenarios where you want to change the model name dynamically based on the actual AI provider the request is being sent to.

The example configuration looks like this:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: test-route
spec:
  targetRefs: [...]
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: claude-4-sonnet
      backendRefs:
        - name: aws-backend
          modelNameOverride: anthropic.claude-sonnet-4-20250514-v1:0
          weight: 50
        - name: gcp-backend
          modelNameOverride: claude-sonnet-4@20250514
          weight: 50

This configuration allows downstream applications to use the unified model name claude-4-sonnet while splitting traffic between the AWS Bedrock and GCP AI providers based on the specified modelNameOverride. This is what the word "Virtualization" means in this context: abstracting away the differences in model names across AI providers and providing a unified interface for downstream applications. It can also be thought of as "one-to-many" aliasing of model names, where one unified model name maps to multiple different model names on different providers depending on the routing path.
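
For example, a downstream application can keep using a single OpenAI-compatible client regardless of which provider ends up serving the request. The sketch below assumes the gateway is reachable at a hypothetical http://envoy-ai-gateway.example.com/v1 and that the unified model name in the request body is what the x-ai-eg-model rule above matches on:

from openai import OpenAI

# Point the OpenAI-compatible client at the gateway (hypothetical address).
client = OpenAI(
    base_url="http://envoy-ai-gateway.example.com/v1",
    api_key="gateway-credential",  # depends on your gateway's auth setup
)

# The unified model name is matched by the route rule above; the gateway
# rewrites it to the provider-specific name via modelNameOverride.
resp = client.chat.completions.create(
    model="claude-4-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)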

Virtualization for fallback scenarios

As described on the Provider Fallback page, Envoy AI Gateway allows you to fall back to a different AI provider if the primary one fails. However, sometimes we want to fall back to a different model on the same provider. For example, it is natural to configure Envoy AI Gateway so that if the primary, more expensive model fails (rate limits, etc.), Envoy retries the request against a less expensive model on the same provider. More concretely, if the request to gpt-5-nano fails, we want to retry it with gpt-5-nano-mini on the same OpenAI provider.

modelNameOverride can also be used in this scenario to achieve the desired behavior. The configuration would look like this:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: test-route
spec:
  targetRefs: [...]
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-5-nano
      backendRefs:
        # This doesn't specify modelNameOverride, so it will use the default model name `gpt-5-nano` in the request.
        - name: openai-backend
          priority: 0
        - name: openai-backend
          modelNameOverride: gpt-5-nano-mini
          priority: 1

With this configuration, assuming the retry is properly configured as per the Provider Fallback page, if the request to gpt-5-nano fails, Envoy AI Gateway will automatically retry the request to gpt-5-nano-mini on the same OpenAI provider without requiring any changes to the downstream application.