API Reference
aigateway.envoyproxy.io/v1alpha1
Package v1alpha1 contains API schema definitions for the aigateway.envoyproxy.io API group.
Resource Kinds
Available Kinds
- AIGatewayRoute
- AIGatewayRouteList
- AIServiceBackend
- AIServiceBackendList
- BackendSecurityPolicy
- BackendSecurityPolicyList
Kind Definitions
AIGatewayRoute
Appears in:
AIGatewayRoute combines multiple AIServiceBackends and attaches them to Gateway resources.
This serves as a way to define a "unified" AI API for a Gateway, which allows downstream clients to use a single schema API to interact with multiple AI backends.
The schema field determines the structure of the requests that the Gateway will receive. The Gateway then routes the traffic to the appropriate AIServiceBackend based on the output schema of the AIServiceBackend, while doing other necessary jobs such as upstream authentication and rate limiting.
Envoy AI Gateway will generate the following k8s resources corresponding to the AIGatewayRoute:
- HTTPRoute of the Gateway API as a top-level resource to bind all backends. The name of the HTTPRoute is the same as the AIGatewayRoute.
- EnvoyExtensionPolicy of the Envoy Gateway API to attach the AI Gateway filter into the target Gateways. This will be created per Gateway, and its name is ai-eg-eep-${gateway-name}.
- HTTPRouteFilter of the Envoy Gateway API per namespace for automatic hostname rewrite. The name of the HTTPRouteFilter is ai-eg-host-rewrite.
All of these resources are created in the same namespace as the AIGatewayRoute. Note that this is an implementation detail and is subject to change. If you want to customize the default behavior of the Envoy AI Gateway, you can use these resources as a reference and create your own resources. Alternatively, you can use the EnvoyPatchPolicy API of Envoy Gateway to patch the generated resources. For example, you can configure the retry fallback behavior by attaching the BackendTrafficPolicy API of Envoy Gateway to the generated HTTPRoute.
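As a rough illustration of the description above, a minimal AIGatewayRoute might look like the following sketch. The resource names (example-route, example-gateway, example-backend) and the model name are hypothetical. The field names under spec (schema, targetRefs, rules, matches, backendRefs) are assumptions based on the type definitions later in this reference, so check them against the installed CRD before use.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: example-route            # hypothetical; the generated HTTPRoute shares this name
  namespace: default
spec:
  # Input schema that downstream clients speak (assumed field layout).
  schema:
    name: OpenAI
  # Gateway this route attaches to (hypothetical Gateway name).
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: example-gateway
  rules:
    - matches:
        - headers:
            # Only exact header matching is supported.
            - type: Exact
              name: x-ai-eg-model
              value: some-model  # hypothetical model name
      backendRefs:
        # Hypothetical AIServiceBackend in the same namespace.
        - name: example-backend
```

Under the naming rule described above, the EnvoyExtensionPolicy generated for this sketch would be named ai-eg-eep-example-gateway.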
Fields
- apiVersion (required): aigateway.envoyproxy.io/v1alpha1, the version of the API.
- kind (required): AIGatewayRoute, the kind of the resource.
- metadata: standard Kubernetes object metadata.
AIGatewayRouteList
AIGatewayRouteList contains a list of AIGatewayRoute.
Fields
- apiVersion (required): aigateway.envoyproxy.io/v1alpha1, the version of the API.
- kind (required): AIGatewayRouteList, the kind of the resource.
- metadata: standard Kubernetes list metadata.
AIServiceBackend
Appears in:
AIServiceBackend is a resource that represents a single backend for AIGatewayRoute. A backend is a service that handles traffic with a concrete API specification.
An AIServiceBackend is "attached" to a Backend, which is either a k8s Service or a Backend resource of Envoy Gateway.
When a backend with an attached AIServiceBackend is used as a routing target in the AIGatewayRoute (more precisely, the HTTPRouteSpec defined in the AIGatewayRoute), the ai-gateway will generate the configuration needed to perform the backend-specific logic in the final HTTPRoute.
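The attachment described above can be sketched roughly as follows. The names example-backend and example-upstream are hypothetical, and the spec field names (schema, backendRef) are assumptions inferred from the AIServiceBackendSpec description in this reference rather than a verified manifest.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: example-backend          # hypothetical; referenced from AIGatewayRoute rules
  namespace: default
spec:
  # Output schema that the upstream service accepts (assumed field layout).
  schema:
    name: OpenAI
  # The Envoy Gateway Backend resource this AIServiceBackend is attached to.
  backendRef:
    group: gateway.envoyproxy.io
    kind: Backend
    name: example-upstream       # hypothetical Backend name
```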
Fields
- apiVersion (required): aigateway.envoyproxy.io/v1alpha1, the version of the API.
- kind (required): AIServiceBackend, the kind of the resource.
- metadata: standard Kubernetes object metadata.
AIServiceBackendList
AIServiceBackendList contains a list of AIServiceBackends.
Fields
- apiVersion (required): aigateway.envoyproxy.io/v1alpha1, the version of the API.
- kind (required): AIServiceBackendList, the kind of the resource.
- metadata: standard Kubernetes list metadata.
BackendSecurityPolicy
Appears in:
BackendSecurityPolicy specifies configuration for authentication and authorization rules on the traffic exiting the gateway to the backend.
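A minimal sketch of an API-key based policy is shown here for orientation. The resource names and the placeholder key value are hypothetical, and the apiKey.secretRef layout is an assumption. The secret key name apiKey follows the BackendSecurityPolicyAPIKey description later in this reference.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: example-apikey-policy    # hypothetical name
  namespace: default
spec:
  type: APIKey
  apiKey:
    # Assumed field layout: a reference to the secret holding the key.
    secretRef:
      name: example-apikey-secret
      namespace: default
---
apiVersion: v1
kind: Secret
metadata:
  name: example-apikey-secret    # hypothetical name
  namespace: default
type: Opaque
stringData:
  # The key of the secret should be apiKey.
  apiKey: "<your-api-key>"
```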
Fields
- apiVersion (required): aigateway.envoyproxy.io/v1alpha1, the version of the API.
- kind (required): BackendSecurityPolicy, the kind of the resource.
- metadata: standard Kubernetes object metadata.
BackendSecurityPolicyList
BackendSecurityPolicyList contains a list of BackendSecurityPolicy.
Fields
- apiVersion (required): aigateway.envoyproxy.io/v1alpha1, the version of the API.
- kind (required): BackendSecurityPolicyList, the kind of the resource.
- metadata: standard Kubernetes list metadata.
Supporting Types
Available Types
- AIGatewayFilterConfig
- AIGatewayFilterConfigExternalProcessor
- AIGatewayFilterConfigType
- AIGatewayRouteRule
- AIGatewayRouteRuleBackendRef
- AIGatewayRouteRuleMatch
- AIGatewayRouteSpec
- AIGatewayRouteStatus
- AIServiceBackendSpec
- AIServiceBackendStatus
- APISchema
- AWSCredentialsFile
- AWSOIDCExchangeToken
- AzureOIDCExchangeToken
- BackendSecurityPolicyAPIKey
- BackendSecurityPolicyAWSCredentials
- BackendSecurityPolicyAzureCredentials
- BackendSecurityPolicyOIDC
- BackendSecurityPolicySpec
- BackendSecurityPolicyStatus
- BackendSecurityPolicyType
- LLMRequestCost
- LLMRequestCostType
- VersionedAPISchema
Type Definitions
AIGatewayFilterConfig
Appears in:
Fields
Currently, only ExternalProcessor is supported, and default is ExternalProcessor.
This is optional, and if not set, the default values of Deployment spec will be used.
AIGatewayFilterConfigExternalProcessor
Appears in:
Fields
- replicas (optional): Deprecated: This field is no longer used.
Resource requirements for the external processor container. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Note: when multiple AIGatewayRoute resources are attached to the same Gateway, and each AIGatewayRoute has a different resource configuration, the ai-gateway will pick one of them to configure the resource requirements of the external processor container.
AIGatewayFilterConfigType
Underlying type: string
Appears in:
AIGatewayFilterConfigType specifies the type of the filter configuration.
Possible Values
- ExternalProcessor
- DynamicModule
AIGatewayRouteRule
Appears in:
AIGatewayRouteRule is a rule that defines the routing behavior of the AIGatewayRoute.
Fields
Each backend can have a weight that determines the traffic distribution. The namespace of each backend is local, i.e. the same namespace as the AIGatewayRoute.
By configuring multiple backends, you can achieve fallback behavior when the primary backend is not available, in combination with the BackendTrafficPolicy of Envoy Gateway. Please refer to https://gateway.envoyproxy.io/docs/tasks/traffic/failover/ as well as https://gateway.envoyproxy.io/docs/tasks/traffic/retry/.
The match conditions are a subset of the HTTPRouteMatch in the Gateway API. For details, see: https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPRouteMatch
- modelsOwnedBy (optional): The owner of the models, which will be exported as the field of OwnedBy in the OpenAI-compatible API /models. This is used only when this rule contains x-ai-eg-model in its header matching, where the header value will be recognized as a model in the /models endpoint. All the matched models will share the same owner. Defaults to "Envoy AI Gateway" if not set.
The creation time of the models, which will be exported as the field of Created in the OpenAI-compatible API /models. It follows the format of RFC 3339, for example 2024-05-21T10:00:00Z. This is used only when this rule contains x-ai-eg-model in its header matching, where the header value will be recognized as a model in the /models endpoint. All the matched models will share the same creation time. Defaults to the creation timestamp of the AIGatewayRoute if not set.
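To make the /models behavior concrete, here is a hedged fragment of a rule that sets an owner. The model name, backend name, and owner string are hypothetical, and the placement of matches and backendRefs under rules is an assumption based on the descriptions above.

```yaml
rules:
  - matches:
      - headers:
          # The header value is what gets listed as a model in /models.
          - type: Exact
            name: x-ai-eg-model
            value: some-model        # hypothetical model name
    backendRefs:
      - name: example-backend        # hypothetical AIServiceBackend
    # Exported as OwnedBy for every model matched by this rule.
    modelsOwnedBy: "Example Team"
```

Because the rule matches on x-ai-eg-model, some-model would be reported with Example Team as its owner. Without that header match, the field has no effect according to the description above.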
AIGatewayRouteRuleBackendRef
Appears in:
AIGatewayRouteRuleBackendRef is a reference to a backend with a weight.
Fields
- name (required)
- weight (optional): Same as the weight in the BackendRef in the Gateway API. For details, see: https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.BackendRef Default is 1.
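A small sketch of weighted backend references follows. The backend names are hypothetical, and the fragment assumes that weights are treated proportionally, as they are for BackendRef in the Gateway API.

```yaml
rules:
  - backendRefs:
      # Hypothetical AIServiceBackend names in the same namespace as the route.
      - name: primary-backend
        weight: 9
      - name: secondary-backend
        weight: 1
```

Under that proportional assumption, roughly nine out of ten requests matched by this rule would go to primary-backend.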
AIGatewayRouteRuleMatch
Appears in:
Fields
- headers (optional): See HTTPHeaderMatch in the Gateway API for details: https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPHeaderMatch
Currently, only exact header matching is supported.
AIGatewayRouteSpec
Appears in:
AIGatewayRouteSpec details the AIGatewayRoute configuration.
Fields
Based on this schema, the ai-gateway will perform the necessary transformation to the
output schema specified in the selected AIServiceBackend during the routing process.
Currently, the only supported schema is OpenAI as the input schema.
Each rule is a subset of the HTTPRoute in the Gateway API (https://gateway-api.sigs.k8s.io/api-types/httproute/).
The AI Gateway controller will generate an HTTPRoute based on the configuration given here, with additional modifications to achieve the necessary jobs, notably inserting the AI Gateway filter responsible for the transformation of the request and response.
In the matching conditions of the AIGatewayRouteRule, the x-ai-eg-model header is available for describing routing behavior based on the model name. The model name is extracted from the request content before the routing decision.
Multiple rules are matched in the same way as in the Gateway API. For details, see: https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPRoute
An AI Gateway filter is responsible for the transformation of the request and response
as well as the routing behavior based on the model name extracted from the request content, etc.
Currently, the filter is only implemented as an external processor filter, which might be
extended to other types of filters in the future. See https://github.com/envoyproxy/ai-gateway/issues/90
The AI Gateway filter will capture each specified number and store it in Envoy's dynamic metadata per HTTP request. The namespaced key is io.envoy.ai_gateway.
For example, let's say we have the following LLMRequestCosts configuration:
llmRequestCosts:
  - metadataKey: llm_input_token
    type: InputToken
  - metadataKey: llm_output_token
    type: OutputToken
  - metadataKey: llm_total_token
    type: TotalToken
Then, with the following BackendTrafficPolicy of Envoy Gateway, you can have three rate limit buckets for each unique x-user-id header value: one for input tokens, one for output tokens, and one for total tokens.
Each bucket will be reduced by the corresponding token usage captured by the AI Gateway filter.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: some-example-token-rate-limit
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: usage-rate-limit
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            # Do the rate limiting based on the x-user-id header.
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            # Configures the number of tokens allowed per hour.
            requests: 10000
            unit: Hour
          cost:
            request:
              from: Number
              # Setting the request cost to zero allows to only check the rate limit budget,
              # and not consume the budget on the request path.
              number: 0
            # This specifies the cost of the response retrieved from the dynamic metadata set by the AI Gateway filter.
            # The extracted value will be used to consume the rate limit budget, and subsequent requests will be rate limited
            # if the budget is exhausted.
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_input_token
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            requests: 10000
            unit: Hour
          cost:
            request:
              from: Number
              number: 0
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_output_token
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            requests: 10000
            unit: Hour
          cost:
            request:
              from: Number
              number: 0
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_total_token
Note that when multiple AIGatewayRoute resources are attached to the same Gateway, and
different costs are configured for the same metadata key, the ai-gateway will pick one of them
to configure the metadata key in the generated HTTPRoute, and ignore the rest.
AIGatewayRouteStatus
Appears in:
AIGatewayRouteStatus contains the conditions set by the reconciliation result.
Fields
Currently, at most one condition is set.
Known .status.conditions.type are: Accepted, NotAccepted.
AIServiceBackendSpec
Appears in:
AIServiceBackendSpec details the AIServiceBackend configuration.
Fields
The API schema of the requests from Envoy that this AIServiceBackend can accept as incoming requests.
Based on this schema, the ai-gateway will perform the necessary transformation for the pair of AIGatewayRouteSpec.APISchema and AIServiceBackendSpec.APISchema.
This is required to be set.
A backend must be a Backend resource of Envoy Gateway. Note that k8s Service will be supported as a backend in the future.
This is required to be set.
The BackendSecurityPolicy resource that this backend is being attached to.
Deprecated: Use the BackendTrafficPolicySpec for a backend-specific timeout configuration, or AIGatewayRouteSpec.Rules[].Timeouts for a route-specific timeout configuration. When both this field and AIGatewayRouteSpec.Rules[].Timeouts are set, the latter will take precedence, i.e., this field will be ignored.
AIServiceBackendStatus
Appears in:
AIServiceBackendStatus contains the conditions set by the reconciliation result.
Fields
Currently, at most one condition is set.
Known .status.conditions.type are: Accepted, NotAccepted.
APISchema
Underlying type: string
Appears in:
APISchema defines the API schema.
Possible Values
- OpenAI: https://github.com/openai/openai-openapi
- AWSBedrock: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html
- AzureOpenAI: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#api-specs
AWSCredentialsFile
Appears in:
AWSCredentialsFile specifies the credentials file to use for the AWS provider. Envoy reads the secret file, and the profile to use is specified by the Profile field.
Fields
The secret should contain the AWS credentials file keyed on credentials.
- profile (required)
AWSOIDCExchangeToken
Appears in:
AWSOIDCExchangeToken specifies credentials to obtain an OIDC token from an SSO server. For AWS, the controller will query STS to obtain an AWS AccessKeyId, SecretAccessKey, and SessionToken, and store them in a temporary credentials file.
Fields
- grantType (optional)
- aud (optional)
- awsRoleArn (required): The AWS role that maps to the temporary AWS security credentials exchanged using the authentication token issued by the OIDC provider.
AzureOIDCExchangeToken
Appears in:
AzureOIDCExchangeToken specifies credentials to obtain an OIDC token from an SSO server. For Azure, the controller will query Azure Entra ID to get an Azure access token and store it in a secret.
Fields
- grantType (optional)
- aud (optional)
BackendSecurityPolicyAPIKey
Appears in:
BackendSecurityPolicyAPIKey specifies the API key.
Fields
ai-gateway must be given the permission to read this secret. The key of the secret should be apiKey.
BackendSecurityPolicyAWSCredentials
Appears in:
BackendSecurityPolicyAWSCredentials contains the supported authentication mechanisms to access AWS.
Fields
- region (required)
The OIDC token is used to obtain temporary credentials to access AWS.
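The credentials-file mechanism can be sketched as below. This is illustrative only: the resource names, region, and placeholder credentials are hypothetical, and the nesting under awsCredentials (credentialsFile, secretRef) is an assumption. The region and profile fields and the credentials secret key come from the descriptions in this reference.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: example-aws-policy         # hypothetical name
  namespace: default
spec:
  type: AWSCredentials
  awsCredentials:
    region: us-east-1              # hypothetical region
    # Assumed field layout for the AWSCredentialsFile type.
    credentialsFile:
      secretRef:
        name: example-aws-credentials
      profile: default
---
apiVersion: v1
kind: Secret
metadata:
  name: example-aws-credentials    # hypothetical name
  namespace: default
type: Opaque
stringData:
  # The AWS credentials file is keyed on "credentials".
  credentials: |
    [default]
    aws_access_key_id = <access-key-id>
    aws_secret_access_key = <secret-access-key>
```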
BackendSecurityPolicyAzureCredentials
Appears in:
BackendSecurityPolicyAzureCredentials contains the supported authentication mechanisms to access Azure. Exactly one of ClientSecretRef or OIDCExchangeToken must be specified. Credentials will not be generated if neither is set.
Fields
- clientID (required)
- tenantID (required)
ClientSecretRef: ai-gateway must be given the permission to read this secret. The key of the secret should be client-secret.
OIDCExchangeToken: used to obtain temporary credentials to access Azure.
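For the client-secret variant, a rough sketch might look like this. The resource names and placeholder values are hypothetical, and the azureCredentials and clientSecretRef spellings are assumptions derived from the type and field names mentioned above. The client-secret key name comes from the description.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: example-azure-policy         # hypothetical name
  namespace: default
spec:
  type: AzureCredentials
  azureCredentials:
    clientID: "<client-id>"
    tenantID: "<tenant-id>"
    # Specify either a client secret or an OIDC exchange token, not both.
    clientSecretRef:
      name: example-azure-client-secret
      namespace: default
---
apiVersion: v1
kind: Secret
metadata:
  name: example-azure-client-secret  # hypothetical name
  namespace: default
type: Opaque
stringData:
  # The key of the secret should be client-secret.
  client-secret: "<client-secret-value>"
```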
BackendSecurityPolicyOIDC
Appears in:
BackendSecurityPolicyOIDC specifies OIDC related fields.
Fields
- grantType (optional)
- aud (optional)
BackendSecurityPolicySpec
Appears in:
BackendSecurityPolicySpec specifies authentication rules for accessing the provider from the Gateway. Only one mechanism to access a backend can be specified.
Only one type of BackendSecurityPolicy can be defined.
Fields
APIKey, AWSCredentials, and AzureCredentials are supported.
BackendSecurityPolicyStatus
Appears in:
BackendSecurityPolicyStatus contains the conditions set by the reconciliation result.
Fields
Currently, at most one condition is set.
Known .status.conditions.type are: Accepted, NotAccepted.
BackendSecurityPolicyType
Underlying type: string
Appears in:
BackendSecurityPolicyType specifies the type of auth mechanism used to access a backend.
Possible Values
- APIKey
- AWSCredentials
- AzureCredentials
LLMRequestCost
Appears in:
LLMRequestCost configures each request cost.
Fields
- metadataKey (required)
- type: The default is OutputToken, which uses the output token count as the cost. The other types are InputToken, TotalToken, and CEL.
- cel (optional): The CEL expression must return a signed or unsigned integer. If the return value is negative, it is an error.
The expression can use the following variables:
* model: the model name extracted from the request content. Type: string.
* backend: the backend name in the form of name.namespace. Type: string.
* input_tokens: the number of input tokens. Type: unsigned integer.
* output_tokens: the number of output tokens. Type: unsigned integer.
* total_tokens: the total number of tokens. Type: unsigned integer.
For example, the following expressions are valid:
* model == 'llama' ? input_tokens + output_tokens * 0.5 : total_tokens
* backend == 'foo.default' ? input_tokens + output_tokens : total_tokens
* input_tokens + output_tokens + total_tokens
* input_tokens * output_tokens
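As a hedged illustration of how a CEL cost could be declared, the fragment below reuses one of the expressions listed above. The metadata key llm_custom_cost is a hypothetical name chosen for this sketch.

```yaml
llmRequestCosts:
  - metadataKey: llm_custom_cost   # hypothetical key, stored under io.envoy.ai_gateway
    type: CEL
    # Charge input plus output tokens for one backend, total tokens otherwise.
    cel: "backend == 'foo.default' ? input_tokens + output_tokens : total_tokens"
```

The resulting value could then be referenced from a BackendTrafficPolicy response cost using the same metadata key, in the same way as the token-based keys shown in the AIGatewayRouteSpec section.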
LLMRequestCostType
Underlying type: string
Appears in:
LLMRequestCostType specifies the type of the LLMRequestCost.
Possible Values
- InputToken
- OutputToken
- TotalToken
- CEL
VersionedAPISchema
Appears in:
VersionedAPISchema defines the API schema of either AIGatewayRoute (the input) or AIServiceBackend (the output).
This allows the ai-gateway to understand the input and perform the necessary transformation depending on the API schema pair (input, output).
Note that this is vendor specific, and the stability of the API schema is not guaranteed by the ai-gateway, but by the vendor via proper versioning.
Fields
- version (required)