Skip to main content

API Reference

aigateway.envoyproxy.io/v1alpha1

Package v1alpha1 contains API schema definitions for the aigateway.envoyproxy.io API group.

Resource Kinds

Available Kinds

Kind Definitions

AIGatewayRoute

Appears in:

AIGatewayRoute combines multiple AIServiceBackends and attaching them to Gateway(s) resources.

This serves as a way to define a "unified" AI API for a Gateway which allows downstream clients to use a single schema API to interact with multiple AI backends.

The schema field is used to determine the structure of the requests that the Gateway will receive. And then the Gateway will route the traffic to the appropriate AIServiceBackend based on the output schema of the AIServiceBackend while doing the other necessary jobs like upstream authentication, rate limit, etc.

For Advanced Users: Envoy AI Gateway will generate the following k8s resources corresponding to the AIGatewayRoute:

  • Deployment, Service, and ConfigMap of the k8s API for the AI Gateway filter. The name of these resources are ai-eg-route-extproc-${name}.
  • HTTPRoute of the Gateway API as a top-level resource to bind all backends. The name of the HTTPRoute is the same as the AIGatewayRoute.
  • EnvoyExtensionPolicy of the Envoy Gateway API to attach the AI Gateway filter into the HTTPRoute. The name of the EnvoyExtensionPolicy is ai-eg-route-extproc-${name} which is the same as the Deployment, etc.
  • HTTPRouteFilter of the Envoy Gateway API per namespace for automatic hostname rewrite. The name of the HTTPRouteFilter is ai-eg-host-rewrite.

All of these resources are created in the same namespace as the AIGatewayRoute. Note that this is the implementation detail subject to change. If you want to customize the default behavior of the Envoy AI Gateway, you can use these resources as a reference and create your own resources. Alternatively, you can use EnvoyPatchPolicy API of the Envoy Gateway to patch the generated resources. For example, you can insert a custom filter into the filter chain.

Fields
apiVersionrequired
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kindrequired
String
This is a AIGatewayRoute resource
metadatarequired
Refer to Kubernetes API documentation for fields of metadata.
specrequired
Spec defines the details of the AIGatewayRoute.

AIGatewayRouteList

AIGatewayRouteList contains a list of AIGatewayRoute.

Fields
apiVersionrequired
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kindrequired
String
This is a AIGatewayRouteList resource
metadatarequired
Refer to Kubernetes API documentation for fields of metadata.
itemsrequired

AIServiceBackend

Appears in:

AIServiceBackend is a resource that represents a single backend for AIGatewayRoute. A backend is a service that handles traffic with a concrete API specification.

A AIServiceBackend is "attached" to a Backend which is either a k8s Service or a Backend resource of the Envoy Gateway.

When a backend with an attached AIServiceBackend is used as a routing target in the AIGatewayRoute (more precisely, the HTTPRouteSpec defined in the AIGatewayRoute), the ai-gateway will generate the necessary configuration to do the backend specific logic in the final HTTPRoute.

Fields
apiVersionrequired
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kindrequired
String
This is a AIServiceBackend resource
metadatarequired
Refer to Kubernetes API documentation for fields of metadata.
Spec defines the details of AIServiceBackend.

AIServiceBackendList

AIServiceBackendList contains a list of AIServiceBackends.

Fields
apiVersionrequired
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kindrequired
String
This is a AIServiceBackendList resource
metadatarequired
Refer to Kubernetes API documentation for fields of metadata.
itemsrequired

BackendSecurityPolicy

Appears in:

BackendSecurityPolicy specifies configuration for authentication and authorization rules on the traffic exiting the gateway to the backend.

Fields
apiVersionrequired
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kindrequired
String
This is a BackendSecurityPolicy resource
metadatarequired
Refer to Kubernetes API documentation for fields of metadata.

BackendSecurityPolicyList

BackendSecurityPolicyList contains a list of BackendSecurityPolicy

Fields
apiVersionrequired
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kindrequired
String
This is a BackendSecurityPolicyList resource
metadatarequired
Refer to Kubernetes API documentation for fields of metadata.
itemsrequired

Supporting Types

Available Types

Type Definitions

AIGatewayFilterConfig

Appears in:

Fields
ExternalProcessor
Type specifies the type of the filter configuration.
Currently, only ExternalProcessor is supported, and default is ExternalProcessor.
ExternalProcessor is the configuration for the external processor filter.
This is optional, and if not set, the default values of Deployment spec will be used.

AIGatewayFilterConfigExternalProcessor

Appears in:

Fields
replicasoptional
integer
Replicas is the number of desired pods of the external processor deployment.
resourcesoptional
Resources required by the external processor container.
More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

AIGatewayFilterConfigType

Underlying type: string

Appears in:

AIGatewayFilterConfigType specifies the type of the filter configuration.

Possible Values
ExternalProcessor
DynamicModule

AIGatewayRouteRule

Appears in:

AIGatewayRouteRule is a rule that defines the routing behavior of the AIGatewayRoute.

Fields
backendRefsoptional
BackendRefs is the list of AIServiceBackend that this rule will route the traffic to.
Each backend can have a weight that determines the traffic distribution.
The namespace of each backend is local, i.e. the same namespace as the AIGatewayRoute.
matchesoptional
Matches is the list of AIGatewayRouteMatch that this rule will match the traffic to.
This is a subset of the HTTPRouteMatch in the Gateway API. See for the details:
https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPRouteMatch

AIGatewayRouteRuleBackendRef

Appears in:

AIGatewayRouteRuleBackendRef is a reference to a AIServiceBackend with a weight.

Fields
namerequired
string
Name is the name of the AIServiceBackend.
weightoptional
integer
1
Weight is the weight of the AIServiceBackend. This is exactly the same as the weight in
the BackendRef in the Gateway API. See for the details:
https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.BackendRef
Default is 1.

AIGatewayRouteRuleMatch

Appears in:

Fields
headersoptional
HTTPHeaderMatch array
Headers specifies HTTP request header matchers. See HeaderMatch in the Gateway API for the details:
https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPHeaderMatch
Currently, only the exact header matching is supported.

AIGatewayRouteSpec

Appears in:

AIGatewayRouteSpec details the AIGatewayRoute configuration.

Fields
TargetRefs are the names of the Gateway resources this AIGatewayRoute is being attached to.
schemarequired
APISchema specifies the API schema of the input that the target Gateway(s) will receive.
Based on this schema, the ai-gateway will perform the necessary transformation to the
output schema specified in the selected AIServiceBackend during the routing process.
Currently, the only supported schema is OpenAI as the input schema.
rulesrequired
Rules is the list of AIGatewayRouteRule that this AIGatewayRoute will match the traffic to.
Each rule is a subset of the HTTPRoute in the Gateway API (https://gateway-api.sigs.k8s.io/api-types/httproute/).
AI Gateway controller will generate a HTTPRoute based on the configuration given here with the additional
modifications to achieve the necessary jobs, notably inserting the AI Gateway filter responsible for
the transformation of the request and response, etc.
In the matching conditions in the AIGatewayRouteRule, x-ai-eg-model header is available
if we want to describe the routing behavior based on the model name. The model name is extracted
from the request content before the routing decision.
How multiple rules are matched is the same as the Gateway API. See for the details:
https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPRoute
filterConfigrequired
FilterConfig is the configuration for the AI Gateway filter inserted in the generated HTTPRoute.
An AI Gateway filter is responsible for the transformation of the request and response
as well as the routing behavior based on the model name extracted from the request content, etc.
Currently, the filter is only implemented as an external processor filter, which might be
extended to other types of filters in the future. See https://github.com/envoyproxy/ai-gateway/issues/90
llmRequestCostsoptional
LLMRequestCosts specifies how to capture the cost of the LLM-related request, notably the token usage.
The AI Gateway filter will capture each specified number and store it in the Envoy's dynamic
metadata per HTTP request. The namespaced key is io.envoy.ai_gateway,
For example, let's say we have the following LLMRequestCosts configuration:

llmRequestCosts:
- metadataKey: llm_input_token
type: InputToken
- metadataKey: llm_output_token
type: OutputToken
- metadataKey: llm_total_token
type: TotalToken

Then, with the following BackendTrafficPolicy of Envoy Gateway, you can have three
rate limit buckets for each unique x-user-id header value. One bucket is for the input token,
the other is for the output token, and the last one is for the total token.
Each bucket will be reduced by the corresponding token usage captured by the AI Gateway filter.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: some-example-token-rate-limit
namespace: default
spec:
targetRefs:
- group: gateway.networking.k8s.io
kind: HTTPRoute
name: usage-rate-limit
rateLimit:
type: Global
global:
rules:
- clientSelectors:
# Do the rate limiting based on the x-user-id header.
- headers:
- name: x-user-id
type: Distinct
limit:
# Configures the number of tokens allowed per hour.
requests: 10000
unit: Hour
cost:
request:
from: Number
# Setting the request cost to zero allows to only check the rate limit budget,
# and not consume the budget on the request path.
number: 0
# This specifies the cost of the response retrieved from the dynamic metadata set by the AI Gateway filter.
# The extracted value will be used to consume the rate limit budget, and subsequent requests will be rate limited
# if the budget is exhausted.
response:
from: Metadata
metadata:
namespace: io.envoy.ai_gateway
key: llm_input_token
- clientSelectors:
- headers:
- name: x-user-id
type: Distinct
limit:
requests: 10000
unit: Hour
cost:
request:
from: Number
number: 0
response:
from: Metadata
metadata:
namespace: io.envoy.ai_gateway
key: llm_output_token
- clientSelectors:
- headers:
- name: x-user-id
type: Distinct
limit:
requests: 10000
unit: Hour
cost:
request:
from: Number
number: 0
response:
from: Metadata
metadata:
namespace: io.envoy.ai_gateway
key: llm_total_token

AIServiceBackendSpec

Appears in:

AIServiceBackendSpec details the AIServiceBackend configuration.

Fields
schemarequired
APISchema specifies the API schema of the output format of requests from
Envoy that this AIServiceBackend can accept as incoming requests.
Based on this schema, the ai-gateway will perform the necessary transformation for
the pair of AIGatewayRouteSpec.APISchema and AIServiceBackendSpec.APISchema.
This is required to be set.
backendRefrequired
BackendRef is the reference to the Backend resource that this AIServiceBackend corresponds to.
A backend can be of either k8s Service or Backend resource of Envoy Gateway.
This is required to be set.
backendSecurityPolicyRefoptional
BackendSecurityPolicyRef is the name of the BackendSecurityPolicy resources this backend
is being attached to.

APISchema

Underlying type: string

Appears in:

APISchema defines the API schema.

Possible Values
OpenAI
APISchemaOpenAI is the OpenAI schema.
https://github.com/openai/openai-openapi
AWSBedrock
APISchemaAWSBedrock is the AWS Bedrock schema.
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html

AWSCredentialsFile

Appears in:

AWSCredentialsFile specifies the credentials file to use for the AWS provider. Envoy reads the secret file, and the profile to use is specified by the Profile field.

Fields
secretRefrequired
SecretRef is the reference to the credential file.
The secret should contain the AWS credentials file keyed on credentials.
profilerequired
string
default
Profile is the profile to use in the credentials file.

AWSOIDCExchangeToken

Appears in:

AWSOIDCExchangeToken specifies credentials to obtain oidc token from a sso server. For AWS, the controller will query STS to obtain AWS AccessKeyId, SecretAccessKey, and SessionToken, and store them in a temporary credentials file.

Fields
oidcrequired
OIDC is used to obtain oidc tokens via an SSO server which will be used to exchange for temporary AWS credentials.
grantTypeoptional
string
GrantType is the method application gets access token.
audoptional
string
Aud defines the audience that this ID Token is intended for.
awsRoleArnrequired
string
AwsRoleArn is the AWS IAM Role with the permission to use specific resources in AWS account
which maps to the temporary AWS security credentials exchanged using the authentication token issued by OIDC provider.

BackendSecurityPolicyAPIKey

Appears in:

BackendSecurityPolicyAPIKey specifies the API key.

Fields
secretRefrequired
SecretRef is the reference to the secret containing the API key.
ai-gateway must be given the permission to read this secret.
The key of the secret should be apiKey.

BackendSecurityPolicyAWSCredentials

Appears in:

BackendSecurityPolicyAWSCredentials contains the supported authentication mechanisms to access aws

Fields
regionrequired
string
Region specifies the AWS region associated with the policy.
credentialsFileoptional
CredentialsFile specifies the credentials file to use for the AWS provider.
oidcExchangeTokenoptional
OIDCExchangeToken specifies the oidc configurations used to obtain an oidc token. The oidc token will be
used to obtain temporary credentials to access AWS.

BackendSecurityPolicySpec

Appears in:

BackendSecurityPolicySpec specifies authentication rules on access the provider from the Gateway. Only one mechanism to access a backend(s) can be specified.

Only one type of BackendSecurityPolicy can be defined.

Fields
Type specifies the auth mechanism used to access the provider. Currently, only APIKey, AND AWSCredentials are supported.
APIKey is a mechanism to access a backend(s). The API key will be injected into the Authorization header.
AWSCredentials is a mechanism to access a backend(s). AWS specific logic will be applied.

BackendSecurityPolicyType

Underlying type: string

Appears in:

BackendSecurityPolicyType specifies the type of auth mechanism used to access a backend.

Possible Values
APIKey
AWSCredentials

LLMRequestCost

Appears in:

LLMRequestCost configures each request cost.

Fields
metadataKeyrequired
string
MetadataKey is the key of the metadata to store this cost of the request.
typerequired
Type specifies the type of the request cost. The default is OutputToken,
and it uses output token as the cost. The other types are InputToken, TotalToken,
and CEL.
celoptional
string
CEL is the CEL expression to calculate the cost of the request.
The CEL expression must return a signed or unsigned integer. If the
return value is negative, it will be error.
The expression can use the following variables:
* model: the model name extracted from the request content. Type: string.
* backend: the backend name in the form of name.namespace. Type: string.
* input_tokens: the number of input tokens. Type: unsigned integer.
* output_tokens: the number of output tokens. Type: unsigned integer.
* total_tokens: the total number of tokens. Type: unsigned integer.
For example, the following expressions are valid:
* model == 'llama' ? input_tokens + output_token * 0.5 : total_tokens
* backend == 'foo.default' ? input_tokens + output_tokens : total_tokens
* input_tokens + output_tokens + total_tokens
* input_tokens * output_tokens

LLMRequestCostType

Underlying type: string

Appears in:

LLMRequestCostType specifies the type of the LLMRequestCost.

Possible Values
InputToken
LLMRequestCostTypeInputToken is the cost type of the input token.
OutputToken
LLMRequestCostTypeOutputToken is the cost type of the output token.
TotalToken
LLMRequestCostTypeTotalToken is the cost type of the total token.
CEL
LLMRequestCostTypeCEL is for calculating the cost using the CEL expression.

VersionedAPISchema

Appears in:

VersionedAPISchema defines the API schema of either AIGatewayRoute (the input) or AIServiceBackend (the output).

This allows the ai-gateway to understand the input and perform the necessary transformation depending on the API schema pair (input, output).

Note that this is vendor specific, and the stability of the API schema is not guaranteed by the ai-gateway, but by the vendor via proper versioning.

Fields
namerequired
Name is the name of the API schema of the AIGatewayRoute or AIServiceBackend.
versionrequired
string
Version is the version of the API schema.