
API Reference

aigateway.envoyproxy.io/v1alpha1

Package v1alpha1 contains API schema definitions for the aigateway.envoyproxy.io API group.

Resource Kinds

Available Kinds

  • AIGatewayRoute
  • AIGatewayRouteList
  • AIServiceBackend
  • AIServiceBackendList
  • BackendSecurityPolicy
  • BackendSecurityPolicyList

Kind Definitions

AIGatewayRoute

Appears in:

AIGatewayRoute combines multiple AIServiceBackends and attaches them to Gateway resources.

This serves as a way to define a "unified" AI API for a Gateway which allows downstream clients to use a single schema API to interact with multiple AI backends.

The schema field determines the structure of the requests that the Gateway will receive. The Gateway then routes the traffic to the appropriate AIServiceBackend based on the output schema of that AIServiceBackend, while handling the other necessary jobs such as upstream authentication, rate limiting, etc.

Envoy AI Gateway will generate the following k8s resources corresponding to the AIGatewayRoute:

  • HTTPRoute of the Gateway API as a top-level resource to bind all backends. The name of the HTTPRoute is the same as the AIGatewayRoute.
  • EnvoyExtensionPolicy of the Envoy Gateway API to attach the AI Gateway filter into the target Gateways. This will be created per Gateway, and its name is ai-eg-eep-${gateway-name}.
  • HTTPRouteFilter of the Envoy Gateway API per namespace for automatic hostname rewrite. The name of the HTTPRouteFilter is ai-eg-host-rewrite.

All of these resources are created in the same namespace as the AIGatewayRoute. Note that this is an implementation detail and is subject to change. If you want to customize the default behavior of the Envoy AI Gateway, you can use these resources as a reference and create your own resources. Alternatively, you can use the EnvoyPatchPolicy API of Envoy Gateway to patch the generated resources. For example, you can configure the retry fallback behavior by attaching the BackendTrafficPolicy API of Envoy Gateway to the generated HTTPRoute.
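For illustration only, a minimal AIGatewayRoute might look like the following sketch; the Gateway, backend, and model names are hypothetical:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: my-ai-route                    # hypothetical name
  namespace: default
spec:
  parentRefs:
    - name: my-gateway                 # hypothetical Gateway in the same namespace
      kind: Gateway
      group: gateway.networking.k8s.io
  schema:
    name: OpenAI                       # clients send OpenAI-format requests to the Gateway
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model      # model name extracted from the request body
              value: gpt-4o-mini       # hypothetical model
      backendRefs:
        - name: openai-backend         # hypothetical AIServiceBackend in the same namespace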

Fields
apiVersion (required)
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kind (required)
String
This is an AIGatewayRoute resource.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
spec (required)
Spec defines the details of the AIGatewayRoute.
status (required)
Status defines the status details of the AIGatewayRoute.

AIGatewayRouteList

AIGatewayRouteList contains a list of AIGatewayRoute.

Fields
apiVersion (required)
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kind (required)
String
This is an AIGatewayRouteList resource.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required)
Items is the list of AIGatewayRoute.

AIServiceBackend

Appears in:

AIServiceBackend is a resource that represents a single backend for AIGatewayRoute. A backend is a service that handles traffic with a concrete API specification.

An AIServiceBackend is "attached" to a Backend which is either a k8s Service or a Backend resource of the Envoy Gateway.

When a backend with an attached AIServiceBackend is used as a routing target in the AIGatewayRoute (more precisely, the HTTPRouteSpec defined in the AIGatewayRoute), the ai-gateway will generate the necessary configuration to apply the backend-specific logic in the final HTTPRoute.
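As an illustrative sketch, an AIServiceBackend pointing at an Envoy Gateway Backend might look like this; all names are hypothetical:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai-backend                 # referenced from AIGatewayRoute backendRefs
  namespace: default
spec:
  schema:
    name: OpenAI                       # the schema this backend accepts from Envoy
  backendRef:
    name: openai                       # hypothetical Backend resource of Envoy Gateway
    kind: Backend
    group: gateway.envoyproxy.io
  backendSecurityPolicyRef:
    name: openai-api-key               # hypothetical BackendSecurityPolicy
    kind: BackendSecurityPolicy
    group: aigateway.envoyproxy.io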

Fields
apiVersion (required)
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kind (required)
String
This is an AIServiceBackend resource.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
spec (required)
Spec defines the details of AIServiceBackend.
status (required)
Status defines the status details of the AIServiceBackend.

AIServiceBackendList

AIServiceBackendList contains a list of AIServiceBackends.

Fields
apiVersion (required)
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kind (required)
String
This is an AIServiceBackendList resource.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required)
Items is the list of AIServiceBackend.

BackendSecurityPolicy

Appears in:

BackendSecurityPolicy specifies configuration for authentication and authorization rules on the traffic exiting the gateway to the backend.
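For example, a BackendSecurityPolicy that injects an API key could be sketched as follows; the Secret name is hypothetical, and the Secret is expected to hold the key under apiKey:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-api-key                 # hypothetical name
  namespace: default
spec:
  type: APIKey
  apiKey:
    secretRef:
      name: openai-api-key-secret      # hypothetical Secret containing the apiKey key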

Fields
apiVersion (required)
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kind (required)
String
This is a BackendSecurityPolicy resource.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
spec (required)
Spec defines the details of the BackendSecurityPolicy.
status (required)
Status defines the status details of the BackendSecurityPolicy.

BackendSecurityPolicyList

BackendSecurityPolicyList contains a list of BackendSecurityPolicy

Fields
apiVersion (required)
String
We are on version aigateway.envoyproxy.io/v1alpha1 of the API.
kind (required)
String
This is a BackendSecurityPolicyList resource.
metadata (required)
Refer to the Kubernetes API documentation for the fields of metadata.
items (required)
Items is the list of BackendSecurityPolicy.

Supporting Types

Available Types

  • AIGatewayFilterConfig
  • AIGatewayFilterConfigExternalProcessor
  • AIGatewayFilterConfigType
  • AIGatewayRouteRule
  • AIGatewayRouteRuleBackendRef
  • AIGatewayRouteRuleMatch
  • AIGatewayRouteSpec
  • AIGatewayRouteStatus
  • AIServiceBackendSpec
  • AIServiceBackendStatus
  • APISchema
  • AWSCredentialsFile
  • AWSOIDCExchangeToken
  • AzureOIDCExchangeToken
  • BackendSecurityPolicyAPIKey
  • BackendSecurityPolicyAWSCredentials
  • BackendSecurityPolicyAzureCredentials
  • BackendSecurityPolicyGCPCredentials
  • BackendSecurityPolicyOIDC
  • BackendSecurityPolicySpec
  • BackendSecurityPolicyStatus
  • BackendSecurityPolicyType
  • GCPServiceAccountImpersonationConfig
  • GCPWorkLoadIdentityFederationConfig
  • GCPWorkloadIdentityProvider
  • LLMRequestCost
  • LLMRequestCostType
  • VersionedAPISchema

Type Definitions

AIGatewayFilterConfig

Appears in:

Fields
type (optional)
Default: ExternalProcessor
Type specifies the type of the filter configuration.
Currently, only ExternalProcessor is supported, and the default is ExternalProcessor.
externalProcessor (optional)
ExternalProcessor is the configuration for the external processor filter.
This is optional; if not set, the default values of the Deployment spec will be used.

AIGatewayFilterConfigExternalProcessor

Appears in:

Fields
resources (optional)
Resources required by the external processor container.
More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Note: when multiple AIGatewayRoute resources are attached to the same Gateway, and each
AIGatewayRoute has a different resource configuration, the ai-gateway will pick one of them
to configure the resource requirements of the external processor container.
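As a sketch, the filter configuration on an AIGatewayRoute could set resource requirements for the external processor container like this (values are illustrative):

spec:
  filterConfig:
    type: ExternalProcessor
    externalProcessor:
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: "1"
          memory: 512Mi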

AIGatewayFilterConfigType

Underlying type: string

Appears in:

AIGatewayFilterConfigType specifies the type of the filter configuration.

Possible Values
ExternalProcessor
DynamicModule

AIGatewayRouteRule

Appears in:

AIGatewayRouteRule is a rule that defines the routing behavior of the AIGatewayRoute.

Fields
backendRefs (optional)
BackendRefs is the list of backends that this rule will route the traffic to.
Each backend can have a weight that determines the traffic distribution.
The namespace of each backend is local, i.e. the same namespace as the AIGatewayRoute.
BackendRefs can reference either AIServiceBackend resources (default) or InferencePool resources
from the Gateway API Inference Extension. When referencing InferencePool resources:
- Only one InferencePool backend is allowed per rule
- Cannot mix InferencePool with AIServiceBackend references in the same rule
- Fallback behavior is handled by the InferencePool's endpoint picker
For AIServiceBackend references, you can achieve fallback behavior by configuring multiple backends
combined with the BackendTrafficPolicy of Envoy Gateway.
Please refer to https://gateway.envoyproxy.io/docs/tasks/traffic/failover/ as well as
https://gateway.envoyproxy.io/docs/tasks/traffic/retry/.
matches (optional)
Matches is the list of AIGatewayRouteRuleMatch that this rule will match the traffic to.
This is a subset of the HTTPRouteMatch in the Gateway API. See the following for details:
https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPRouteMatch
timeouts (optional)
Timeouts defines the timeouts that can be configured for an HTTP request.
If this field is not set, or timeout.requestTimeout is nil, Envoy AI Gateway defaults the request
timeout to 60s, as opposed to Envoy Gateway's default of 15s.
For streaming responses (such as chat completions with stream=true), consider setting a longer
timeout, as the response may take a while to complete.
modelsOwnedBy (optional)
string
Default: Envoy AI Gateway
ModelsOwnedBy represents the owner of the running models served by the backends,
which will be exported as the OwnedBy field in the OpenAI-compatible /models API.
This is used only when this rule contains x-ai-eg-model in its header matching,
where the header value will be recognized as a model in the /models endpoint.
All the matched models will share the same owner.
Defaults to Envoy AI Gateway if not set.
modelsCreatedAt (optional)
ModelsCreatedAt represents the creation timestamp of the running models served by the backends,
which will be exported as the Created field in the OpenAI-compatible /models API.
It follows the format of RFC 3339, for example 2024-05-21T10:00:00Z.
This is used only when this rule contains x-ai-eg-model in its header matching,
where the header value will be recognized as a model in the /models endpoint.
All the matched models will share the same creation time.
Defaults to the creation timestamp of the AIGatewayRoute if not set.
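Putting these fields together, a single entry under spec.rules might be sketched as follows; the model and backend names are hypothetical:

rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: llama3-70b              # hypothetical model name
    backendRefs:
      - name: primary-backend              # hypothetical AIServiceBackend
        weight: 90
      - name: canary-backend               # hypothetical AIServiceBackend
        weight: 10
    timeouts:
      request: 120s                        # longer timeout for streaming responses
    modelsOwnedBy: my-org                  # exported as OwnedBy in /models
    modelsCreatedAt: "2024-05-21T10:00:00Z"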

AIGatewayRouteRuleBackendRef

Appears in:

AIGatewayRouteRuleBackendRef is a reference to a backend with a weight. It can reference either an AIServiceBackend or an InferencePool resource.

Fields
name (required)
string
Name is the name of the backend resource.
When Group and Kind are not specified, this refers to an AIServiceBackend.
When Group and Kind are specified, this refers to the resource of the specified type.
group (optional)
string
Group is the group of the backend resource.
When not specified, defaults to aigateway.envoyproxy.io (AIServiceBackend).
Currently, only inference.networking.x-k8s.io is supported for InferencePool resources.
kind (optional)
string
Kind is the kind of the backend resource.
When not specified, defaults to AIServiceBackend.
Currently, only InferencePool is supported when Group is specified.
modelNameOverride (optional)
string
Name of the model in the backend. If provided, this will override the model name provided in the request.
This field is ignored when referencing InferencePool resources.
weight (optional)
integer
Default: 1
Weight is the weight of the backend. This is exactly the same as the weight in
the BackendRef in the Gateway API. See the following for details:
https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.BackendRef
Default is 1.
priority (optional)
integer
Default: 0
Priority is the priority of the backend. This sets the priority on the underlying endpoints.
See: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/priority
Note: This will override the fallback property of the underlying Envoy Gateway Backend.
This field is ignored when referencing InferencePool resources.
Default is 0.
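For example, a rule could reference an InferencePool (pool name hypothetical):

backendRefs:
  - name: my-inference-pool                # hypothetical InferencePool
    group: inference.networking.x-k8s.io
    kind: InferencePool

or, for AIServiceBackend references, use priorities to sketch a fallback pair:

backendRefs:
  - name: primary-backend                  # hypothetical AIServiceBackend
    priority: 0                            # primary endpoints
  - name: fallback-backend                 # hypothetical AIServiceBackend
    priority: 1                            # used when priority-0 endpoints are unhealthy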

AIGatewayRouteRuleMatch

Appears in:

Fields
headers (optional)
HTTPHeaderMatch array
Headers specifies HTTP request header matchers. See HeaderMatch in the Gateway API for the details:
https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPHeaderMatch

AIGatewayRouteSpec

Appears in:

AIGatewayRouteSpec details the AIGatewayRoute configuration.

Fields
targetRefs (optional)
TargetRefs are the names of the Gateway resources this AIGatewayRoute is being attached to.
Deprecated: use the ParentRefs field instead. This field will be dropped in Envoy AI Gateway v0.4.0.
parentRefs (required)
ParentReference array
ParentRefs are the names of the Gateway resources this AIGatewayRoute is being attached to.
Cross namespace references are not supported. In other words, the Gateway resources must be in the
same namespace as the AIGatewayRoute. Currently, each reference's Kind must be Gateway.
schema (required)
APISchema specifies the API schema of the input that the target Gateway(s) will receive.
Based on this schema, the ai-gateway will perform the necessary transformation to the
output schema specified in the selected AIServiceBackend during the routing process.
Currently, the only supported schema is OpenAI as the input schema.
rules (required)
Rules is the list of AIGatewayRouteRule that this AIGatewayRoute will match the traffic to.
Each rule is a subset of the HTTPRoute in the Gateway API (https://gateway-api.sigs.k8s.io/api-types/httproute/).
The AI Gateway controller will generate an HTTPRoute based on the configuration given here, with additional
modifications to accomplish the necessary work, notably inserting the AI Gateway filter responsible for
transforming the request and response, etc.
In the matching conditions in the AIGatewayRouteRule, the x-ai-eg-model header is available
if you want to describe the routing behavior based on the model name. The model name is extracted
from the request content before the routing decision.
Multiple rules are matched in the same way as in the Gateway API. See the following for details:
https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io%2fv1.HTTPRoute
filterConfig (required)
FilterConfig is the configuration for the AI Gateway filter inserted in the generated HTTPRoute.
An AI Gateway filter is responsible for the transformation of the request and response
as well as the routing behavior based on the model name extracted from the request content, etc.
Currently, the filter is only implemented as an external processor filter, which might be
extended to other types of filters in the future. See https://github.com/envoyproxy/ai-gateway/issues/90
llmRequestCosts (optional)
LLMRequestCosts specifies how to capture the cost of the LLM-related request, notably the token usage.
The AI Gateway filter will capture each specified number and store it in Envoy's dynamic
metadata per HTTP request. The namespaced key is io.envoy.ai_gateway.
For example, let's say we have the following LLMRequestCosts configuration:

llmRequestCosts:
- metadataKey: llm_input_token
  type: InputToken
- metadataKey: llm_output_token
  type: OutputToken
- metadataKey: llm_total_token
  type: TotalToken

Then, with the following BackendTrafficPolicy of Envoy Gateway, you can have three
rate limit buckets for each unique x-user-id header value: one bucket for the input tokens,
one for the output tokens, and one for the total tokens.
Each bucket will be reduced by the corresponding token usage captured by the AI Gateway filter.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: some-example-token-rate-limit
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: usage-rate-limit
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            # Do the rate limiting based on the x-user-id header.
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            # Configures the number of tokens allowed per hour.
            requests: 10000
            unit: Hour
          cost:
            request:
              from: Number
              # Setting the request cost to zero allows to only check the rate limit budget,
              # and not consume the budget on the request path.
              number: 0
            # This specifies the cost of the response retrieved from the dynamic metadata set by the AI Gateway filter.
            # The extracted value will be used to consume the rate limit budget, and subsequent requests will be rate limited
            # if the budget is exhausted.
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_input_token
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            requests: 10000
            unit: Hour
          cost:
            request:
              from: Number
              number: 0
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_output_token
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            requests: 10000
            unit: Hour
          cost:
            request:
              from: Number
              number: 0
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_total_token

Note that when multiple AIGatewayRoute resources are attached to the same Gateway, and
different costs are configured for the same metadata key, the ai-gateway will pick one of them
to configure the metadata key in the generated HTTPRoute, and ignore the rest.

AIGatewayRouteStatus

Appears in:

AIGatewayRouteStatus contains the conditions set by the reconciliation result.

Fields
conditions (required)
Condition array
Conditions is the list of conditions set by the reconciliation result.
Currently, at most one condition is set.
Known .status.conditions.type are: Accepted, NotAccepted.

AIServiceBackendSpec

Appears in:

AIServiceBackendSpec details the AIServiceBackend configuration.

Fields
schema (required)
APISchema specifies the API schema of the output format of requests from
Envoy that this AIServiceBackend can accept as incoming requests.
Based on this schema, the ai-gateway will perform the necessary transformation for
the pair of AIGatewayRouteSpec.APISchema and AIServiceBackendSpec.APISchema.
This is required to be set.
backendRef (required)
BackendRef is the reference to the Backend resource that this AIServiceBackend corresponds to.
A backend must be a Backend resource of Envoy Gateway. Note that k8s Service will be supported
as a backend in the future.
This is required to be set.
backendSecurityPolicyRef (optional)
BackendSecurityPolicyRef is the name of the BackendSecurityPolicy resource this backend
is attached to.

AIServiceBackendStatus

Appears in:

AIServiceBackendStatus contains the conditions set by the reconciliation result.

Fields
conditions (required)
Condition array
Conditions is the list of conditions set by the reconciliation result.
Currently, at most one condition is set.
Known .status.conditions.type are: Accepted, NotAccepted.

APISchema

Underlying type: string

Appears in:

APISchema defines the API schema.

Possible Values
OpenAI
APISchemaOpenAI is the OpenAI schema.
https://github.com/openai/openai-openapi
AWSBedrock
APISchemaAWSBedrock is the AWS Bedrock schema.
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html
AzureOpenAI
APISchemaAzureOpenAI is the Azure OpenAI schema.
https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#api-specs
GCPVertexAI
APISchemaGCPVertexAI is the schema followed by Gemini models hosted on GCP's Vertex AI platform.
Note: Using this schema requires a BackendSecurityPolicy to be configured and attached,
as the transformation will use the gcp-region and project-name from the BackendSecurityPolicy.
https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/generateContent?hl=en
GCPAnthropic
APISchemaGCPAnthropic is the schema followed by Anthropic models hosted on GCP's Vertex AI platform.
This is largely the Anthropic API with some GCP-specific parameters, as described at the URL below.
https://docs.anthropic.com/en/api/claude-on-vertex-ai

AWSCredentialsFile

Appears in:

AWSCredentialsFile specifies the credentials file to use for the AWS provider. Envoy reads the secret file, and the profile to use is specified by the Profile field.

Fields
secretRef (required)
SecretRef is the reference to the credential file.
The secret should contain the AWS credentials file keyed on credentials.
profile (required)
string
Default: default
Profile is the profile to use in the credentials file.
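As an illustration, the referenced Secret would hold a standard AWS credentials file under the credentials key, and a BackendSecurityPolicy would point at it; the names and key material below are hypothetical placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials-file             # hypothetical Secret name
  namespace: default
stringData:
  credentials: |
    [default]
    aws_access_key_id = AKIA...          # placeholder, not a real key
    aws_secret_access_key = ...
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: aws-creds                        # hypothetical name
  namespace: default
spec:
  type: AWSCredentials
  awsCredentials:
    region: us-east-1
    credentialsFile:
      secretRef:
        name: aws-credentials-file
      profile: default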

AWSOIDCExchangeToken

Appears in:

AWSOIDCExchangeToken specifies the credentials used to obtain an OIDC token from an SSO server. For AWS, the controller will query STS to obtain AWS AccessKeyId, SecretAccessKey, and SessionToken, and store them in a temporary credentials file.

Fields
oidc (required)
OIDC is used to obtain OIDC tokens via an SSO server, which will be exchanged for provider credentials.
grantType (optional)
string
GrantType is the method the application uses to obtain the access token.
aud (optional)
string
Aud defines the audience that this ID Token is intended for.
awsRoleArn (required)
string
AwsRoleArn is the AWS IAM Role with the permission to use specific resources in the AWS account,
which maps to the temporary AWS security credentials exchanged using the authentication token issued by the OIDC provider.
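A sketch of an awsCredentials block using OIDC token exchange might look like the following; the nested oidc fields follow Envoy Gateway's OIDC type, and the issuer, client, audience, and role values are hypothetical assumptions:

awsCredentials:
  region: us-east-1
  oidcExchangeToken:
    oidc:
      provider:
        issuer: https://sso.example.com        # hypothetical OIDC issuer
      clientID: my-client-id                   # hypothetical client ID
      clientSecret:
        name: oidc-client-secret               # hypothetical Secret holding the client secret
    grantType: client_credentials              # assumed grant type
    aud: sts.amazonaws.com                     # assumed audience
    awsRoleArn: arn:aws:iam::123456789012:role/ai-gateway-example   # hypothetical role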

AzureOIDCExchangeToken

Appears in:

AzureOIDCExchangeToken specifies the credentials used to obtain an OIDC token from an SSO server. For Azure, the controller will query Azure Entra ID to get an Azure Access Token and store it in a secret.

Fields
oidc (required)
OIDC is used to obtain OIDC tokens via an SSO server, which will be exchanged for provider credentials.
grantType (optional)
string
GrantType is the method the application uses to obtain the access token.
aud (optional)
string
Aud defines the audience that this ID Token is intended for.

BackendSecurityPolicyAPIKey

Appears in:

BackendSecurityPolicyAPIKey specifies the API key.

Fields
secretRef (required)
SecretRef is the reference to the secret containing the API key.
ai-gateway must be given the permission to read this secret.
The key of the secret should be apiKey.
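For example, the referenced Secret might be created like this; the name and key material are hypothetical placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: openai-api-key-secret            # hypothetical Secret referenced by spec.apiKey.secretRef
  namespace: default
stringData:
  apiKey: sk-...                         # placeholder API key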

BackendSecurityPolicyAWSCredentials

Appears in:

BackendSecurityPolicyAWSCredentials contains the supported authentication mechanisms to access AWS.

Fields
region (required)
string
Region specifies the AWS region associated with the policy.
credentialsFile (optional)
CredentialsFile specifies the credentials file to use for the AWS provider.
oidcExchangeToken (optional)
OIDCExchangeToken specifies the OIDC configuration used to obtain an OIDC token. The OIDC token will be
used to obtain temporary credentials to access AWS.

BackendSecurityPolicyAzureCredentials

Appears in:

BackendSecurityPolicyAzureCredentials contains the supported authentication mechanisms to access Azure. Only one of ClientSecretRef or OIDCExchangeToken must be specified. Credentials will not be generated if neither is set.

Fields
clientID (required)
string
ClientID is a unique identifier for an application in Azure.
tenantID (required)
string
TenantID is a unique identifier for an Azure Active Directory instance.
clientSecretRef (optional)
ClientSecretRef is the reference to the secret containing the Azure client secret.
ai-gateway must be given the permission to read this secret.
The key of the secret should be client-secret.
oidcExchangeToken (optional)
OIDCExchangeToken specifies the OIDC configuration used to obtain an OIDC token. The OIDC token will be
used to obtain temporary credentials to access Azure.
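For illustration, an azureCredentials block using a client secret might be sketched like this; the IDs and Secret name are hypothetical:

azureCredentials:
  clientID: 00000000-0000-0000-0000-000000000000   # hypothetical application (client) ID
  tenantID: 11111111-1111-1111-1111-111111111111   # hypothetical tenant ID
  clientSecretRef:
    name: azure-client-secret                      # hypothetical Secret holding client-secret
    namespace: default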

BackendSecurityPolicyGCPCredentials

Appears in:

BackendSecurityPolicyGCPCredentials contains the supported authentication mechanisms to access GCP.

Fields
projectName (required)
string
ProjectName is the GCP project name.
region (required)
string
Region is the GCP region associated with the policy.
workLoadIdentityFederationConfig (required)
WorkLoadIdentityFederationConfig is the configuration for the GCP Workload Identity Federation.

BackendSecurityPolicyOIDC

Appears in:

BackendSecurityPolicyOIDC specifies OIDC related fields.

Fields
oidc (required)
OIDC is used to obtain OIDC tokens via an SSO server, which will be exchanged for provider credentials.
grantType (optional)
string
GrantType is the method the application uses to obtain the access token.
aud (optional)
string
Aud defines the audience that this ID Token is intended for.

BackendSecurityPolicySpec

Appears in:

BackendSecurityPolicySpec specifies authentication rules for accessing the provider from the Gateway. Only one mechanism to access a backend(s) can be specified.

Only one type of BackendSecurityPolicy can be defined.

Fields
type (required)
Type specifies the type of the backend security policy.
apiKey (optional)
APIKey is a mechanism to access a backend(s). The API key will be injected into the Authorization header.
awsCredentials (optional)
AWSCredentials is a mechanism to access a backend(s). AWS specific logic will be applied.
azureCredentials (optional)
AzureCredentials is a mechanism to access a backend(s). Azure OpenAI specific logic will be applied.
gcpCredentials (optional)
GCPCredentials is a mechanism to access a backend(s). GCP specific logic will be applied.

BackendSecurityPolicyStatus

Appears in:

BackendSecurityPolicyStatus contains the conditions set by the reconciliation result.

Fields
conditions (required)
Condition array
Conditions is the list of conditions set by the reconciliation result.
Currently, at most one condition is set.
Known .status.conditions.type are: Accepted, NotAccepted.

BackendSecurityPolicyType

Underlying type: string

Appears in:

BackendSecurityPolicyType specifies the type of auth mechanism used to access a backend.

Possible Values
APIKey
AWSCredentials
AzureCredentials
GCPCredentials

GCPServiceAccountImpersonationConfig

Appears in:

Fields
serviceAccountName (required)
string
ServiceAccountName is the name of the service account to impersonate.
serviceAccountProjectName (required)
string
ServiceAccountProjectName is the project name in which the service account is registered.

GCPWorkLoadIdentityFederationConfig

Appears in:

Fields
projectID (required)
string
ProjectID is the GCP project ID.
workloadIdentityProvider (required)
WorkloadIdentityProvider is the external auth provider to be used to authenticate against GCP.
https://cloud.google.com/iam/docs/workload-identity-federation?hl=en
Currently only OIDC is supported.
workloadIdentityPoolName (required)
string
WorkloadIdentityPoolName is the name of the workload identity pool defined in GCP.
https://cloud.google.com/iam/docs/workload-identity-federation?hl=en
serviceAccountImpersonation (optional)
ServiceAccountImpersonation is the service account impersonation configuration.
This is used to impersonate a service account when getting the access token.
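Assuming the field names listed above, a gcpCredentials block might be sketched as follows; the project, pool, provider, and service account names are hypothetical, and the nested OIDC provider configuration is elided:

gcpCredentials:
  projectName: my-gcp-project                # hypothetical project
  region: us-central1
  workLoadIdentityFederationConfig:
    projectID: my-gcp-project
    workloadIdentityPoolName: my-wif-pool    # hypothetical workload identity pool
    workloadIdentityProvider:
      name: my-oidc-provider                 # provider as registered on GCP
      # OIDC provider fields elided for brevity
    serviceAccountImpersonation:
      serviceAccountName: vertex-ai-caller             # hypothetical service account
      serviceAccountProjectName: my-gcp-project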

GCPWorkloadIdentityProvider

Appears in:

GCPWorkloadIdentityProvider specifies the external identity provider to be used to authenticate against GCP. The external identity provider can be AWS, Microsoft, etc., but must be pre-registered in the GCP project.

https://cloud.google.com/iam/docs/workload-identity-federation

Fields
name (required)
string
Name of the external identity provider as registered on Google Cloud Platform.
OIDCProvider (required)
OIDCProvider holds the generic OIDC provider fields.

LLMRequestCost

Appears in:

LLMRequestCost configures each request cost.

Fields
metadataKey (required)
string
MetadataKey is the key of the metadata to store this cost of the request.
type (required)
Type specifies the type of the request cost. The default is OutputToken,
and it uses output token as the cost. The other types are InputToken, TotalToken,
and CEL.
cel (optional)
string
CEL is the CEL expression to calculate the cost of the request.
The CEL expression must return a signed or unsigned integer. If the
return value is negative, it will be an error.
The expression can use the following variables:
* model: the model name extracted from the request content. Type: string.
* backend: the backend name in the form of name.namespace. Type: string.
* input_tokens: the number of input tokens. Type: unsigned integer.
* output_tokens: the number of output tokens. Type: unsigned integer.
* total_tokens: the total number of tokens. Type: unsigned integer.
For example, the following expressions are valid:
* model == 'llama' ? input_tokens + output_tokens * 0.5 : total_tokens
* backend == 'foo.default' ? input_tokens + output_tokens : total_tokens
* input_tokens + output_tokens + total_tokens
* input_tokens * output_tokens
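For example, a CEL-based cost that charges output tokens double for a hypothetical model could be configured like this (the metadata key and model name are illustrative):

llmRequestCosts:
- metadataKey: llm_weighted_cost         # hypothetical metadata key
  type: CEL
  cel: "model == 'llama' ? input_tokens + output_tokens * 2 : total_tokens"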

LLMRequestCostType

Underlying type: string

Appears in:

LLMRequestCostType specifies the type of the LLMRequestCost.

Possible Values
InputToken
LLMRequestCostTypeInputToken is the cost type of the input token.
OutputToken
LLMRequestCostTypeOutputToken is the cost type of the output token.
TotalToken
LLMRequestCostTypeTotalToken is the cost type of the total token.
CEL
LLMRequestCostTypeCEL is for calculating the cost using the CEL expression.

VersionedAPISchema

Appears in:

VersionedAPISchema defines the API schema of either AIGatewayRoute (the input) or AIServiceBackend (the output).

This allows the ai-gateway to understand the input and perform the necessary transformation depending on the API schema pair (input, output).

Note that this is vendor specific, and the stability of the API schema is not guaranteed by the ai-gateway, but by the vendor via proper versioning.

Fields
name (required)
Name is the name of the API schema of the AIGatewayRoute or AIServiceBackend.
version (optional)
string
Version is the version of the API schema.
When the name is set to OpenAI, this equals the prefix of the OpenAI API endpoints. It defaults to v1
if not set or set to an empty string. For example, the chat completions API endpoint will be /v1/chat/completions
if the version is set to v1.
This is especially useful when routing to a backend that has an OpenAI-compatible API but a different
versioning scheme. For example, the Gemini OpenAI-compatible API (https://ai.google.dev/gemini-api/docs/openai) uses
the /v1beta/openai version prefix. Another example is Cohere AI (https://docs.cohere.com/v2/docs/compatibility-api),
which uses the /compatibility/v1 version prefix. On the other hand, DeepSeek (https://api-docs.deepseek.com/) doesn't
use a version prefix, so the version can be set to an empty string.
When the name is set to AzureOpenAI, this version maps to API Version in the
Azure OpenAI API documentation (https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#rest-api-versioning).
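For instance, routing to Gemini's OpenAI-compatible endpoint could use a schema like the following sketch on the AIServiceBackend:

schema:
  name: OpenAI
  version: v1beta/openai          # Gemini's OpenAI-compatible prefix

while an Azure OpenAI backend would carry the Azure API version instead (the version value below is illustrative):

schema:
  name: AzureOpenAI
  version: 2024-02-01             # illustrative Azure OpenAI API version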