Version: latest

Supported API Endpoints

The Envoy AI Gateway exposes OpenAI-compatible and Anthropic-compatible API endpoints for routing and managing LLM/AI traffic. This page documents which endpoints are currently supported and what each one is capable of.

Overview

The Envoy AI Gateway acts as a proxy that accepts OpenAI-compatible and Anthropic-compatible requests and routes them to various AI providers. While it maintains compatibility with the OpenAI API specification, it currently supports a subset of the full OpenAI API.

Supported Endpoints

Chat Completions

Endpoint: POST /v1/chat/completions

Status: ✅ Fully Supported

Description: Create a chat completion response for the given conversation.

Features:

  • ✅ Streaming and non-streaming responses
  • ✅ Function calling
  • ✅ Response format specification (including JSON schema)
  • ✅ Temperature, top_p, and other sampling parameters
  • ✅ System and user messages
  • ✅ Model selection via request body or x-ai-eg-model header
  • ✅ Token usage tracking and cost calculation
  • ✅ Provider fallback and load balancing

Supported Providers:

  • OpenAI
  • AWS Bedrock (with automatic translation)
  • Azure OpenAI (with automatic translation)
  • GCP VertexAI (with automatic translation)
  • GCP Anthropic (with automatic translation)
  • Any OpenAI-compatible provider (Groq, Together AI, Mistral, Tetrate Agent Router Service, etc.)

Example:

curl -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }' \
  $GATEWAY_URL/v1/chat/completions
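
The curl call above can also be issued from Python. A minimal standard-library sketch that additionally shows model selection via the x-ai-eg-model header (the helper name and the localhost fallback URL are illustrative, not part of the gateway):

```python
import json
import os
import urllib.request

def build_chat_request(messages, model=None, header_model=None):
    """Build an OpenAI-compatible chat completions request for the gateway.

    The model can be selected either in the request body ("model") or via
    the x-ai-eg-model header, both of which the gateway supports.
    """
    headers = {"Content-Type": "application/json"}
    body = {"messages": messages}
    if model is not None:
        body["model"] = model
    if header_model is not None:
        headers["x-ai-eg-model"] = header_model
    gateway_url = os.environ.get("GATEWAY_URL", "http://localhost:8080")
    return urllib.request.Request(
        f"{gateway_url}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers=headers,
        method="POST",
    )

req = build_chat_request(
    [{"role": "user", "content": "Hello, how are you?"}],
    model="gpt-4o-mini",
)
# To actually send it against a running gateway:
# urllib.request.urlopen(req)
```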

Anthropic Messages

Endpoint: POST /anthropic/v1/messages

Status: ✅ Fully Supported

Description: Send a structured list of input messages with text and/or image content, and the model will generate the next message in the conversation.

Features:

  • ✅ Streaming and non-streaming responses
  • ✅ Function calling
  • ✅ Extended thinking
  • ✅ Response format specification (including JSON schema)
  • ✅ Temperature, top_p, and other sampling parameters
  • ✅ System and user messages
  • ✅ Model selection via request body or x-ai-eg-model header
  • ✅ Token usage tracking and cost calculation
  • ✅ Provider fallback and load balancing

Supported Providers:

  • Anthropic
  • GCP Anthropic
  • AWS Anthropic

Example:

curl -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "max_tokens": 100
  }' \
  $GATEWAY_URL/anthropic/v1/messages
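
When "stream": true is set, the Messages endpoint replies with server-sent events. A sketch of client-side accumulation of the streamed text (the event shapes follow Anthropic's published streaming format; the function name and canned stream below are illustrative):

```python
import json

def collect_text(sse_lines):
    """Accumulate assistant text from Anthropic-style streaming events.

    Text arrives in content_block_delta events carrying a text_delta payload;
    other event types (message_start, message_stop, ...) are skipped.
    """
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip "event: ..." lines and keep-alives
        event = json.loads(line[len("data: "):])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text.append(delta.get("text", ""))
    return "".join(text)

# Canned stream in the shape described above:
stream = [
    'event: message_start',
    'data: {"type": "message_start"}',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}}',
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": ", world"}}',
    'event: message_stop',
    'data: {"type": "message_stop"}',
]
```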

Completions

Endpoint: POST /v1/completions

Status: ✅ Fully Supported

Description: Create a text completion for the given prompt (legacy endpoint).

Features:

  • ✅ Streaming and non-streaming responses
  • ✅ Model selection via request body or x-ai-eg-model header
  • ✅ Temperature, top_p, and other sampling parameters
  • ✅ Single and batch prompt processing
  • ✅ Token usage tracking and cost calculation
  • ✅ Provider fallback and load balancing
  • ✅ Full metrics support (token usage, request duration, time to first token, inter-token latency)
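
The time-to-first-token and inter-token-latency metrics above are recorded by the gateway, but the same quantities can be derived client-side from chunk arrival times. A sketch (timestamps in milliseconds; function and variable names are illustrative):

```python
def streaming_latencies(request_start_ms, chunk_arrival_ms):
    """Derive time-to-first-token and mean inter-token latency
    from the arrival timestamps of streamed completion chunks."""
    if not chunk_arrival_ms:
        return None, None
    ttft = chunk_arrival_ms[0] - request_start_ms
    gaps = [b - a for a, b in zip(chunk_arrival_ms, chunk_arrival_ms[1:])]
    itl = sum(gaps) / len(gaps) if gaps else None
    return ttft, itl

# First chunk 500 ms after the request, then chunks at 600 ms and 800 ms:
ttft, itl = streaming_latencies(0, [500, 600, 800])
```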

Supported Providers:

  • OpenAI
  • Any OpenAI-compatible provider that supports completions

Example:

curl -H "Content-Type: application/json" \
  -d '{
    "model": "babbage-002",
    "prompt": "def fib(n):\n if n <= 1:\n return n\n else:\n return fib(n-1) + fib(n-2)",
    "max_tokens": 25,
    "temperature": 0.4,
    "top_p": 0.9
  }' \
  $GATEWAY_URL/v1/completions

Embeddings

Endpoint: POST /v1/embeddings

Status: ✅ Fully Supported

Description: Create embeddings for the given input text.

Features:

  • ✅ Single and batch text embedding
  • ✅ Model selection via request body or x-ai-eg-model header
  • ✅ Token usage tracking and cost calculation
  • ✅ Provider fallback and load balancing

Supported Providers:

  • OpenAI
  • Any OpenAI-compatible provider that supports embeddings, including Azure OpenAI.
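
This section has no sample call; the request shape mirrors the OpenAI embeddings API. A Python sketch that builds the payload and shows a typical downstream use of the returned vectors, cosine similarity (the model name is only an example, and the similarity helper is generic, not part of the gateway):

```python
import json
import math

# Payload for POST $GATEWAY_URL/v1/embeddings with Content-Type: application/json.
# "input" may be a single string or a batch of strings.
payload = json.dumps({
    "model": "text-embedding-3-small",
    "input": ["Hello, world", "Bonjour, le monde"],
})

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```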

Image Generation

Endpoint: POST /v1/images/generations

Status: ✅ Supported

Description: Generate one or more images from a text prompt using OpenAI-compatible models.

Features:

  • Non-streaming responses: Returns JSON payload with image URLs or base64 content
  • Model selection: Via request body model or x-ai-eg-model header
  • Parameters: prompt, size, n, quality, response_format
  • Metrics: Records image count, model, and size; token usage when provided
  • Provider fallback and load balancing

Supported Providers:

  • OpenAI
  • Any OpenAI-compatible provider that supports image generations

Example:

curl -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-1",
    "prompt": "a serene mountain landscape at sunrise in watercolor",
    "size": "1024x1024",
    "n": 1
  }' \
  $GATEWAY_URL/v1/images/generations
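
When the response carries base64 content rather than URLs (for example with response_format set to b64_json), decoding the image bytes is a one-liner. A sketch against the OpenAI-style response shape (the helper name and canned response are illustrative):

```python
import base64

def decode_images(response):
    """Decode b64_json entries from an images/generations response
    and return the raw image bytes for each generated image."""
    return [base64.b64decode(item["b64_json"]) for item in response["data"]]

# Canned response in the documented shape (the bytes stand in for a real PNG):
fake_png = b"\x89PNG\r\n\x1a\n"
response = {
    "created": 1677610602,
    "data": [{"b64_json": base64.b64encode(fake_png).decode()}],
}
images = decode_images(response)
```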

Rerank

Endpoint: POST /cohere/v2/rerank

Status: ✅ Fully Supported

Description: Rerank a list of documents for a given query to return relevance scores and an ordered list. Cohere-compatible API.

Features:

  • ✅ Single-query document reranking
  • ✅ Model selection via request body or x-ai-eg-model header
  • ✅ Token usage tracking and cost calculation
  • ✅ Provider fallback and load balancing

Supported Providers:

  • Cohere
  • Any Cohere-compatible provider that supports rerank, including vLLM.

Example:

curl -H "Content-Type: application/json" \
  -d '{
    "model": "rerank-english-v3.0",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France.",
      "Berlin is the capital of Germany."
    ]
  }' \
  $GATEWAY_URL/cohere/v2/rerank
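
The rerank response returns results entries carrying the original document index and a relevance_score, so reordering the input documents is straightforward. A sketch (the response shape follows Cohere's v2 rerank API; the function name and canned scores are illustrative):

```python
def rank_documents(documents, response):
    """Order the input documents by the relevance scores in a
    Cohere-style rerank response, highest score first."""
    results = sorted(
        response["results"], key=lambda r: r["relevance_score"], reverse=True
    )
    return [documents[r["index"]] for r in results]

documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]
response = {"results": [
    {"index": 1, "relevance_score": 0.12},
    {"index": 0, "relevance_score": 0.98},
]}
ranked = rank_documents(documents, response)
```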

Models

Endpoint: GET /v1/models

Description: List available models configured in the AI Gateway.

Features:

  • ✅ Returns models declared in AIGatewayRoute configurations
  • ✅ OpenAI-compatible response format
  • ✅ Model metadata (ID, owned_by, created timestamp)

Example:

curl $GATEWAY_URL/v1/models

Response Format:

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "created": 1677610602,
      "owned_by": "openai"
    }
  ]
}
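
A client can use this endpoint to discover which models the gateway will route; extracting the model IDs from the documented response shape takes a couple of lines (the function name is illustrative):

```python
def list_model_ids(models_response):
    """Return the model IDs from an OpenAI-compatible /v1/models response."""
    return [m["id"] for m in models_response["data"]]

# The documented response shape:
models_response = {
    "object": "list",
    "data": [
        {"id": "gpt-4o-mini", "object": "model",
         "created": 1677610602, "owned_by": "openai"},
    ],
}
```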

Provider-Endpoint Compatibility Table

The following table summarizes which providers support which endpoints:

Provider | Chat Completions | Completions | Embeddings | Image Generation | Anthropic Messages | Rerank | Notes
OpenAI
AWS Bedrock🚧🚧Via API translation
Azure OpenAI🚧⚠️Via API translation or via OpenAI-compatible API
Google Gemini⚠️⚠️Via OpenAI-compatible API
GroqVia OpenAI-compatible API
Grok⚠️⚠️Via OpenAI-compatible API
Together AI⚠️⚠️⚠️⚠️Via OpenAI-compatible API
Cohere⚠️⚠️⚠️Via OpenAI-compatible API and Cohere V2 API for rerank
Mistral⚠️⚠️⚠️Via OpenAI-compatible API
DeepInfra⚠️⚠️Via OpenAI-compatible API
DeepSeek⚠️⚠️Via OpenAI-compatible API
Hunyuan⚠️⚠️⚠️Via OpenAI-compatible API
Tencent LLM Knowledge Engine⚠️Via OpenAI-compatible API
Tetrate Agent Router Service (TARS)⚠️⚠️⚠️Via OpenAI-compatible API
Google Vertex AI🚧🚧Via OpenAI-compatible API
Anthropic on Vertex AI🚧Via OpenAI-compatible API and Native Anthropic API
Anthropic on AWS Bedrock🚧Native Anthropic API
SambaNova⚠️Via OpenAI-compatible API
AnthropicVia OpenAI-compatible API and Native Anthropic API
  • ✅ - Supported and tested in Envoy AI Gateway CI
  • ⚠️ - Expected to work based on provider documentation, but not tested in CI
  • ❌ - Not supported according to provider documentation
  • 🚧 - Unimplemented or under active development; planned for a future release

Custom endpoint prefixes

By default, the gateway registers provider endpoints under these prefixes:

  • OpenAI: /
  • Cohere: /cohere
  • Anthropic: /anthropic

You can override them via Helm using values under endpointConfig:

# values.yaml
endpointConfig:
  # Explicit provider roots
  openai: ""
  cohere: "/cohere"
  anthropic: "/anthropic"

# rootPrefix applies to all routes; final paths are <rootPrefix><providerPrefix>/...
# endpointConfig:
#   rootPrefix: "/"

Or with helm CLI:

helm upgrade --install ai-gateway envoyproxy/ai-gateway-helm \
  -n envoy-ai-gateway-system --create-namespace \
  --set 'endpointConfig.openai=/' \
  --set 'endpointConfig.cohere=/cohere' \
  --set 'endpointConfig.anthropic=/anthropic'

Notes:

  • endpointConfig.rootPrefix (default /) is prepended to all provider prefixes.
  • Only the keys shown above are accepted: openai, cohere, anthropic.
  • If any key is omitted or empty, defaults are applied as listed above.
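
The path composition rule above (<rootPrefix><providerPrefix>/...) can be illustrated with a small helper. Note that the slash normalization at the joins is an assumption for the sketch, not behavior taken from the chart:

```python
def endpoint_path(root_prefix, provider_prefix, api_path):
    """Compose a final route path as <rootPrefix><providerPrefix><apiPath>,
    collapsing duplicate slashes at the joins (normalization is assumed)."""
    parts = [root_prefix, provider_prefix, api_path]
    joined = "/".join(p.strip("/") for p in parts if p.strip("/"))
    return "/" + joined

# With the defaults (rootPrefix "/", Anthropic prefix "/anthropic"), the
# Messages endpoint resolves to /anthropic/v1/messages.
```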

What's Next

To learn more about configuring and using the Envoy AI Gateway with these endpoints: