Version: 0.2

Data Plane and Traffic Flow

The data plane handles the actual request traffic, with the External Processor (ExtProc) playing a central role in managing AI-specific processing.

Components

The data plane consists of several key components:

The core proxy that handles all incoming traffic and integrates with:

A specialized extension service of Envoy Proxy that handles AI-specific processing needs. It performs three main functions:

Request Processing
- Routes requests to appropriate AI providers
- Handles model selection and validation
- Manages provider-specific authentication
- Supports different API formats (OpenAI, AWS Bedrock)
Token Management
- Tracks token usage from AI providers
- Handles both streaming and non-streaming responses
- Provides usage data for rate limiting decisions
Provider Integration
- Transforms requests between different AI provider formats
- Normalizes responses to a consistent format
- Manages provider-specific requirements

Handles token-based rate limiting by:

The data plane processes requests through several key steps:

Routing: Calculates the destination AI provider based on:
- Request path
- Headers
- Model name extracted from the request body
Request Transformation: Prepares the request for the provider:
- Request body transformation
- Request path modification
- Format adaptation
Upstream Authorization: Handles provider authentication:
- API key management
- Header modifications
- Authentication token handling
Token Rate Limiting Check: Checks the request against the Rate Limit Service:
- Validates token usage
- Enforces rate limits based on configured budgets

Response Transformation:
- Transforms provider response for client compatibility
- Normalizes response format
- Handles streaming responses
Token Usage Management:
- Extracts token usage from responses
- Calculates usage based on configuration
- Stores usage in per-request dynamic metadata
- Enables rate limiting based on token consumption

To learn more: