Connect GCP VertexAI
This guide shows how to configure Envoy AI Gateway to work with the Gemini and Anthropic models served by GCP VertexAI.
Prerequisites
Before you begin, you'll need:
- GCP credentials with access to GCP VertexAI
- Basic setup completed from the Basic Usage guide
- Basic configuration removed as described in the Advanced Configuration overview
GCP Credentials Setup
Ensure you have:
- Your GCP project ID and name.
- VertexAI API access enabled in your GCP project.
- A GCP service account with a generated JSON key file.
Consider using GCP Workload Identity (Federation)/IAM roles and limited-scope credentials for production environments.
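The setup steps above can be sketched with the gcloud CLI. This is a minimal sketch, not the project's official procedure: the function names, the default key file name, and the choice of the roles/aiplatform.user role are assumptions made for illustration, so adjust them to your own IAM policy. The block only defines functions; nothing is sent to GCP until setup_vertex_sa is called.

```shell
# Sketch only: function names, the key file default, and the IAM role are
# illustrative assumptions, not values taken from the Envoy AI Gateway docs.

# Build the email address GCP assigns to a service account in a project.
sa_email() {
  local sa_name="$1" project_id="$2"
  printf '%s@%s.iam.gserviceaccount.com\n' "$sa_name" "$project_id"
}

# Enable the Vertex AI API, create a service account, grant it a role,
# and write a JSON key file. Stops at the first failing step.
setup_vertex_sa() {
  local project_id="$1" sa_name="$2" key_file="${3:-vertex-sa-key.json}"
  local email
  email="$(sa_email "$sa_name" "$project_id")"

  gcloud services enable aiplatform.googleapis.com --project="$project_id" &&
  gcloud iam service-accounts create "$sa_name" --project="$project_id" &&
  # Assumed role; pick the narrowest role that fits your use case.
  gcloud projects add-iam-policy-binding "$project_id" \
    --member="serviceAccount:${email}" \
    --role="roles/aiplatform.user" &&
  gcloud iam service-accounts keys create "$key_file" --iam-account="$email"
}
```

For example, `setup_vertex_sa my-project vertex-gateway` would write the key to vertex-sa-key.json in the current directory. Keep that file out of version control.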
Configuration Steps
1. Download configuration template
curl -O https://raw.githubusercontent.com/envoyproxy/ai-gateway/main/examples/basic/gcp_vertex.yaml
2. Configure GCP Credentials
Edit the gcp_vertex.yaml file to replace these placeholder values:
- GCP_PROJECT_NAME: Your GCP project name
- GCP_REGION: GCP region
- Update the generated service account key JSON string in the secret
Make sure to keep your GCP service account credentials secure and never commit them to version control. The credentials will be stored in Kubernetes secrets.
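If you prefer not to edit the two text placeholders by hand, a small helper along these lines could do the substitution. It is a sketch under the assumption that GCP_PROJECT_NAME and GCP_REGION appear literally in the template, as described above; the function name fill_placeholders is hypothetical. The service account key JSON still has to be placed in the secret separately.

```shell
# Sketch only: fill_placeholders is a hypothetical helper name.
# Prints the template to stdout with both placeholders replaced,
# leaving the original file untouched.
fill_placeholders() {
  local template="$1" project="$2" region="$3"
  sed -e "s|GCP_PROJECT_NAME|${project}|g" \
      -e "s|GCP_REGION|${region}|g" \
      "$template"
}
```

A possible invocation is `fill_placeholders gcp_vertex.yaml my-project us-central1 > gcp_vertex.filled.yaml`. Writing to a separate file keeps the downloaded template free of project-specific values.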
3. Apply Configuration
Apply the updated configuration and wait for the Gateway pod to be ready. If you already have a Gateway running, then the secret credential update will be picked up automatically in a few seconds.
kubectl apply -f gcp_vertex.yaml
kubectl wait pods --timeout=2m \
-l gateway.envoyproxy.io/owning-gateway-name=envoy-ai-gateway-basic \
-n envoy-gateway-system \
--for=condition=Ready
4. Test the Configuration
You should have set $GATEWAY_URL as part of the basic setup before connecting to providers. See the Basic Usage page for instructions.
To access a Gemini model with the chat completions endpoint:
curl -H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash",
"messages": [
{
"role": "user",
"content": "Hi."
}
]
}' \
$GATEWAY_URL/v1/chat/completions
To access an Anthropic model with the chat completions endpoint:
curl -H "Content-Type: application/json" \
-d '{
"model": "claude-3-7-sonnet@20250219",
"messages": [
{
"role": "user",
"content": "What is capital of France?"
}
],
"max_completion_tokens": 100
}' \
$GATEWAY_URL/v1/chat/completions
Expected output:
{
"choices":[
{
"finish_reason":"stop",
"index":0,
"message":{
"content":"The capital of France is Paris. Paris is not only the capital city but also the largest city in France, known for its cultural significance, historic landmarks like the Eiffel Tower and the Louvre Museum, and its influence in fashion, art, and cuisine.",
"role":"assistant"
}
}
],
"object":"chat.completion",
"usage":{"completion_tokens":58,"prompt_tokens":13,"total_tokens":71}
}
You can also access an Anthropic model with the native Anthropic messages endpoint:
curl -H "Content-Type: application/json" \
-d '{
"model": "claude-3-7-sonnet@20250219",
"messages": [
{
"role": "user",
"content": "What is capital of France?"
}
],
"max_tokens": 100
}' \
$GATEWAY_URL/anthropic/v1/messages
Troubleshooting
If you encounter issues:
- Verify your GCP credentials are correct and active
- Check pod status:
kubectl get pods -n envoy-gateway-system
- View controller logs:
kubectl logs -n envoy-ai-gateway-system deployment/ai-gateway-controller
- Common errors:
- 401/403: Invalid credentials or insufficient permissions
- 404: Model not found or not available in the configured region
- 429: Rate limit exceeded
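To see which of these errors you are hitting, it can help to print only the HTTP status code of a test request. The sketch below assumes $GATEWAY_URL is set as in the basic setup and reuses the chat completions request from the test step; the function names explain_status and probe_gateway are hypothetical.

```shell
# Sketch only: explain_status and probe_gateway are hypothetical helper names.

# Map an HTTP status code to the likely cause from the list of common errors.
explain_status() {
  case "$1" in
    200) echo "OK" ;;
    401|403) echo "Invalid credentials or insufficient permissions" ;;
    404) echo "Model not found or not available in the region" ;;
    429) echo "Rate limit exceeded" ;;
    *) echo "Unexpected status: $1" ;;
  esac
}

# Send a minimal chat completion request and report the status code.
probe_gateway() {
  local model="${1:-gemini-2.5-flash}" status
  if [ -z "${GATEWAY_URL:-}" ]; then
    echo "GATEWAY_URL is not set; see the Basic Usage guide" >&2
    return 1
  fi
  # -s silences progress, -o discards the body, -w prints the status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"Hi.\"}]}" \
    "$GATEWAY_URL/v1/chat/completions")"
  echo "${status}: $(explain_status "$status")"
}
```

Calling `probe_gateway` checks the default Gemini model, and `probe_gateway "claude-3-7-sonnet@20250219"` checks the Anthropic route.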
Configuring More Models
To use more models, add more AIGatewayRouteRules to the gcp_vertex.yaml file, with the model ID in the value field of the x-ai-eg-model header match. For example, the following route matches requests for gemini-2.5-flash-pro:
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: envoy-ai-gateway-basic-gcp-gemini
  namespace: default
spec:
  schema:
    name: OpenAI
  parentRefs:
    - name: envoy-ai-gateway-basic
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gemini-2.5-flash-pro
      backendRefs:
        - name: envoy-ai-gateway-basic-gcp