Chat Completions API
Create chat completions with automatic credit tracking.
Create Chat Completion
POST /v1/chat/completions
Creates a chat completion. This endpoint is OpenAI-compatible and works with any OpenAI SDK. Use stream: true for SSE streaming.
User Billing Variant
For user billing, call POST /v1/users/:external_user_id/chat/completions or send the X-External-User-ID header when your app is configured for user mode.
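The two billing variants differ only in the URL (or header) used; a minimal stdlib-only sketch of assembling either request — the helper function itself is hypothetical, but the endpoints and headers are as documented above:

```python
import json

BASE_URL = "https://api.usequota.ai"

def build_chat_request(api_key, model, messages, external_user_id=None):
    """Return (url, headers, body) for a chat completion request.

    When external_user_id is set, the per-user endpoint is used so the
    request is billed against that user's balance. (Alternatively, keep
    the base endpoint and send an X-External-User-ID header.)
    """
    if external_user_id:
        url = f"{BASE_URL}/v1/users/{external_user_id}/chat/completions"
    else:
        url = f"{BASE_URL}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Developer-billed request (equivalent to the curl example below)
url, headers, body = build_chat_request(
    "sk-quota-your-api-key", "gpt-4o",
    [{"role": "user", "content": "Hello!"}],
)
```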
Request
curl https://api.usequota.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-quota-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Supported Models
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, gpt-5, gpt-5-mini |
| Anthropic | anthropic/claude-opus-4, anthropic/claude-sonnet-4, anthropic/claude-haiku-3.5 |
| Google | google/gemini-2.5-pro, google/gemini-2.5-flash, google/gemini-2.0-flash |
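As the table suggests, non-OpenAI models carry a provider namespace in the model id. A small helper — hypothetical, inferred only from the naming convention above — to read the provider off a model id:

```python
def provider_for(model: str) -> str:
    """Infer the upstream provider from a model id.

    Un-prefixed ids (e.g. "gpt-4o") are OpenAI models; other providers
    are namespaced with a "provider/" prefix such as "anthropic/" or
    "google/".
    """
    if "/" in model:
        return model.split("/", 1)[0]
    return "openai"

provider_for("gpt-4o")                   # "openai"
provider_for("anthropic/claude-opus-4")  # "anthropic"
```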
Response
Returns an OpenAI-compatible response with additional quota metadata:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705456789,
  "model": "gpt-4o",
  "choices": [...],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  },
  "quota": {
    "credits_used": 3,
    "balance_before": 100,
    "balance_after": 97,
    "billing_mode": "developer"
  }
}

Error Responses
| Status | Code | Description |
|---|---|---|
| 401 | invalid_api_key | Missing or invalid API key |
| 400 | bad_request | Invalid request body or wrong billing endpoint |
| 402 | insufficient_credits | Balance too low for request |
| 429 | rate_limit_exceeded | Rate limit hit (100 req/min default) |
| 429 | rate_limit | Upstream provider rate limit |
| 502 | upstream_error | Provider returned an error |
| 503 | provider_unavailable | Provider API key not configured |
| 504 | gateway_timeout | Provider request timed out |
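Of the codes above, the two 429s and the 5xx statuses are transient, while the 4xx client errors will fail identically if re-sent. A minimal retry classifier — a sketch, not part of the API:

```python
# Error codes from the table above that are worth retrying with backoff.
# The 4xx client errors (invalid_api_key, bad_request, insufficient_credits)
# will fail the same way on every retry, so they are excluded.
TRANSIENT = {
    "rate_limit_exceeded",   # 429, our gateway limit
    "rate_limit",            # 429, upstream provider limit
    "upstream_error",        # 502
    "provider_unavailable",  # 503
    "gateway_timeout",       # 504
}

def should_retry(error_code: str) -> bool:
    """True if the request may succeed on a later attempt."""
    return error_code in TRANSIENT

should_retry("rate_limit_exceeded")   # True
should_retry("insufficient_credits")  # False: top up credits instead
```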
All errors return a JSON body:
{
  "error": {
    "code": "insufficient_credits",
    "message": "Balance too low for request"
  }
}

Streaming
Set "stream": true in the request body to receive Server-Sent Events (SSE).
OpenAI Format
The response uses Content-Type: text/event-stream. Each chunk is a JSON object on its own line, prefixed with data: :
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}
data: [DONE]

After the [DONE] sentinel, Quota sends a final usage event:
data: {"quota":{"credits_used":3,"balance_before":100,"balance_after":97}}

Anthropic Format
Anthropic streaming pairs each data: line with an event: line that names the event type, instead of sending bare data: lines:
event: message_start
data: {"type":"message_start","message":{"id":"msg_abc","model":"claude-sonnet-4-20250514",...}}
event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}
event: message_stop
data: {"type":"message_stop"}
event: quota_usage
data: {"credits_used":5,"balance_before":100,"balance_after":95}
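A client consuming either stream format can skip event: lines, strip the data: prefix, and treat [DONE] as a sentinel while still reading the trailing quota event. A minimal stdlib-only parser sketch (the function and its handling of [DONE] are illustrative, not part of the API):

```python
import json

def parse_sse(lines):
    """Yield decoded JSON payloads from an SSE stream.

    Skips event: lines (Anthropic format) and blank keep-alive lines.
    The OpenAI-style [DONE] sentinel is yielded as the string "DONE"
    so the caller can still consume the final quota usage event that
    follows it.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("event:"):
            continue
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                yield "DONE"
            else:
                yield json.loads(payload)

# Sample OpenAI-format stream as shown above
stream = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}',
    "data: [DONE]",
    'data: {"quota":{"credits_used":3,"balance_before":100,"balance_after":97}}',
]
events = list(parse_sse(stream))  # delta chunk, "DONE", quota event
```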