Chat Completions API

Create chat completions with automatic credit tracking.

Create Chat Completion

POST /v1/chat/completions

Creates a chat completion. This endpoint is OpenAI-compatible and works with any OpenAI SDK. Use stream: true for SSE streaming.
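Because the endpoint is OpenAI-compatible, any HTTP client (or an OpenAI SDK with its base URL overridden) can call it. A minimal stdlib sketch that assembles the request; the URL, headers, and body shape are taken from the curl example below, and the helper name is just for illustration:

```python
import json

QUOTA_BASE_URL = "https://api.usequota.ai/v1"

def build_chat_request(api_key, model, messages, stream=False):
    """Assemble the URL, headers, and JSON body for a chat completion call.

    Sending the request is left to your HTTP library of choice.
    """
    url = f"{QUOTA_BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True  # opt in to SSE streaming
    return url, headers, json.dumps(body)

url, headers, body = build_chat_request(
    "sk-quota-your-api-key", "gpt-4o", [{"role": "user", "content": "Hello!"}]
)
```

Pass the resulting triple to any HTTP client; nothing here is SDK-specific.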

User Billing Variant

For user billing, call POST /v1/users/:external_user_id/chat/completions or send the X-External-User-ID header when your app is configured for user mode.
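The two user-billing options above are interchangeable; a small sketch showing both targets (the path and header name come from this section, the helper itself is hypothetical):

```python
def user_billing_target(external_user_id, via_header=False):
    """Return (path, extra_headers) for a user-billed completion.

    Either hit the user-scoped path, or hit the standard path and
    attach the X-External-User-ID header.
    """
    if via_header:
        return "/v1/chat/completions", {"X-External-User-ID": external_user_id}
    return f"/v1/users/{external_user_id}/chat/completions", {}
```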

Request

curl https://api.usequota.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-quota-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Supported Models

Provider  | Models
OpenAI    | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, gpt-5, gpt-5-mini
Anthropic | anthropic/claude-opus-4, anthropic/claude-sonnet-4, anthropic/claude-haiku-3.5
Google    | google/gemini-2.5-pro, google/gemini-2.5-flash, google/gemini-2.0-flash
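The table suggests a naming convention: Anthropic and Google models carry a provider prefix, while OpenAI models do not. How Quota routes internally isn't documented, so treat this client-side helper as an assumption that merely mirrors the naming scheme:

```python
def provider_for_model(model: str) -> str:
    """Infer the upstream provider from the model identifier's prefix.

    Assumption: unprefixed names route to OpenAI, per the model table.
    """
    if model.startswith("anthropic/"):
        return "anthropic"
    if model.startswith("google/"):
        return "google"
    return "openai"
```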

Response

Returns an OpenAI-compatible response with additional quota metadata:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705456789,
  "model": "gpt-4o",
  "choices": [...],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  },
  "quota": {
    "credits_used": 3,
    "balance_before": 100,
    "balance_after": 97,
    "billing_mode": "developer"
  }
}
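The quota object lets you reconcile balances client-side. The field names come from the example above; the arithmetic (balance_after = balance_before - credits_used) is implied by the sample values, so treat this check as a sketch:

```python
def check_quota(quota):
    """Sanity-check the quota metadata returned alongside a completion.

    Raises if the reported balances don't agree with credits_used.
    """
    expected = quota["balance_before"] - quota["credits_used"]
    if quota["balance_after"] != expected:
        raise ValueError(
            f"balance mismatch: expected {expected}, got {quota['balance_after']}"
        )
    return quota["balance_after"]
```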

Error Responses

Status | Code                 | Description
401    | invalid_api_key      | Missing or invalid API key
400    | bad_request          | Invalid request body or wrong billing endpoint
402    | insufficient_credits | Balance too low for request
429    | rate_limit_exceeded  | Rate limit hit (100 req/min default)
429    | rate_limit           | Upstream provider rate limit
502    | upstream_error       | Provider returned an error
503    | provider_unavailable | Provider API key not configured
504    | gateway_timeout      | Provider request timed out

All errors return a JSON body:

{
  "error": {
    "code": "insufficient_credits",
    "message": "Balance too low for request"
  }
}
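Since every error carries a structured code, a client can decide programmatically whether a retry is worthwhile. Which codes to retry is a client-side policy choice, not something the API mandates; the set below is one reasonable reading of the table:

```python
import json

# Assumption: rate limits and upstream/gateway failures are transient,
# while auth, validation, and billing errors are not.
RETRYABLE = {
    "rate_limit_exceeded", "rate_limit",
    "upstream_error", "provider_unavailable", "gateway_timeout",
}

def classify_error(body_text):
    """Parse an error body and report (code, should_retry)."""
    code = json.loads(body_text)["error"]["code"]
    return code, code in RETRYABLE
```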

Streaming

Set "stream": true in the request body to receive Server-Sent Events (SSE).

OpenAI Format

The response uses Content-Type: text/event-stream. Each chunk is a JSON object on a line prefixed with "data: ":

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}

data: [DONE]

After the [DONE] sentinel, Quota sends a final usage event:

data: {"quota":{"credits_used":3,"balance_before":100,"balance_after":97}}
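Note the consumer implication of the above: unlike a stock OpenAI stream, you must keep reading past the [DONE] sentinel to pick up the quota event. A stdlib sketch of a parser for this format (the input is any iterable of SSE lines, e.g. a decoded HTTP response):

```python
import json

def parse_openai_stream(lines):
    """Collect streamed content deltas plus the trailing quota event.

    The quota event arrives *after* [DONE], so we keep iterating
    instead of breaking at the sentinel.
    """
    text_parts, quota = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            continue  # the quota event still follows
        event = json.loads(payload)
        if "quota" in event:
            quota = event["quota"]
        else:
            for choice in event.get("choices", []):
                text_parts.append(choice.get("delta", {}).get("content", ""))
    return "".join(text_parts), quota

sample = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
    'data: {"quota":{"credits_used":3,"balance_before":100,"balance_after":97}}',
]
text, quota = parse_openai_stream(sample)
```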

Anthropic Format

Anthropic streaming uses event: prefixes instead of bare data: lines:

event: message_start
data: {"type":"message_start","message":{"id":"msg_abc","model":"claude-sonnet-4-20250514",...}}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}

event: quota_usage
data: {"credits_used":5,"balance_before":100,"balance_after":95}
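Parsing this format means tracking the current event: line and dispatching on it when the paired data: line arrives. A sketch against the events shown above (the message_start payload is elided in the docs, so the sample below uses only the delta and quota events):

```python
import json

def parse_anthropic_stream(lines):
    """Walk an event:-prefixed SSE stream, collecting text and quota usage."""
    current_event, text_parts, quota = None, [], None
    for line in lines:
        if line.startswith("event: "):
            current_event = line[len("event: "):]
        elif line.startswith("data: "):
            data = json.loads(line[len("data: "):])
            if current_event == "content_block_delta":
                text_parts.append(data["delta"].get("text", ""))
            elif current_event == "quota_usage":
                quota = data
    return "".join(text_parts), quota

sample = [
    "event: content_block_delta",
    'data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}',
    "event: message_stop",
    'data: {"type":"message_stop"}',
    "event: quota_usage",
    'data: {"credits_used":5,"balance_before":100,"balance_after":95}',
]
text, quota = parse_anthropic_stream(sample)
```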