Rate limits
Limits are enforced per API key, not per organisation. Creating separate keys for distinct workloads (for example, production traffic and a batch backfill) is a fine pattern, because one workload cannot consume another's quota.
Default limits
| Tier | Requests per minute | Tokens per minute |
| --- | --- | --- |
| Trial | 30 | 30,000 |
| Starter | 120 | 200,000 |
| Growth | 600 | 1,000,000 |
| Enterprise | Negotiated | Negotiated |
If you need a higher limit, email us.
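As a rough illustration of staying under the per-minute request caps in the table above, the sketch below spaces calls evenly across the minute. `REQUESTS_PER_MINUTE`, `minIntervalMs`, and `paced` are hypothetical names chosen for this example and are not part of the API. It ignores the tokens-per-minute cap and assumes a single process is using the key.

```javascript
// Requests-per-minute caps copied from the default limits table.
// Enterprise is negotiated, so it has no entry here.
const REQUESTS_PER_MINUTE = { trial: 30, starter: 120, growth: 600 }

// Minimum spacing (ms) between requests that keeps a steady stream under the cap.
function minIntervalMs(tier) {
  const rpm = REQUESTS_PER_MINUTE[tier]
  if (!rpm) throw new Error(`No default limit for tier: ${tier}`)
  return Math.ceil(60_000 / rpm)
}

// Hypothetical pacer: wraps an async function so calls start at least
// `intervalMs` apart, queueing callers that arrive too early.
function paced(fn, intervalMs) {
  let nextSlot = 0
  return async (...args) => {
    const now = Date.now()
    const wait = Math.max(0, nextSlot - now)
    nextSlot = Math.max(now, nextSlot) + intervalMs
    if (wait > 0) await new Promise(r => setTimeout(r, wait))
    return fn(...args)
  }
}
```

For example, `paced(sendRequest, minIntervalMs('starter'))` would start at most one call every 500 ms, where `sendRequest` stands in for whatever function issues your request. Even pacing is more conservative than the limit strictly requires, but it avoids bursts that use up the window early.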
Rate-limit headers
Every response includes:
| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Your per-minute request cap. |
| X-RateLimit-Remaining | Requests left in the current 60-second window. |
| X-RateLimit-Reset | UTC epoch seconds when the window resets. |
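A minimal sketch of reading these headers on the client, assuming a fetch-style response whose `headers` object exposes `get(name)`. The helper name `readRateLimit` and the shape of its return value are illustrative, not part of the API.

```javascript
// Reads the three rate-limit headers from a response's headers object.
// `headers` is anything with a get(name) method, such as `response.headers`
// from fetch. `nowMs` is injectable so the function is easy to test.
function readRateLimit(headers, nowMs = Date.now()) {
  const limit = Number(headers.get('X-RateLimit-Limit'))
  const remaining = Number(headers.get('X-RateLimit-Remaining'))
  const resetEpochSeconds = Number(headers.get('X-RateLimit-Reset'))
  return {
    limit,
    remaining,
    // Milliseconds until the current window resets; never negative.
    msUntilReset: Math.max(0, resetEpochSeconds * 1000 - nowMs),
  }
}
```

A client could check `remaining` after each response and, when it reaches 0, wait `msUntilReset` before sending the next request rather than running into a 429.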
When you exceed the limit, you'll get a 429 response with a
Retry-After header (seconds to wait):
HTTP/1.1 429 Too Many Requests
Retry-After: 18
Content-Type: application/json

{ "error": { "code": "rate_limited", "message": "Rate limit exceeded. Retry after 18s." } }
Back-off strategy
Use exponential back-off with jitter when you see 429 or 503:
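async function callWithBackoff(fn, attempts = 5) {
  for (let i = 0; i < attempts; i++) {
    const res = await fn()
    // Anything other than 429/503 is final: return it to the caller.
    if (res.status !== 429 && res.status !== 503) return res
    // Don't sleep after the last attempt; fail straight away.
    if (i === attempts - 1) break
    // Prefer the server's Retry-After (seconds); otherwise back off 1s, 2s, 4s, ...
    const retryAfter = Number(res.headers.get('Retry-After')) || 2 ** i
    // Add up to 30% random jitter so clients don't retry in lockstep.
    const jitter = Math.random() * 0.3 * retryAfter
    await new Promise(r => setTimeout(r, (retryAfter + jitter) * 1000))
  }
  throw new Error('Exhausted retries')
}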
When retrying, send the same Idempotency-Key header so a
request that succeeded on the server but failed in transit doesn't get
applied twice.
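One way to keep the key stable across retries is to build the request once and reuse it, as in this sketch. The helper name `idempotentPost`, the JSON body, and the injectable `fetchImpl` parameter are assumptions made for illustration; only the `Idempotency-Key` header name comes from the text above.

```javascript
// Returns a function that sends the same POST, with the same Idempotency-Key,
// every time it is called, so a retry helper can invoke it repeatedly.
// `fetchImpl` defaults to the global fetch and is injectable for testing.
function idempotentPost(url, body, idempotencyKey, fetchImpl = fetch) {
  const init = {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Fixed once here, so every retry carries the identical key.
      'Idempotency-Key': idempotencyKey,
    },
    body: JSON.stringify(body),
  }
  return () => fetchImpl(url, init)
}
```

The key should be generated once per logical operation (a UUID is a common choice) and never regenerated inside the retry loop. The returned function has the zero-argument shape that a retry wrapper such as `callWithBackoff` expects for its `fn` parameter.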