Rate limits

Rate limits are enforced per API key, not per organisation, so each key has its own independent quota. Creating separate keys for distinct workloads (for example, production traffic and a batch backfill) is therefore a reasonable way to keep one workload from using up another's limit.

Default limits

| Tier | Requests per minute | Tokens per minute |
| --- | --- | --- |
| Trial | 30 | 30,000 |
| Starter | 120 | 200,000 |
| Growth | 600 | 1,000,000 |
| Enterprise | Negotiated | Negotiated |

If you need a higher limit, email us.
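You can also enforce the per-minute figures on the client so that most requests never reach a 429. At the Starter tier, 120 requests per minute averages one request every 500 ms. Using all 120 requests within the 200,000-token budget means averaging no more than about 1,667 tokens per request. The following is a minimal sketch, assuming you can estimate a request's token count before sending it. `MinuteBudget` and `tryAcquire` are hypothetical names. The sliding 60-second window is this sketch's own choice and may not line up exactly with the server's window.

```javascript
// Hypothetical client-side throttle for ONE API key (not the service's own algorithm).
// `rpm` and `tpm` come from your tier's row in the table; `now` is injectable for tests.
class MinuteBudget {
  constructor(rpm, tpm, now = () => Date.now()) {
    this.rpm = rpm
    this.tpm = tpm
    this.now = now
    this.log = [] // { at: ms, tokens: n } for each request sent in the last 60 s
  }

  // Forget requests that are 60 seconds old or older.
  prune() {
    const cutoff = this.now() - 60_000
    while (this.log.length && this.log[0].at <= cutoff) this.log.shift()
  }

  // Records the request and returns true only if sending `tokens` now
  // keeps both the request count and the token total within the limits.
  tryAcquire(tokens) {
    this.prune()
    const usedTokens = this.log.reduce((sum, e) => sum + e.tokens, 0)
    if (this.log.length + 1 > this.rpm || usedTokens + tokens > this.tpm) return false
    this.log.push({ at: this.now(), tokens })
    return true
  }
}
```

For example, `new MinuteBudget(120, 200_000)` models the Starter tier. When `tryAcquire` returns false, wait briefly and try again instead of sending the request. Because limits are per key, keep one instance per key.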

Rate-limit headers

Every response includes:

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Your per-minute request cap. |
| X-RateLimit-Remaining | Requests left in the current 60-second window. |
| X-RateLimit-Reset | UTC epoch seconds when the window resets. |
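You can read these headers off any response to decide whether to pause before the next request. The following is a minimal sketch, assuming a fetch-style `headers` object with a `get(name)` method. `readRateLimit` and its `resetInMs` field are hypothetical. Because X-RateLimit-Reset is in epoch seconds, it is multiplied by 1000 before being compared with `Date.now()`.

```javascript
// Hypothetical helper: parses the documented rate-limit headers.
// Any header that is missing or non-numeric comes back as null.
function readRateLimit(headers, nowMs = Date.now()) {
  const num = name => {
    const v = headers.get(name)
    if (v == null) return null // Headers.get returns null, Map.get returns undefined
    const n = Number(v)
    return Number.isFinite(n) ? n : null
  }
  const reset = num('X-RateLimit-Reset') // UTC epoch seconds
  return {
    limit: num('X-RateLimit-Limit'),
    remaining: num('X-RateLimit-Remaining'),
    // Milliseconds until the window resets, never negative.
    resetInMs: reset === null ? null : Math.max(0, reset * 1000 - nowMs),
  }
}
```

A client could, for instance, wait `resetInMs` whenever `remaining` reaches 0 instead of sending a request that is bound to be rejected.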

When you exceed the limit, the API responds with 429 Too Many Requests and a Retry-After header giving the number of seconds to wait before retrying:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 18
Content-Type: application/json

{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Retry after 18s."
  }
}
```
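The following is a minimal sketch of turning a 429 into a wait time. `retryAfterMs` and `isRateLimited` are hypothetical names. This API documents Retry-After as whole seconds. The HTTP specification also allows the value to be an HTTP-date, so the sketch accepts that form as a fallback. The body check relies only on the `error.code` value shown in the example response.

```javascript
// Hypothetical helper: converts a Retry-After header value to milliseconds.
// Accepts seconds (as this API documents) or an HTTP-date (allowed by HTTP).
// Returns null when the value is missing or cannot be parsed.
function retryAfterMs(value, nowMs = Date.now()) {
  if (value == null || value.trim() === '') return null
  const seconds = Number(value)
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000)
  const date = Date.parse(value)
  return Number.isNaN(date) ? null : Math.max(0, date - nowMs)
}

// Hypothetical helper: true if a parsed JSON body has the documented 429 error code.
function isRateLimited(body) {
  return body?.error?.code === 'rate_limited'
}
```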

Back-off strategy

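Use exponential back-off with jitter when you receive a 429 or 503. The helper below honours Retry-After when the server sends it. Otherwise it doubles the wait on each attempt (1, 2, 4, 8 seconds) and adds up to 30% random jitter: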

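```javascript
// Calls `fn` (which must return a fetch-style Response) up to `attempts` times,
// retrying only on 429 and 503.
async function callWithBackoff(fn, attempts = 5) {
  for (let i = 0; i < attempts; i++) {
    const res = await fn()
    if (res.status !== 429 && res.status !== 503) return res
    if (i === attempts - 1) break // don't sleep after the final attempt

    // Prefer the server's Retry-After (seconds); otherwise wait 1, 2, 4, ... seconds.
    const retryAfter = Number(res.headers.get('Retry-After')) || 2 ** i
    // Up to 30% random jitter so many clients don't retry in lockstep.
    const jitter     = Math.random() * 0.3 * retryAfter
    await new Promise(r => setTimeout(r, (retryAfter + jitter) * 1000))
  }
  throw new Error(`Exhausted ${attempts} attempts`)
}
```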
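To see what waits `callWithBackoff` produces, you can pull its delay computation out into a pure function. `backoffDelayMs` and its `random` parameter are hypothetical additions so that the jitter can be fixed in tests. Without Retry-After, the base waits for attempts 0 through 3 are 1, 2, 4, and 8 seconds, 15 seconds in total before jitter. Because jitter adds under 30% to each wait, four waits stay just under 19.5 seconds.

```javascript
// Hypothetical helper mirroring callWithBackoff's delay: Retry-After seconds if
// present (and non-zero), otherwise 2 ** attempt, plus up to 30% jitter. Returns ms.
function backoffDelayMs(attempt, retryAfterSeconds = null, random = Math.random) {
  const base = retryAfterSeconds || 2 ** attempt
  return (base + random() * 0.3 * base) * 1000
}

// Base waits with jitter disabled, for attempts 0..3.
const schedule = [0, 1, 2, 3].map(i => backoffDelayMs(i, null, () => 0))
```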

Idempotency-Key + retries

When retrying, send the same Idempotency-Key header value on every attempt. That way, a request that succeeded on the server but whose response was lost in transit is not applied twice.
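The key point is to generate the key once per logical operation, outside the retry loop. The following is a minimal sketch, assuming Node.js (for `randomUUID` from `node:crypto`) and a hypothetical `send(headers)` function that performs a single HTTP attempt. `withIdempotencyKey` is also a hypothetical name. The source does not specify a key format, so a random UUID is this sketch's choice.

```javascript
const { randomUUID } = require('node:crypto')

// Hypothetical helper: binds ONE Idempotency-Key to a logical operation.
// `send(headers)` performs a single HTTP attempt and returns its result.
function withIdempotencyKey(send, key = randomUUID()) {
  // The key is created here, once. Creating it inside the returned function
  // would give every retry a fresh key, so the server would treat each as new.
  return () => send({ 'Idempotency-Key': key })
}
```

For example, `callWithBackoff(withIdempotencyKey(headers => fetch(url, { method: 'POST', headers, body })))` retries a POST with a stable key, where `url` and `body` are placeholders for your own request.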