- Per-user rate limits to protect the system from burst traffic
- Plan limits for monthly call and token usage
## Per-user requests per minute
Request limits are enforced per `external_user_id`, not across your whole tenant.
| Plan | Requests per user per minute |
|---|---|
| Free | 3 |
| Starter | 10 |
| Growth | 30 |
| Enterprise | Unlimited |
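If your app sends many requests for the same end user in quick succession, a client-side throttle can keep it under these caps before MemoryOS has to reject anything. The sketch below assumes a 60-second sliding window per `external_user_id`. This page does not describe how the service actually windows requests. `PerUserRateLimiter` and `PER_USER_RPM` are hypothetical names and are not part of any MemoryOS SDK.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional

# Requests per user per minute, copied from the table above.
# None means no per-user cap (Enterprise).
PER_USER_RPM: Dict[str, Optional[int]] = {
    "Free": 3,
    "Starter": 10,
    "Growth": 30,
    "Enterprise": None,
}


class PerUserRateLimiter:
    """Hypothetical client-side limiter keyed by external_user_id.

    Assumes a 60-second sliding window; the server's real algorithm may differ.
    """

    def __init__(self, plan: str, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = PER_USER_RPM[plan]
        self.window = 60.0  # seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.sent: Dict[str, Deque[float]] = defaultdict(deque)

    def try_acquire(self, external_user_id: str) -> bool:
        """Return True and record the request if this user is under the cap."""
        if self.limit is None:
            return True
        now = self.clock()
        timestamps = self.sent[external_user_id]
        # Forget requests that have fallen out of the 60-second window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

Because the limiter keeps a separate window for each user ID, one busy user using up their budget does not block requests for other users. This matches the per-user enforcement described above.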
## Plan limits
| Plan | Monthly calls | Monthly tokens | Write calls | Retrieval calls |
|---|---|---|---|---|
| Free | 5,000 | 2,000,000 | 5,000 | Unlimited |
| Starter | 50,000 | 25,000,000 | 50,000 | Unlimited |
| Growth | 500,000 | 250,000,000 | 500,000 | Unlimited |
| Enterprise | Unlimited | Unlimited | Unlimited | Unlimited |
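To see how much of each monthly budget is left, you can encode the plan table as data. This is a minimal sketch. `PlanLimits`, `PLAN_LIMITS`, and `remaining` are hypothetical names, and `None` stands in for "Unlimited". Retrieval calls are omitted because they are unlimited on every plan listed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PlanLimits:
    monthly_calls: Optional[int]   # None means unlimited
    monthly_tokens: Optional[int]
    write_calls: Optional[int]
    # Retrieval calls are unlimited on every plan in the table, so no field.


PLAN_LIMITS: Dict[str, PlanLimits] = {
    "Free": PlanLimits(5_000, 2_000_000, 5_000),
    "Starter": PlanLimits(50_000, 25_000_000, 50_000),
    "Growth": PlanLimits(500_000, 250_000_000, 500_000),
    "Enterprise": PlanLimits(None, None, None),
}


def remaining(limit: Optional[int], used: int) -> Optional[int]:
    """Budget left for one limit; None means unlimited, never below zero."""
    if limit is None:
        return None
    return max(limit - used, 0)
```

For example, `remaining(PLAN_LIMITS["Starter"].monthly_tokens, 24_000_000)` returns `1_000_000`.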
## How to think about these limits
- Monthly calls are total requests across the billing month
- Monthly tokens are the extraction tokens counted toward your plan budget
- Write calls are memory creation requests, such as `add()`
- Retrieval calls cover memory lookups and are unlimited on current public plans
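The counting rules above can be modeled as a small counter. This sketch reads "total requests" literally and assumes that every request, whether a write or a retrieval, increments the monthly call count. It also assumes you know how many extraction tokens each call used. This page confirms neither detail, and `MonthlyUsage` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    """Hypothetical per-billing-month usage counters."""

    calls: int = 0
    tokens: int = 0
    write_calls: int = 0
    retrieval_calls: int = 0

    def record(self, kind: str, extraction_tokens: int = 0) -> None:
        """Count one request of the given kind ("write" or "retrieve")."""
        # Validate first so a bad kind does not leave counters half-updated.
        if kind not in ("write", "retrieve"):
            raise ValueError(f"unknown call kind: {kind!r}")
        # Assumption: every request counts toward monthly calls.
        self.calls += 1
        self.tokens += extraction_tokens
        if kind == "write":
            self.write_calls += 1
        else:
            self.retrieval_calls += 1
```

All of these counters are monthly, so create a fresh instance at the start of each billing month.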
## What happens when you hit a limit
Behavior depends on your tenant’s current quota mode and overage policy:

- `FULL` means requests are processed normally.
- `PASSTHROUGH` means MemoryOS skips storage, and your app should continue without memory.
- `DEGRADED_RETRIEVE` means writes may pause while retrieval continues in reduced mode.
- `BLOCKED` means the request is rejected until the budget resets or the plan changes.
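Below is a sketch of one way an app might branch on these modes. It assumes your code already knows the current mode. This page does not cover how MemoryOS reports the mode. `handle_turn` and `MemoryBlocked` are hypothetical names, and `store` and `retrieve` stand in for your own wrappers around MemoryOS calls. For `DEGRADED_RETRIEVE`, the sketch skips the write entirely. This is a conservative choice, because the page only says writes "may" pause.

```python
from enum import Enum
from typing import Callable, List


class QuotaMode(Enum):
    FULL = "FULL"
    PASSTHROUGH = "PASSTHROUGH"
    DEGRADED_RETRIEVE = "DEGRADED_RETRIEVE"
    BLOCKED = "BLOCKED"


class MemoryBlocked(Exception):
    """Hypothetical: raised while the tenant is BLOCKED."""


def handle_turn(
    mode: QuotaMode,
    message: str,
    store: Callable[[str], None],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Return the memories to use for this turn, honoring the quota mode."""
    if mode is QuotaMode.BLOCKED:
        # Rejected until the budget resets or the plan changes.
        raise MemoryBlocked("memory requests are blocked for this tenant")
    if mode is QuotaMode.PASSTHROUGH:
        # Storage is skipped; the app continues without memory.
        return []
    if mode is QuotaMode.FULL:
        store(message)
    # FULL and DEGRADED_RETRIEVE both retrieve; DEGRADED_RETRIEVE skips the write.
    return retrieve(message)
```

Callers can catch `MemoryBlocked` and answer without memory. A blocked tenant then degrades gracefully instead of failing the end user's request.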