MemoryOS applies two kinds of limits:
  • Per-user rate limits to protect the system from burst traffic
  • Plan limits for monthly call and token usage

Per-user requests per minute

Request limits are enforced per external_user_id, not across your whole tenant.
Plan       | Requests per user per minute
Free       | 3
Starter    | 10
Growth     | 30
Enterprise | Unlimited
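
Because the limit applies per external_user_id, a client can throttle each user independently before a request ever leaves the app. The following is a minimal sketch, not part of MemoryOS: PerUserLimiter and REQUESTS_PER_MINUTE are hypothetical names, and the sliding 60-second window is an assumption, since this page does not say how the server measures a minute.

```python
import time
from collections import defaultdict, deque

# Per-minute limits copied from the table above; None means unlimited.
REQUESTS_PER_MINUTE = {"free": 3, "starter": 10, "growth": 30, "enterprise": None}


class PerUserLimiter:
    """Client-side sliding-window limiter keyed by external_user_id.

    Hypothetical helper: the server's real windowing may differ.
    """

    def __init__(self, plan, window_seconds=60.0, clock=time.monotonic):
        self.limit = REQUESTS_PER_MINUTE[plan]
        self.window = window_seconds
        self.clock = clock
        # external_user_id -> timestamps of requests still inside the window
        self._sent = defaultdict(deque)

    def allow(self, external_user_id):
        """Return True and record the request if this user is under the limit."""
        if self.limit is None:
            return True
        now = self.clock()
        sent = self._sent[external_user_id]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) >= self.limit:
            return False
        sent.append(now)
        return True
```

With the Free plan, a fourth call to allow() for the same user within one window returns False, while a different external_user_id is still allowed.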

Plan limits

Plan       | Monthly calls | Monthly tokens | Write calls | Retrieval calls
Free       | 5,000         | 2,000,000      | 5,000       | Unlimited
Starter    | 50,000        | 25,000,000     | 50,000      | Unlimited
Growth     | 500,000       | 250,000,000    | 500,000     | Unlimited
Enterprise | Unlimited     | Unlimited      | Unlimited   | Unlimited

How to think about these limits

  • Monthly calls are total requests across the billing month
  • Monthly tokens are the extraction tokens counted toward your plan budget
  • Write calls are memory creation requests such as add()
  • Retrieval calls cover memory lookups and are unlimited on current public plans
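
To make the three budgeted counters concrete, here is a small sketch that computes what is left in a billing month. It assumes your app already has its own usage counts (how you obtain them is not covered on this page), and PlanLimits, PLAN_LIMITS, and remaining() are illustrative names rather than MemoryOS APIs. Retrieval calls are omitted because they are unlimited on every listed plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanLimits:
    # None means unlimited.
    monthly_calls: Optional[int]
    monthly_tokens: Optional[int]
    write_calls: Optional[int]


# Values copied from the plan limits table.
PLAN_LIMITS = {
    "free": PlanLimits(5_000, 2_000_000, 5_000),
    "starter": PlanLimits(50_000, 25_000_000, 50_000),
    "growth": PlanLimits(500_000, 250_000_000, 500_000),
    "enterprise": PlanLimits(None, None, None),
}


def remaining(plan, calls_used, tokens_used, writes_used):
    """Return the remaining budget per counter; None means unlimited."""
    limits = PLAN_LIMITS[plan]

    def left(limit, used):
        # Clamp at zero so an overage never shows as a negative budget.
        return None if limit is None else max(limit - used, 0)

    return {
        "monthly_calls": left(limits.monthly_calls, calls_used),
        "monthly_tokens": left(limits.monthly_tokens, tokens_used),
        "write_calls": left(limits.write_calls, writes_used),
    }
```

For example, remaining("free", 1_200, 500_000, 5_100) reports 3,800 calls and 1,500,000 tokens left, and 0 write calls because that counter is already over its limit.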

What happens when you hit a limit

Behavior depends on your tenant’s current quota mode and overage policy:
  • FULL means requests are processed normally
  • PASSTHROUGH means MemoryOS skips storage and your app should continue without memory
  • DEGRADED_RETRIEVE means writes may pause while retrieval continues in reduced mode
  • BLOCKED means the request is rejected until budget resets or the plan changes
See Degradation for the exact handling pattern and response headers.
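
One way to keep application code tidy is to map each quota mode to a single decision before touching memory. The sketch below is illustrative only: QuotaMode mirrors the four mode names listed above, but plan_action() and its return strings are hypothetical, and how your app learns the current mode is left to the Degradation page rather than assumed here.

```python
from enum import Enum


class QuotaMode(Enum):
    FULL = "FULL"
    PASSTHROUGH = "PASSTHROUGH"
    DEGRADED_RETRIEVE = "DEGRADED_RETRIEVE"
    BLOCKED = "BLOCKED"


def plan_action(mode, operation):
    """Decide how the app should treat a memory operation.

    `operation` is "write" or "retrieve". The returned labels are
    hypothetical names for app-side behavior, not MemoryOS responses.
    """
    if operation not in ("write", "retrieve"):
        raise ValueError(f"unknown operation: {operation!r}")
    mode = QuotaMode(mode)  # accepts a QuotaMode or its string value
    if mode is QuotaMode.FULL:
        return "proceed"
    if mode is QuotaMode.PASSTHROUGH:
        # Storage is skipped, so carry on without memory.
        return "continue_without_memory"
    if mode is QuotaMode.DEGRADED_RETRIEVE:
        # Writes may pause; retrieval still works in reduced mode.
        return "defer_write" if operation == "write" else "proceed_reduced"
    # BLOCKED: rejected until the budget resets or the plan changes.
    return "rejected"
```

Calling plan_action("DEGRADED_RETRIEVE", "write") yields "defer_write", which an app might translate into queueing the write for later.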