The MemoryOS quality gate runs before memory ingestion is queued. Its job is to prevent low-value, duplicate, or out-of-budget writes from becoming stored memories.

The four layers

| Layer | Purpose | Backend rule |
|-------|---------|--------------|
| L1 | Per-user rate limiting | Blocks when requests exceed the tenant’s per-user per-minute limit |
| L2 | Low-quality input filter | Blocks when the quality score is below 0.35 |
| L3 | Semantic duplicate detection | Blocks when semantic similarity is above 0.92 |
| L4 | Budget governance | Blocks when quota is exhausted and overage policy is block |
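
The rules above can be read as a chain of checks. The following is a minimal sketch, not the backend's implementation: the `WriteContext` fields and the `first_blocking_layer` function are hypothetical names, and evaluating the layers in L1 to L4 order is an assumption. Only the two thresholds (0.35 and 0.92) and the "block" overage policy come from the rules above.

```python
from dataclasses import dataclass
from typing import Optional

QUALITY_MIN = 0.35     # L2: block when the quality score is below this
SIMILARITY_MAX = 0.92  # L3: block when semantic similarity is above this


@dataclass
class WriteContext:
    # Hypothetical field names standing in for values the backend computes.
    requests_this_minute: int
    per_user_limit: int
    quality_score: float
    max_similarity: float
    quota_exhausted: bool
    overage_policy: str  # "block" is the only policy named in the docs


def first_blocking_layer(ctx: WriteContext) -> Optional[str]:
    """Return the first layer that would block the write, or None if it passes.

    Checking in L1..L4 order is an assumption made for this sketch.
    """
    if ctx.requests_this_minute > ctx.per_user_limit:
        return "L1"
    if ctx.quality_score < QUALITY_MIN:
        return "L2"
    if ctx.max_similarity > SIMILARITY_MAX:
        return "L3"
    if ctx.quota_exhausted and ctx.overage_policy == "block":
        return "L4"
    return None
```

Note that the comparisons are strict: in this sketch a score of exactly 0.35 or a similarity of exactly 0.92 passes, which matches the "below" and "above" wording of the rules.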

What gets blocked

| Layer | Example input | Typical reason |
|-------|---------------|----------------|
| L1 | The same user sends many writes within one minute | rate_limit_exceeded |
| L2 | "ok", "hi", "??" | low_quality |
| L3 | The same preference statement is re-sent repeatedly | duplicate_query |
| L4 | The tenant has used up its monthly calls or tokens and the overage policy is block | budget_exhausted |

How quality is scored

The current quality score is based on:
  • number of messages
  • average message length
  • lexical diversity
  • whether the conversation contains a question signal
Very short or content-free messages are likely to score poorly.
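
To make the four signals concrete, here is a toy scoring function. It is an illustration only: the actual formula, weights, and normalization are not documented here, so every constant below (the weights, the 4-message cap, the 80-character cap, and using "?" as the question signal) is an assumption.

```python
import re
from typing import Sequence


def quality_score(messages: Sequence[str]) -> float:
    """Toy quality score in [0, 1] built from the four documented signals.

    All weights and caps are made up for illustration.
    """
    if not messages:
        return 0.0
    words = [w.lower() for m in messages for w in re.findall(r"\w+", m)]
    if not words:
        # Nothing but punctuation or whitespace: treat as content-free.
        return 0.0

    # Signal 1: number of messages, saturating at 4.
    count_signal = min(len(messages) / 4, 1.0)
    # Signal 2: average message length in characters, saturating at 80.
    avg_len = sum(len(m) for m in messages) / len(messages)
    length_signal = min(avg_len / 80, 1.0)
    # Signal 3: lexical diversity as the ratio of unique words to all words.
    diversity_signal = len(set(words)) / len(words)
    # Signal 4: whether any message looks like a question.
    question_signal = 1.0 if any("?" in m for m in messages) else 0.0

    return (
        0.2 * count_signal
        + 0.5 * length_signal
        + 0.2 * diversity_signal
        + 0.1 * question_signal
    )
```

Under these assumed weights, a lone "ok" or "??" lands below the 0.35 threshold, while a single full sentence stating a preference lands well above it.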

How to reduce your block rate

  1. Send meaningful user facts, preferences, goals, or procedures instead of filler text.
  2. Avoid writing the same memory-worthy statement repeatedly.
  3. Batch coherent conversational turns together instead of sending single-word fragments.
  4. Respect the per-user write rate limit.
  5. Monitor blocked_reason and budget_remaining_pct on add() responses.
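
Steps 2 and 3 can be handled on the client before anything is sent. The sketch below is one possible approach, not part of MemoryOS: `TurnBuffer` is a hypothetical helper, the `{"role", "content"}` message shape is an assumption, and the local check only catches exact repeats after whitespace and case normalization, which is much weaker than the backend's semantic duplicate detection.

```python
from typing import Dict, List, Set


class TurnBuffer:
    """Hypothetical client-side helper: collect turns, drop exact repeats,
    and release them as one batch instead of many single-turn writes."""

    def __init__(self, min_turns: int = 4) -> None:
        self.min_turns = min_turns
        self._pending: List[Dict[str, str]] = []
        self._seen: Set[str] = set()

    def push(self, role: str, content: str) -> None:
        # Normalize case and whitespace so trivially different repeats match.
        key = " ".join(content.lower().split())
        if not key or key in self._seen:
            return  # skip empty turns and statements already buffered or sent
        self._seen.add(key)
        self._pending.append({"role": role, "content": content})

    def ready(self) -> bool:
        return len(self._pending) >= self.min_turns

    def flush(self) -> List[Dict[str, str]]:
        # Hand back the pending batch and start a new one.
        batch, self._pending = self._pending, []
        return batch
```

A caller would push each turn as it arrives and, once `ready()` returns true, pass the result of `flush()` to add() in a single write.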

Operational note

A blocked write still returns HTTP 200. The status in the response body identifies the blocking layer, such as L2 or L4, so inspect the response body rather than relying on the HTTP status code alone.
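
A small wrapper can make this check hard to forget. The sketch below assumes the response body is a JSON object with a `status` key that holds the layer name when a write is blocked. That key name and shape are assumptions. `blocked_reason` and `budget_remaining_pct` are the fields mentioned earlier, and `WriteOutcome` and `interpret_add_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

BLOCKING_LAYERS = {"L1", "L2", "L3", "L4"}


@dataclass
class WriteOutcome:
    stored: bool
    layer: Optional[str]
    reason: Optional[str]
    budget_remaining_pct: Optional[float]


def interpret_add_response(http_status: int, body: Dict[str, Any]) -> WriteOutcome:
    """Decide whether a write was accepted by looking at the body, not only the HTTP code.

    Assumes body["status"] carries the blocking layer (e.g. "L2") when blocked.
    """
    status = str(body.get("status", ""))
    layer = status if status in BLOCKING_LAYERS else None
    return WriteOutcome(
        # HTTP 200 alone is not enough: a blocking layer in the body means not stored.
        stored=(http_status == 200 and layer is None),
        layer=layer,
        reason=body.get("blocked_reason"),
        budget_remaining_pct=body.get("budget_remaining_pct"),
    )
```

With this in place, application code can branch on `outcome.stored` and log `outcome.layer` and `outcome.reason` for blocked writes instead of treating every HTTP 200 as a success.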