POST /v1/memories/retrieve

Endpoint

POST /v1/memories/retrieve

Authentication

Authorization: ApiKey mem_...

Request body

{
  "external_user_id": "customer-123",
  "query": "How should I respond to this user?",
  "limit": 5,
  "categories": ["preference", "goal"],
  "agent_id": "support-bot",
  "time_filter_days": 30,
  "format": "bullets",
  "context_max_tokens": 500
}

Schema

Field	Type	Required	Notes
`external_user_id`	`string`	Yes	End-user identifier inside your tenant
`query`	`string`	Yes	Natural-language retrieval query
`limit`	`integer`	No	Default `10`, max `50`
`categories`	`MemoryCategory[]`	No	Optional category filters
`agent_id`	`string \| null`	No	Optional agent filter
`time_filter_days`	`integer \| null`	No	Return only memories created in the last N days
`format`	`"bullets" \| "json" \| "xml"`	No	Default `bullets`
`context_max_tokens`	`integer`	No	Default `500`; max context budget for `system_prompt_addition`

Response

{
  "data": [
    {
      "id": "2b8f5f87-bbd4-4f84-9f5f-0cba5033f058",
      "content": "User prefers concise technical explanations and Python examples.",
      "category": "preference",
      "importance_score": 8.5,
      "last_accessed": "2026-04-17T09:45:00Z",
      "relevance_score": 0.962341,
      "context_snippet": "- User prefers concise technical explanations and Python examples."
    }
  ],
  "cached": false,
  "system_prompt_addition": "What you know about this user:\n- User prefers concise technical explanations and Python examples.",
  "context_token_count": 18,
  "memories_from_hot_tier": 0,
  "clarification_question": null,
  "request_id": "b0eb46a4-8794-44c8-b2a9-8f2dfbb4176c",
  "timestamp": "2026-04-17T09:45:03Z"
}

Response schema

Top-level fields

Field	Type	Meaning
`data`	`MemorySearchResult[]`	Ranked memory results
`cached`	`boolean`	Whether retrieval came from the hot cache
`system_prompt_addition`	`string`	Prompt-ready memory context
`context_token_count`	`integer \| null`	Token count of the built context when available
`memories_from_hot_tier`	`integer`	Number of returned memories served from Redis hot tier
`clarification_question`	`string \| null`	Optional user-facing question for resolving a pending conflict
`request_id`	`string`	Trace id
`timestamp`	`datetime`	Response timestamp

Domain-aware retrieval

If a tenant has a domain schema enabled, the same retrieve endpoint returns domain-aware context in system_prompt_addition. For example, an EdTech tenant may receive tutoring context about exam goals, weak topics, learning style, or forgetting-stage review urgency. The request shape does not change:

{
  "external_user_id": "student_123",
  "query": "teach this student trigonometry identities",
  "limit": 8,
  "context_max_tokens": 600
}

Use optional domain profile endpoints only when you need structured UI data, not for normal model calls.

`MemorySearchResult`

Field	Type	Meaning
`id`	`string`	Memory id
`content`	`string`	Memory text
`category`	`string`	Memory category
`importance_score`	`float`	Importance score
`last_accessed`	`datetime \| null`	Last access timestamp
`relevance_score`	`float`	Final retrieval score
`context_snippet`	`string`	Single-memory rendering in the selected format

`format` examples

The format value controls how MemoryOS renders both context_snippet and system_prompt_addition.

`bullets`

{
  "format": "bullets"
}

Example system_prompt_addition:

What you know about this user:
- User prefers concise technical explanations and Python examples.

`json`

{
  "format": "json"
}

Example system_prompt_addition:

{
  "preference": [
    "User prefers concise technical explanations and Python examples."
  ]
}

`xml`

{
  "format": "xml"
}

Example system_prompt_addition:

What you know about this user:
<memory_context>
  <memory category="preference">
    User prefers concise technical explanations and Python examples.
  </memory>
</memory_context>

Context token limit

Use context_max_tokens to limit the size of system_prompt_addition.

{
  "external_user_id": "customer-123",
  "query": "How should I answer this user?",
  "limit": 10,
  "format": "bullets",
  "context_max_tokens": 300
}

MemoryOS drops lower-importance memories first when the context is too large. It does not truncate mid-sentence.

​Endpoint

​Authentication

​Request body

​Schema

​Response

​Response schema

​Top-level fields

​Domain-aware retrieval

​MemorySearchResult

​format examples

​bullets

​json

​xml

​Context token limit

Endpoint

Authentication

Request body

Schema

Response

Response schema

Top-level fields

Domain-aware retrieval

`MemorySearchResult`

`format` examples

`bullets`

`json`

`xml`

Context token limit