“Agent memory” sounds like one feature, but in practice it is at least four different problems: session state, durable memory, project context, and recall strategy. The current generation of agent systems does not solve these in the same way. OpenClaw treats memory as file-backed knowledge plus retrieval tools; Hermes Agent separates bounded persistent memory from a searchable session archive; Codex CLI leans on local transcripts, layered project instructions, and skills; Claude Code combines persistent CLAUDE.md rules with auto memory and resumable sessions.
The important shift is that modern agents are moving away from “just keep appending chat history” toward explicit storage tiers and explicit context assembly. That is a healthier model, because prompts are expensive, transcripts are noisy, and not every kind of knowledge should be carried into every turn.
A practical framework for comparing agent memory
Before comparing implementations, it helps to separate four layers.
Session persistence is the raw conversation or run history: messages, tool calls, timestamps, and sometimes repo state.
Durable memory is smaller and more selective: facts, preferences, and stable learnings that should survive restarts.
Project context is different again: checked-in instructions such as AGENTS.md or CLAUDE.md that define conventions, commands, and local rules.
Procedural memory lives in skills or playbooks: workflows that should be loaded only when relevant.
That gives a useful rubric for analyzing agent systems:
- Where does session history live?
- Where does durable memory live?
- Is storage file-based, database-backed, or both?
- How is context rebuilt for the next turn?
- What is loaded eagerly at session start?
- What is loaded lazily only when needed?
- How is history compressed or capped?
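The four layers and the eager/lazy distinction can be sketched as a small data model. Everything below is illustrative naming, not drawn from any of the tools discussed in this article:

```python
from dataclasses import dataclass, field

# Illustrative data model for the four memory layers described above.
@dataclass
class AgentContext:
    session: list = field(default_factory=list)   # raw turn history
    durable: dict = field(default_factory=dict)   # small curated facts
    project: list = field(default_factory=list)   # checked-in instructions
    skills: dict = field(default_factory=dict)    # lazily loaded procedures

    def assemble_prompt(self, active_skill=None):
        """Rebuild context for the next turn: eager layers first, lazy last."""
        parts = list(self.project)                             # always loaded
        parts += [f"{k}: {v}" for k, v in self.durable.items()]  # always loaded
        parts += self.session[-5:]                             # capped history
        if active_skill and active_skill in self.skills:
            parts.append(self.skills[active_skill])            # lazy load
        return "\n".join(parts)

ctx = AgentContext(
    project=["Always run tests before committing."],
    durable={"preference": "user prefers concise answers"},
    skills={"deploy": "Step 1: build. Step 2: push."},
)
ctx.session.append("user: deploy the app")
print(ctx.assemble_prompt(active_skill="deploy"))
```

The point of the sketch is the shape, not the details: each layer has its own store, and context assembly decides per turn what to pull from each.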
1. OpenClaw: file-backed memory plus retrieval tools
OpenClaw’s public memory model is intentionally inspectable. Its memory docs describe MEMORY.md for long-term memory, daily notes under memory/YYYY-MM-DD.md, and memory tools such as memory_search and memory_get. Those tools are provided by the active memory plugin, which by default is memory-core.
That matters because OpenClaw does not treat memory as a hidden side effect. It treats memory as a tool surface. Instead of forcing everything into the prompt, it gives the agent explicit ways to search and fetch memory when needed. This pushes OpenClaw toward a workspace-like model: memory is something the agent can read and write through visible files and tools, not a mysterious blob behind the scenes.
Architecturally, OpenClaw looks like this:
- durable memory in human-readable files
- retrieval through memory tools
- prompt assembly that can include persistent memory and fetched notes
- a plugin boundary that keeps memory behavior extensible rather than hardwired
The strength of this design is inspectability. A developer can literally open the memory files, audit them, or patch them by hand. The weakness is that retrieval quality and memory curation become critical. File-backed memory is simple, but once the files grow, search quality and careful selection of what gets injected into the prompt become the whole game.
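A file-backed search tool in this style can be sketched in a few lines. This is a naive grep-style scan for illustration, not OpenClaw's actual memory_search implementation, and the file names mirror its documented layout:

```python
import tempfile
from pathlib import Path

def memory_search(root, query):
    """Naive sketch of file-backed memory search: scan markdown memory
    files and return (filename, line) pairs that mention the query."""
    hits = []
    for path in sorted(root.glob("**/*.md")):
        for line in path.read_text().splitlines():
            if query.lower() in line.lower():
                hits.append((path.name, line.strip()))
    return hits

root = Path(tempfile.mkdtemp())
(root / "MEMORY.md").write_text("Prefers tabs over spaces.\n")
(root / "memory").mkdir()
(root / "memory" / "2024-05-01.md").write_text("Discussed tabs in config files.\n")

print(memory_search(root, "tabs"))
```

Real implementations add ranking and chunking, but the tool surface is the same: the agent calls search, gets file-addressed hits, and decides what to inject.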
My take: OpenClaw is best understood as memory as a visible workspace knowledge layer.
2. Hermes Agent: tiered memory with searchable sessions
Hermes Agent has one of the clearest separations between persistent memory and session history. Its docs describe built-in persistent memory as bounded, curated memory that persists across sessions. That built-in layer centers on MEMORY.md and USER.md. Hermes also supports external memory providers, with built-in memory remaining active alongside the selected provider. When an external provider is enabled, Hermes injects provider context into the system prompt, prefetches relevant memories before each turn, and syncs conversation turns after responses.
Hermes then separates that durable memory from a much richer session system. Its session docs describe two storage mechanisms: a SQLite database at ~/.hermes/state.db with FTS5 full-text search, and JSONL transcripts under ~/.hermes/sessions/ that preserve raw conversation history including tool calls.
That gives Hermes a true tiered architecture:
- small persistent files for always-important facts
- SQLite for searchable cross-session recall
- JSONL for replay and auditability
- optional provider-backed memory for richer retrieval
- skills for procedural reuse
This is the most database-aware design of the four. It acknowledges that one storage format cannot do every job well. Small curated files are good for stable preferences. Databases are good for indexing and search. JSONL is good for append-only transcript fidelity.
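The SQLite-plus-JSONL split can be sketched as follows. The schema is illustrative rather than Hermes' actual one, and it assumes the bundled SQLite has FTS5 enabled, as standard CPython builds typically do:

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Illustrative two-tier store: FTS5 for searchable recall,
# JSONL for raw append-only transcript fidelity.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE turns USING fts5(session_id, content)")

transcript = Path(tempfile.mkdtemp()) / "session-001.jsonl"

def record_turn(session_id, role, content):
    # Tier 1: indexed for full-text search across sessions.
    db.execute("INSERT INTO turns VALUES (?, ?)", (session_id, content))
    # Tier 2: raw JSONL line preserving the full turn for replay/audit.
    with transcript.open("a") as f:
        f.write(json.dumps({"session": session_id, "role": role,
                            "content": content}) + "\n")

record_turn("s1", "user", "the staging deploy failed with a TLS error")
record_turn("s1", "assistant", "rotating the certificate fixed it")

rows = db.execute(
    "SELECT content FROM turns WHERE turns MATCH ?", ("certificate",)
).fetchall()
print(rows)
```

Each tier does one job: the database answers "have we seen this before?", the JSONL answers "exactly what happened?".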
My take: Hermes is the strongest example here of memory as storage tiers, not just memory as prompt stuffing.
3. Codex CLI: local transcripts, layered instructions, and lazy skills
Codex CLI takes a different path. Its docs emphasize local state, project instruction discovery, resume, and skills rather than a monolithic semantic memory layer. OpenAI’s advanced config docs describe CODEX_HOME as the local state directory, with common files including config.toml, auth.json, and history.jsonl. The docs also describe local history persistence and size capping so old entries can be dropped as the history file grows.
Codex also supports explicit resume semantics. Its feature docs say Codex stores transcripts locally and that codex resume lets you reopen earlier threads with the same repository state and instructions.
The most distinctive part of Codex’s context model is instruction layering. Codex reads AGENTS.md files before doing any work. The docs describe layered guidance from global and project scopes so you can start each task with consistent expectations.
On top of that, Codex supports skills. Skills are packaged as SKILL.md-based assets and use progressive disclosure: Codex starts with skill metadata and loads the full skill content only when it decides the skill is relevant.
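Progressive disclosure is cheap to sketch. The header format below is a toy, not Codex's actual SKILL.md schema; the point is reading only metadata up front and deferring the full body:

```python
import tempfile
from pathlib import Path

# Toy skill file: metadata header, then "---", then the full procedure.
root = Path(tempfile.mkdtemp())
skill = root / "db-migrate" / "SKILL.md"
skill.parent.mkdir()
skill.write_text("name: db-migrate\ndescription: run schema migrations\n---\n"
                 "1. Back up the database.\n2. Apply migrations.\n")

def skill_metadata(path):
    """Cheap: parse only the header, never the procedure body."""
    header = path.read_text().split("---")[0]
    return dict(line.split(": ", 1) for line in header.strip().splitlines())

def skill_body(path):
    """Expensive: load the full procedure only when the skill is relevant."""
    return path.read_text().split("---", 1)[1].strip()

meta = skill_metadata(skill)
if "migration" in meta["description"]:   # relevance check on metadata alone
    print(skill_body(skill))
```

The asymmetry is the design: every skill's metadata is always visible, but a skill's full workflow enters context only after a relevance decision.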
So Codex’s architecture is best understood as:
- local transcript persistence and resume
- durable project guidance through AGENTS.md and config files
- context assembly rebuilt from local artifacts each run
- procedural reuse via lazy-loaded skills
Codex therefore feels highly stateful without depending on a large built-in memory graph. For coding work, that makes sense: often the most valuable continuity is not “remember everything I ever said,” but “carry the repo rules, recent transcript, and relevant workflows into the next run.”
My take: Codex CLI shows that a system can feel persistent through local artifacts and context engineering, not only through autonomous memory.
4. Claude Code: persistent project rules plus auto memory
Claude Code is a useful contrast with Codex because both are coding agents with file-based project context, but Claude is more explicit about memory as a product concept. Anthropic’s docs say each Claude Code session begins with a fresh context window and that two mechanisms carry knowledge across sessions: CLAUDE.md files and auto memory. CLAUDE.md holds instructions written by the user; auto memory stores learnings Claude writes itself based on corrections and preferences.
Claude’s scoping model is also explicit. The docs present CLAUDE.md as the persistent context layer for projects, while more detailed notes can be moved into separate topic files so startup context stays concise. Anthropic documents limits for auto memory and makes clear that not everything should be eagerly loaded at startup.
On the session side, Claude Code’s CLI reference documents --resume and also a --bare mode that skips auto-discovery of hooks, skills, plugins, MCP servers, auto memory, and CLAUDE.md so scripted calls can start faster. That is a revealing detail: Claude has a fairly rich context-loading model, but it also acknowledges the startup cost and provides a stripped-down mode when that overhead is not wanted.
Claude also supports skills, documented as reusable extensions with SKILL.md. Anthropic explicitly frames skills as the right home for procedures, rather than stuffing long workflows into CLAUDE.md.
So Claude Code combines:
- resumable sessions
- user-authored persistent project context via CLAUDE.md
- model-authored auto memory
- lazy procedural skills
My take: Claude Code is the clearest example here of rules plus learned notes as separate memory layers.
A side-by-side comparison
| System | Session persistence | Durable memory | Project instructions | Procedural reuse |
|---|---|---|---|---|
| OpenClaw | Runtime/session lifecycle around memory tools | MEMORY.md, daily notes | Workspace context and memory files | memory_search, memory_get |
| Hermes Agent | SQLite state.db + JSONL transcripts | MEMORY.md, USER.md, optional external provider | Persistent memory and provider context | Skills and provider-backed recall |
| Codex CLI | Local transcripts and resume | Local history/config, but not a large built-in semantic memory layer | Layered AGENTS.md | Lazy-loaded skills |
| Claude Code | Session persistence with --resume | CLAUDE.md + auto memory | Layered CLAUDE.md | Lazy-loaded skills |
This table hides an important philosophical difference.
OpenClaw is workspace-memory-first.
Hermes is tiered-storage-first.
Codex is local-artifacts-and-instructions-first.
Claude is productized-memory-first.
What these systems teach us
The strongest pattern across all four tools is that prompt history alone is no longer enough. Each system has introduced some explicit structure outside the raw chat stream. OpenClaw does it with files and retrieval tools. Hermes does it with durable memory plus searchable session storage. Codex does it with local transcripts and layered instruction files. Claude does it with CLAUDE.md, auto memory, and skills.
A second pattern is that facts and procedures are being separated. MEMORY.md, USER.md, AGENTS.md, and CLAUDE.md mainly hold facts, preferences, and expectations. SKILL.md increasingly holds procedures. That is a good design choice because it avoids loading long workflows into every session startup.
A third pattern is that inspectability still matters. Even sophisticated systems keep large parts of their state in files and local artifacts rather than opaque service-side memory. That improves debugging, portability, and trust.
What this means if you are building a scoped agent for your SaaS product
This is the most practical takeaway.
If you are building an agent to help users get more value from your SaaS product, you usually do not need a general-purpose “remember everything forever” memory system. You need a scoped, reliable, product-aware context system.
The right question is not “how do I build the biggest memory layer?” It is:
What should this agent remember, what should it search for, and what should it load only when needed?
The four systems above suggest a clear answer.
1. Separate session state from durable memory
A session transcript is not the same as memory.
For a SaaS agent, session state should usually capture the current user goal, recent conversation turns, tool outputs, workflow progress, and unresolved actions. Durable memory should be much smaller: user preferences, role and permission context, product setup state, recurring habits, and high-signal prior outcomes.
Hermes is especially instructive here. Its split between bounded memory and searchable session archive is close to what most product agents actually need. You do not want to inject the entire support history into every turn. You want a compact summary of what matters and a larger history that can be searched when necessary.
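That split becomes concrete once crossing the session boundary is an explicit step rather than a side effect. A minimal sketch, with all names and structures illustrative:

```python
# Session state: rich, capped, and disposable at session end.
session_state = {
    "goal": "fix failing CSV import",
    "recent_turns": [],
    "unresolved": ["re-run import after mapping fix"],
}
# Durable memory: small, curated, survives restarts.
durable_memory = {}

def end_turn(turn):
    session_state["recent_turns"].append(turn)
    # Cap recent history instead of appending forever.
    session_state["recent_turns"] = session_state["recent_turns"][-10:]

def promote(key, fact):
    """Only explicitly promoted facts cross the session boundary."""
    durable_memory[key] = fact

end_turn("user: our files always use semicolons, not commas")
promote("csv_delimiter", "user's exports are semicolon-delimited")

print(durable_memory)
```

The promotion step is where curation happens: most turns end in the transcript, and only a distilled fact reaches the durable tier.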
2. Treat product instructions as a first-class layer
Codex CLI and Claude Code both show the value of persistent instruction files. In a SaaS setting, that maps naturally to product-scoped system context.
Your agent should likely have a stable instruction layer containing:
- product terminology
- business rules
- plan and feature availability
- escalation boundaries
- compliance or safety constraints
- tool-usage policies
This is not user memory. It is the agent’s operating manual.
One of the easiest mistakes in SaaS agent design is burying product rules inside chat history or ephemeral memory. The better pattern is to keep them as a stable versioned layer, always available and easy to update. Codex’s AGENTS.md and Claude’s CLAUDE.md are strong examples of that.
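A versioned instruction layer of this kind can be as simple as structured config that prompt assembly always includes, kept apart from per-user memory. Field names below are illustrative:

```python
# Stable, versioned product rules: the agent's operating manual,
# deliberately separate from any per-user memory.
PRODUCT_INSTRUCTIONS = {
    "version": "2024-06-01",
    "rules": [
        "Refer to workspaces, never 'accounts'.",   # terminology
        "Free plan cannot enable SSO.",             # plan constraints
        "Escalate billing disputes to a human.",    # escalation boundary
    ],
}

def system_prompt(user_memory):
    """Assemble the prompt: product rules first, user context second."""
    rules = "\n".join(f"- {r}" for r in PRODUCT_INSTRUCTIONS["rules"])
    facts = "\n".join(f"- {k}: {v}" for k, v in user_memory.items())
    version = PRODUCT_INSTRUCTIONS["version"]
    return f"Product rules (v{version}):\n{rules}\n\nUser context:\n{facts}"

print(system_prompt({"plan": "Pro", "role": "admin"}))
```

Because the rules carry a version, a behavior regression can be traced to a rules change rather than dug out of transcripts.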
3. Keep durable memory small and useful
OpenClaw, Hermes, and Claude all point toward the same principle: always-loaded memory must be constrained.
For a SaaS agent, durable memory should usually contain only a few categories:
- identity and role context
- stable preferences
- product maturity or setup state
- recurring friction points
- trusted summaries of prior outcomes
That is much more useful than “remember everything the user ever said.” Claude’s emphasis on concise startup memory and Hermes’ bounded memory model both reinforce this.
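Boundedness is easy to enforce mechanically once it is a policy. A sketch with an entry cap and an allow-list of categories, all thresholds illustrative:

```python
# Bounded durable memory: a hard entry cap plus explicit categories,
# so always-loaded context cannot grow without limit.
ALLOWED = {"identity", "preference", "setup", "friction", "summary"}
MAX_ENTRIES = 20

durable = {}  # key -> (category, fact)

def remember(key, category, fact):
    if category not in ALLOWED:
        return False                    # wrong layer: reject, don't store
    if key not in durable and len(durable) >= MAX_ENTRIES:
        oldest = next(iter(durable))    # simple FIFO eviction
        del durable[oldest]
    durable[key] = (category, fact)
    return True

print(remember("plan", "setup", "workspace is on the Pro plan"))  # True
print(remember("joke", "chitchat", "user likes puns"))            # False
```

Real systems replace FIFO with recency or importance scoring, but the cap itself is the important part: every save competes for a bounded budget.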
4. Use lazy retrieval for large or infrequent knowledge
Many kinds of product knowledge should not be loaded eagerly:
- help center docs
- prior tickets
- logs and audit trails
- product docs
- troubleshooting guides
- account history
This is where OpenClaw’s retrieval mindset is a strong influence. The agent should have a reliable way to fetch the right artifact when the current task requires it, rather than dragging all of that knowledge into every prompt.
A good architecture is often:
- small always-on memory
- larger searchable history
- product knowledge retrieval on demand
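The three-part architecture above can be sketched end to end. The knowledge base here is a stand-in dict with naive keyword matching, not a real product API or search index:

```python
# Small always-on core: loaded into every prompt.
ALWAYS_ON = ["User prefers email notifications."]

# Larger knowledge: fetched only when a task needs it.
KNOWLEDGE_BASE = {
    "reconnect-slack": "Settings -> Integrations -> Slack -> Reconnect.",
    "export-csv": "Reports -> Export -> CSV.",
}

def retrieve(task):
    """Naive on-demand retrieval: keyword overlap with the task."""
    return [doc for key, doc in KNOWLEDGE_BASE.items()
            if any(word in task.lower() for word in key.split("-"))]

def build_prompt(task):
    return "\n".join(ALWAYS_ON + retrieve(task) + [f"Task: {task}"])

print(build_prompt("user asks how to reconnect Slack"))
```

The CSV export doc never enters this prompt, which is the whole point: irrelevant knowledge pays zero context cost.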
5. Separate facts from procedures
Codex and Claude both reinforce an important point: facts and procedures should not live in the same bucket.
Facts are things like:
- the user is on the Pro plan
- the workspace uses SSO
- the billing admin is someone else
Procedures are things like:
- how to reconnect a broken integration
- how to recover a failed import
- how to debug a common setup issue
The best pattern is to keep procedures as skills, playbooks, or workflows, not as raw memory entries. In many SaaS products, a reusable troubleshooting playbook is more valuable than perfect transcript recall.
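The fact/procedure split maps cleanly onto two stores with different access patterns. A sketch with illustrative contents:

```python
# Facts: small key-value entries, cheap to load and check.
FACTS = {"plan": "Pro", "sso": "enabled", "billing_admin": "someone else"}

# Procedures: multi-step playbooks, loaded only when invoked.
SKILLS = {
    "reconnect-integration": ["Check token expiry", "Re-authorize", "Test webhook"],
    "recover-import": ["Find failed rows", "Fix mapping", "Re-run import"],
}

def answer(question):
    if question in FACTS:          # fact lookup: always cheap
        return FACTS[question]
    if question in SKILLS:         # procedure: expand steps on demand
        return " -> ".join(SKILLS[question])
    return "unknown"

print(answer("plan"))
print(answer("recover-import"))
```

Mixing the two buckets forces every fact lookup to pay for playbook text, and every playbook edit to risk corrupting facts; keeping them apart avoids both.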
6. Design for inspectability and debugging
All four systems highlight the operational value of inspectable state.
If your SaaS agent is customer-facing, you will eventually need to answer:
- Why did the agent say this?
- What did it remember?
- Which instruction caused that behavior?
- Which prior interaction influenced this answer?
Opaque memory systems make these questions painful. File-backed or clearly structured memory makes them manageable.
7. Build memory policies, not just storage
The most important lesson may be that memory quality depends more on policy than on storage format.
You need rules for:
- when to save something
- when not to save it
- when to summarize
- when to overwrite
- when to retrieve
- when to ignore stale context
- when to trust product data over memory
For a SaaS agent, those policies should follow business value. The agent should remember what improves user outcomes, not merely what is available to store.
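A save policy is ultimately just a predicate applied before anything is written. A sketch with illustrative thresholds and source names:

```python
# Policy over storage: decide what deserves persistence before writing.
def should_save(fact, source, confidence):
    if confidence < 0.8:
        return False        # too uncertain to persist
    if source == "product_db":
        return False        # trust live product data over memory copies
    if len(fact) > 200:
        return False        # summarize first, then save
    return True

print(should_save("user prefers weekly digests", "conversation", 0.95))  # True
print(should_save("user plan is Pro", "product_db", 1.0))                # False
```

The second rule is easy to miss: anything the product database already answers authoritatively should be queried live, not duplicated into memory where it can go stale.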
A practical blueprint for a SaaS-scoped agent
If I were building a scoped agent for a SaaS product, I would combine the lessons from all four systems into five layers:
1. Session layer
Recent conversation, tool outputs, current workflow state, unresolved actions.
2. Durable user/product memory
Preferences, role and permissions, setup state, recurring patterns, compact summaries of important history.
3. Product instruction layer
Product rules, business logic, escalation boundaries, plan constraints, safety rules.
4. Searchable history layer
Tickets, logs, prior sessions, audit events, knowledge base content.
5. Procedural skill layer
Onboarding flows, troubleshooting guides, recovery playbooks, upgrade guidance, integration setup steps.
That architecture is usually far better than a single giant “memory” blob.
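The five layers above converge in a single assembly step per turn. A sketch where every layer is a plain argument and the search and skill providers are stubbed with lambdas (all contents illustrative):

```python
# Five-layer context assembly for a SaaS-scoped agent.
def assemble(session, durable, instructions, search_fn, skill_fn, task):
    parts = list(instructions)                            # 3. product rules, eager
    parts += [f"{k}: {v}" for k, v in durable.items()]    # 2. durable memory, eager
    parts += session[-5:]                                 # 1. recent session, capped
    parts += search_fn(task)                              # 4. searchable history, lazy
    skill = skill_fn(task)                                # 5. procedure, lazy
    if skill:
        parts.append(skill)
    return "\n".join(parts + [f"Task: {task}"])

prompt = assemble(
    session=["user: the Slack integration broke"],
    durable={"plan": "Pro"},
    instructions=["Never disclose other users' data."],
    search_fn=lambda t: ["Prior ticket: Slack token expired last month."],
    skill_fn=lambda t: "Playbook: re-authorize Slack, then test the webhook.",
    task="fix the Slack integration",
)
print(prompt)
```

Each layer can then evolve independently: swap the search stub for a real index, or the skill stub for SKILL.md-style files, without touching assembly.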
Conclusion
The biggest lesson from OpenClaw, Hermes Agent, Codex CLI, and Claude Code is that a good agent is not the one that remembers the most. It is the one that remembers the right things in the right layer.
For serious product agents, that usually means:
- keep live session context separate from durable memory
- keep product rules separate from user memory
- keep procedures separate from facts
- keep large knowledge sources out of the prompt until needed
- make memory inspectable and bounded
- optimize for user success, not maximal recall
In other words, the best SaaS agent memory system is not a replica of human memory. It is a well-designed product context system.