Dan Bulteel

Guide to designing, testing, and evaluating memory in AI agents

One of the coolest parts of my job is getting a front-row seat to how @marianaprazeres thinks about AI.

“Memory” feels like table stakes in AI right now. But for @Meet-Ting, it’s not just a log of the past - it’s a living system that shapes how people schedule, work, and want to spend their week.

It’s not just logistics - it’s patterns around energy, priorities, and relationships over time.

Here are a few things we learned while designing and testing agent memory in production:

1. Not all memories should live forever

Some signals are stable (working hours, preferred meeting length).
Others are situational (this week is hectic, energy is low, context has changed).

We found it critical to design decay and confidence into memory, rather than treating everything as permanent truth.
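As a rough illustration of that idea (the class, half-lives, and numbers here are hypothetical, not Meet-Ting’s actual model): give every memory a confidence score and a half-life, so stable facts fade slowly while situational ones fade within days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Memory:
    fact: str
    confidence: float      # 0.0-1.0 at the moment it was observed
    observed_at: datetime
    half_life_days: float  # how quickly trust in this fact fades

    def current_confidence(self, now: datetime) -> float:
        """Exponentially decay confidence with the memory's age."""
        age_days = (now - self.observed_at).total_seconds() / 86400
        return self.confidence * 0.5 ** (age_days / self.half_life_days)

now = datetime(2024, 6, 1)
# A stable preference decays slowly...
stable = Memory("prefers 25-min meetings", 0.9,
                now - timedelta(days=30), half_life_days=180)
# ...while a situational signal from the same day is nearly gone.
situational = Memory("this week is hectic", 0.9,
                     now - timedelta(days=30), half_life_days=7)

print(round(stable.current_confidence(now), 2))       # → 0.8
print(round(situational.current_confidence(now), 2))  # → 0.05
```

Same observation, same age - but the agent should trust one and quietly let go of the other.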

2. Implicit signals matter more than explicit ones

Users rarely tell you what they want directly.

They show it by:

  • How fast they reply

  • Which slots they ignore

  • Who they’re willing to move time for

  • When they push meetings out

Those patterns, observed over time, turned out to be more reliable than settings or preferences.
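One way to sketch that (the signal names and weights below are made up for illustration): log each behavioural signal per contact and aggregate weighted scores over time, rather than asking users to rank people explicitly.

```python
from collections import defaultdict

# Illustrative weights - a real system would learn these from data.
SIGNAL_WEIGHTS = {
    "fast_reply": 2.0,         # replied within minutes
    "moved_time_for": 3.0,     # rescheduled something else to fit them
    "ignored_slot": -1.0,      # proposed slot left unanswered
    "pushed_meeting_out": -2.0,
}

def priority_scores(events):
    """Aggregate weighted implicit signals into a per-contact score."""
    scores = defaultdict(float)
    for contact, signal in events:
        scores[contact] += SIGNAL_WEIGHTS[signal]
    return dict(scores)

events = [
    ("alice", "fast_reply"),
    ("alice", "moved_time_for"),
    ("bob", "ignored_slot"),
    ("bob", "pushed_meeting_out"),
]
print(priority_scores(events))  # → {'alice': 5.0, 'bob': -3.0}
```

Nobody ever *said* Alice matters more - the calendar behaviour did.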

3. Memory should adapt, not lock in

One early failure mode: the agent becoming too confident.

If memory only reinforces past behaviour, it can trap users in old routines. We had to intentionally allow memory to be challenged, overridden, and re-learned as people’s weeks (and lives) change.
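A minimal sketch of what “re-learned” can mean in practice (the update rule and learning rate are illustrative assumptions): keep nudging each belief toward fresh evidence instead of freezing it, so a few contradicting observations can overturn an old routine.

```python
def update_belief(belief: float, observation: float, rate: float = 0.3) -> float:
    """Shift the belief toward each new observation (exponential moving average)."""
    return belief + rate * (observation - belief)

# The agent is 90% sure the user wants morning meetings...
belief = 0.9

# ...but the user starts declining mornings (observations near 0).
for _ in range(6):
    belief = update_belief(belief, 0.0)

print(round(belief, 2))  # → 0.11 - the old habit gets unlearned, not reinforced
```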

Full Guide

Mariana wrote a full step-by-step guide including how we tested memory, evaluated failure modes, and avoided common traps:

👉 https://theevalloop.substack.com/p/testing-ai-agent-memory-guide

If this is useful, let us know and we’ll share more learnings from building agents in production.

Dan
