I designed and implemented a context-aware AI agent system that supports both short-term and long-term memory across multiple users and sessions. The goal of this project was to move beyond stateless chatbots by building an agent that can remember prior conversations, retain user preferences, and reason coherently across sessions, while remaining scalable and isolated at the systems level.

The agent is built using a ReAct workflow and deployed as a containerized, multi-user service on AWS, with Redis powering short-term memory and Mem0 providing long-term semantic memory backed by a vector database.

Agent Architecture and Deployment

I deployed the full system on an AWS EC2 t3.large instance, chosen because it balances cost and performance and has enough headroom to run several agent containers concurrently. Each user runs inside a dedicated Docker container that is created on demand from a shared template.

I also built a dispatcher service that controls container creation and request routing.

  • If a user container does not exist, the dispatcher creates it
  • If it already exists, the dispatcher routes requests through Docker’s internal network
  • Each container has its own environment variables, credentials, and agent state
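The dispatcher's core decision can be sketched in a few lines. This is a minimal, self-contained sketch rather than the deployed service: the real dispatcher talks to the Docker API, and names like `Dispatcher` and `spawn_container` are illustrative.

```python
class Dispatcher:
    """Route each request to a per-user container, creating one on demand.

    Container handles are kept in a dict; in the real system these would be
    Docker container IDs reached over Docker's internal network.
    """

    def __init__(self, spawn_container):
        self._spawn = spawn_container   # factory: user_id -> container handle
        self._containers = {}           # user_id -> container handle

    def route(self, user_id, request):
        # Create the container on first contact, then reuse it for every
        # later request from the same user.
        if user_id not in self._containers:
            self._containers[user_id] = self._spawn(user_id)
        return self._containers[user_id].handle(user_id, request)
```

Keeping creation and routing behind one interface is what makes the isolation invisible to the agent code: agents never know whether their container already existed.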

This setup enforces isolation at the infrastructure level rather than in application code, which is essential for multi-user AI systems. It also made scaling straightforward, since onboarding a new user requires no user-specific code changes.

Session Isolation and Multi-Session Design

Inside each user container, I separated conversations using session IDs. This allows one user to run multiple conversations without mixing context.

I treated sessions as strict boundaries for memory handling.

  • Recent conversation context stays within a session
  • Older information moves into long-term storage
  • Intra-session and inter-session recall can be evaluated independently
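One way to make these boundaries concrete is a key scheme that embeds both identifiers, so state from different sessions can never collide. The exact key format below is an assumption, not the literal one I used:

```python
def stm_key(user_id: str, session_id: str) -> str:
    """Build the short-term-memory key for one (user, session) pair.

    Embedding both IDs means intra-session recall reads a single key,
    while inter-session recall must go through long-term storage.
    """
    return f"stm:{user_id}:{session_id}"
```

Because the session ID is part of the key, "strict boundary" is not a convention the agent has to remember; it falls out of the storage layout.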

Designing this early made it much easier to reason about memory later. It also mirrors how real AI assistants handle multiple chat threads under one account.

Short-Term Memory with Redis

For short-term memory, I used Redis as an in-memory key-value store. This stores the most recent context that the agent needs right away.

  • Each user and session maps to a unique Redis key
  • The last five conversation turns are stored verbatim
  • Older turns are removed to avoid context growth
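The retention policy maps directly onto Redis list operations. The sketch below mimics it with an in-memory deque so it runs standalone; the comments note the Redis equivalents, and `MAX_TURNS` is the five-turn window described above.

```python
from collections import deque

MAX_TURNS = 5  # last five conversation turns are kept verbatim

class ShortTermMemory:
    """Per-(user, session) rolling window of recent turns.

    In the deployed system each window is a Redis list under a key like
    'stm:<user>:<session>'; appending is LPUSH and trimming is
    LTRIM key 0 MAX_TURNS-1. Here a bounded deque plays both roles.
    """

    def __init__(self):
        self._windows = {}  # key -> deque of turns

    def append(self, user_id, session_id, turn):
        key = f"stm:{user_id}:{session_id}"
        window = self._windows.setdefault(key, deque(maxlen=MAX_TURNS))
        window.append(turn)  # maxlen drops the oldest turn, like LTRIM

    def recent(self, user_id, session_id):
        return list(self._windows.get(f"stm:{user_id}:{session_id}", []))
```

Capping the window at write time, rather than truncating at prompt-build time, keeps the store itself from growing without bound.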

Redis worked well because it is fast and simple. It let the agent refer to recent messages without asking users to repeat themselves or re-run tools.

Long-Term Memory with Mem0 and Vector Storage

For long-term memory, I integrated Mem0. This system extracts key facts from conversations and stores them as vector embeddings.

I used long-term memory to store things like:

  • User interests and preferences
  • Important facts referenced across sessions
  • Information that should persist beyond one conversation

When a new request arrives, the agent pulls relevant short-term memory from Redis and related long-term memory from Mem0. Both are added to the prompt in a structured way. This helped the agent stay within token limits while still reasoning with past knowledge.
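The prompt-assembly step can be sketched as follows. The section headers and the `build_prompt` helper are illustrative, not my exact format:

```python
def build_prompt(query, recent_turns, long_term_facts):
    """Combine short-term and long-term memory into one structured prompt.

    recent_turns: verbatim messages pulled from Redis for this session.
    long_term_facts: semantically relevant facts retrieved from Mem0.
    """
    sections = []
    if long_term_facts:
        sections.append("Known facts about the user:\n"
                        + "\n".join(f"- {fact}" for fact in long_term_facts))
    if recent_turns:
        sections.append("Recent conversation:\n" + "\n".join(recent_turns))
    sections.append(f"Current request:\n{query}")
    return "\n\n".join(sections)
```

Putting retrieved facts in their own labeled section, rather than interleaving them with the conversation, makes it easier to cap each memory tier's token budget independently.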

Memory-Aware Prompting

Instead of sending full conversation history every time, I built a simple memory hierarchy.

  • Raw messages for recent context
  • Compressed summaries for older context

This approach reduced repeated searches and clarification questions. It also made the agent’s behavior more consistent. I learned that memory design matters more than prompt length once conversations grow.

Evaluation and Results

I evaluated the system using a 34-request benchmark across multiple users and sessions.

Phase 1: without memory

  • The agent used tools correctly
  • It could not recall earlier context
  • It often asked for clarification
  • It repeated searches for the same facts

Phase 2: with Redis and Mem0

  • Short-term memory recall worked in all tested sessions
  • Long-term memory recall worked for stored user preferences
  • Most memory-dependent queries were answered directly
  • Tool usage dropped because previous results were remembered

Seeing the side-by-side comparison made the value of memory very clear. The agent behaved more naturally and required less user effort.

Cross-User Memory Sharing

I also experimented with optional cross-user memory sharing. A user can choose to share a specific memory with another user. Shared memories are stored with metadata that limits who can access them.

This allowed collaboration while keeping strong isolation. Only explicitly shared facts become visible to others. No other context leaks across users.
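The access rule is simple: a memory is visible to its owner plus any users it was explicitly shared with. A self-contained sketch of that filter (the record layout with `owner` and `shared_with` fields is an assumption about the metadata, not Mem0's native schema):

```python
def visible_memories(store, requesting_user):
    """Return memory texts the user owns or that were explicitly shared
    with them. Nothing else crosses the user boundary.

    store: list of records like
        {"text": ..., "owner": ..., "shared_with": set_of_user_ids}
    """
    return [m["text"] for m in store
            if m["owner"] == requesting_user
            or requesting_user in m["shared_with"]]
```

Filtering on read keeps the default private: a memory leaks only if its owner added someone to `shared_with`.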

Scalability and Resource Management

I designed the system to scale cleanly.

  • The EC2 instance can be upgraded for vertical scaling
  • Multiple dispatcher nodes can be added for horizontal scaling
  • Containers can be given CPU and memory limits
  • Inactive containers can be cleaned up to free resources
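Inactive-container cleanup reduces to a sweep over last-activity timestamps. This is a sketch of the selection logic only; the real reaper would stop the chosen containers through the Docker API, and `IDLE_LIMIT` is an illustrative threshold:

```python
IDLE_LIMIT = 30 * 60  # stop containers idle for 30 minutes (illustrative)

def containers_to_stop(last_activity, now):
    """Pick user containers whose last request is older than IDLE_LIMIT.

    last_activity: dict of user_id -> unix timestamp of the last request.
    Returns the user IDs whose containers should be stopped.
    """
    return [uid for uid, ts in last_activity.items()
            if now - ts > IDLE_LIMIT]
```

Because containers are created on demand from a shared template, stopping an idle one is safe: the next request simply triggers a fresh container, and durable memory lives outside it.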

Redis stays ephemeral while long-term memory lives outside the core system. This keeps each agent container effectively stateless: it can be rebuilt or replaced without losing any durable memory.

What I Learned

This project changed how I think about AI agents.

  • Agents are systems, not just prompts
  • Memory needs structure and limits
  • Containers are a practical way to isolate users
  • Latency, cost, and memory quality trade off against each other
  • Reproducible infrastructure matters as much as model choice

The biggest takeaway was that persistent memory is not optional for serious agents. Without it, even strong models lose coherence quickly. With it, agents feel more reliable, personal, and useful.