Why AI forgets conversations
A clear explanation of context windows, session statelessness, and why AI memory is fundamentally an engineering problem — not a model intelligence problem.
If you've used ChatGPT, Claude, or any major AI assistant for more than a few days, you've run into the same wall: the model that brilliantly helped you yesterday has no idea who you are today. You re-explain your project, paste your code again, and remind it of decisions it already helped you make. That isn't bad luck. It's how these systems are built.
Understanding why AI forgets is the first step toward fixing it — and toward building workflows that don't depend on luck or repetition.
LLMs are stateless by default
A large language model, on its own, has no memory between requests. Every API call is a fresh transaction. The provider takes whatever you send — your prompt, your system instructions, whatever conversation history the application chose to forward — and runs it through the model. When the response comes back, the model has no continuing notion of you. It doesn't store anything. It doesn't 'know' it just talked to you.
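You can see this directly at the API level. Here's a minimal sketch using the OpenAI Python SDK (the model name is illustrative): two back-to-back calls, where the second sends only the new question, so the model has no trace of the first.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# First call: we introduce ourselves.
first = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Hi, my name is Dana."}],
)

# Second call: a brand-new transaction. We send only the new question,
# so the model has no access to anything from the first exchange.
second = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is my name?"}],
)

print(second.choices[0].message.content)  # it can only guess; nothing was stored
```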
The illusion of memory in a single chat is created by the application layer. ChatGPT's chat UI, Claude's chat UI, and most assistants append earlier messages to your new prompt before sending it to the model. That's the entire trick. Memory is just history being re-sent.
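The trick fits in a few lines. Here's a sketch of what a chat UI does under the hood, again with the OpenAI SDK: keep a running list of messages and re-send the whole thing on every turn.

```python
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    # The "memory" is just this list being re-sent in full on every call.
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

chat("Hi, my name is Dana.")
print(chat("What is my name?"))  # now it knows, because the first turn rode along
```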
Context windows have hard limits
Even when an app re-sends history, it can only re-send so much. Every model has a context window — the maximum number of tokens (roughly, word fragments) it can take in at once. Once your conversation grows beyond that limit, something has to give: older messages get truncated, summarised, or silently dropped.
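Here's a sketch of the simplest strategy, truncation, using the tiktoken tokenizer: count the tokens in each message and drop the oldest turns until the history fits a budget. The budget and encoding name are illustrative, not anyone's production defaults.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding name is illustrative

def fit_to_budget(history: list[dict], max_tokens: int = 4000) -> list[dict]:
    """Drop the oldest non-system messages until the history fits the budget."""
    def count(msg: dict) -> int:
        return len(enc.encode(msg["content"]))

    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]

    total = sum(count(m) for m in system + rest)
    while rest and total > max_tokens:
        total -= count(rest.pop(0))  # silently drop the oldest turn

    return system + rest
```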
Context windows have grown enormously — modern frontier models handle hundreds of thousands of tokens — but they're not infinite, they're not free, and they don't span sessions. The moment you close the conversation and start a new one, the window resets to zero.
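"Not free" is worth pausing on: because each turn re-sends everything before it, the cumulative tokens you pay for grow roughly quadratically with conversation length. A back-of-the-envelope illustration (the per-turn size is an assumption):

```python
# If every turn adds ~200 tokens and the full history is re-sent each time,
# the total input tokens processed over n turns is
# 200 * (1 + 2 + ... + n) = 200 * n * (n + 1) / 2.
TOKENS_PER_TURN = 200  # assumed average; real turns vary widely

def total_tokens_processed(turns: int) -> int:
    return TOKENS_PER_TURN * turns * (turns + 1) // 2

for n in (10, 50, 100):
    print(f"{n} turns -> {total_tokens_processed(n):,} tokens billed as input")
# 10 turns -> 11,000 tokens billed as input
# 50 turns -> 255,000 tokens billed as input
# 100 turns -> 1,010,000 tokens billed as input
```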
Sessions don't talk to each other
Inside one chat, you have continuity. Across chats, you don't. Each new ChatGPT or Claude conversation starts fresh, and crucially, neither tool sees what happened in the other. If you used Claude to design a database schema yesterday and ChatGPT to write the migration today, ChatGPT has no idea what Claude said. The two providers are completely siloed.
This is the second kind of forgetting: not just session-to-session, but tool-to-tool. People rarely use a single AI assistant in real workflows. They drift between ChatGPT for writing, Claude for code, Cursor or Windsurf for editing, Gemini for research. Every switch is a context reset.
Why providers don't fix it for you
Some providers have started shipping in-app memory features — facts the assistant promises to keep about you. These help, but they have three structural limits.
First, they're usually narrow: a handful of facts, not a project's worth of context. Second, they're proprietary: ChatGPT's memory doesn't travel to Claude, and Claude's doesn't travel to Cursor. Third, they're opaque: you can't always see what's stored, in what shape, or how it gets retrieved. Memory you can't inspect is memory you can't trust.
The fix is at a different layer
The real solution is to put memory outside any single model or provider. A dedicated memory layer captures what matters from your AI conversations, organises it, and makes it retrievable on demand — by any tool, in any session.
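To make that concrete, here's a deliberately minimal sketch of what such a layer could look like: a store that any tool can write notes into and that retrieves the most relevant ones on demand. The class and its keyword-overlap scoring are hypothetical illustrations, not Vilix's implementation; a real layer would use embeddings, recency weighting, and smarter ranking.

```python
import json
from pathlib import Path

class MemoryStore:
    """A toy provider-agnostic memory layer: any tool can write, any tool can read."""

    def __init__(self, path: str = "memory.jsonl"):
        self.path = Path(path)

    def remember(self, text: str, source: str) -> None:
        entry = {"text": text, "source": source}
        with self.path.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    def recall(self, query: str, limit: int = 3) -> list[str]:
        # Naive relevance: count overlapping words between query and note.
        if not self.path.exists():
            return []
        query_words = set(query.lower().split())
        entries = [json.loads(line) for line in self.path.read_text().splitlines()]
        scored = sorted(
            entries,
            key=lambda e: len(query_words & set(e["text"].lower().split())),
            reverse=True,
        )
        return [e["text"] for e in scored[:limit]]
```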
That's exactly what we're building at Vilix. Memory shouldn't be a feature inside one product; it should be a layer that sits above all of them. Read more in How to make AI remember context or Why cross-AI memory matters.
What this means for you
You don't have to wait for any single provider to solve memory. You can adopt a memory-first workflow today — pick what gets remembered, persist it deliberately, and feed it back into whichever model you're using.
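With the toy store sketched above, a memory-first turn looks like this: recall what's relevant, prepend it to the prompt, and save anything worth keeping. The facts and tool names here are made up for illustration; the point is that the same pattern works whichever provider is on the other end.

```python
store = MemoryStore()

# Yesterday, in one tool:
store.remember("The orders table uses UUID primary keys, decided in review.", source="claude")

# Today, in another tool: pull relevant notes into the prompt yourself.
notes = store.recall("orders table schema")
context = "Known project facts:\n" + "\n".join(f"- {n}" for n in notes)

messages = [
    {"role": "system", "content": context},
    {"role": "user", "content": "Write the migration for the orders table."},
]
# ...send `messages` to whichever model you're using today.
```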
AI forgetting is fixable. It's just not a problem any one model can solve from inside its own context window.