Two concepts, one confusion#
If you have read marketing pages for ChatGPT, Claude, or Gemini in the last year, you have seen both phrases: context window and memory. They are often used in the same breath, as if they mean the same thing. They absolutely do not.
Confusing the two is why so many people get frustrated when an AI "forgets" something it seemingly knew five minutes ago, and why they struggle to work out what changed when a conversation stops responding the way it used to.
The difference matters because it determines what the tool can actually do, and how you should structure your work to avoid hitting invisible limits.
Context window: short-term, in-session, bounded#
The context window is everything the model "sees" at the moment it generates a response. It includes:
- The system prompt
- All previous messages in the current conversation
- Any files, images, or documents you have attached
- The question you just asked
- The response the model is generating
Context windows are measured in tokens. A rough rule: one token is about 0.75 English words, or about 0.5 German words (German's longer compound words cost more tokens each). GPT-5.4 has a 1 million token context window. Claude Opus 4.6 also has a 1M window. Gemini 2.0 has 2M.
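That rule of thumb is easy to turn into a quick estimator. This is only a sketch built on the word-count ratios above; exact counts require the provider's actual tokenizer (for example OpenAI's tiktoken library), and real ratios vary by text.

```python
# Rough token estimate from the word-count rule of thumb.
# These ratios are approximations, not tokenizer output.
WORDS_PER_TOKEN = {"english": 0.75, "german": 0.5}

def estimate_tokens(text: str, language: str = "english") -> int:
    """Estimate token count from word count using a per-language ratio."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN[language])

print(estimate_tokens("The context window is everything the model sees"))
# 8 words in English -> roughly 11 tokens
```

Useful for back-of-envelope budgeting ("will this 40-page document fit?"), not for billing math.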
Once the conversation fills the context window, the oldest content is dropped. This is not always visible to you. The conversation keeps going, but the model no longer has access to what was said earlier. It starts forgetting without telling you.
Critically: the context window resets every new conversation. Start a new chat, and the model has zero knowledge of any previous chat you had with it.
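The drop-oldest behaviour can be sketched as a token budget over a message list. This is a simplification under assumed behaviour: real products usually pin the system prompt (as here) but may also summarize old turns instead of dropping them outright.

```python
def trim_to_window(messages, budget, count_tokens):
    """Drop the oldest messages until the conversation fits the token budget.

    messages[0] is treated as the system prompt and is always kept.
    count_tokens is any tokenizer function (here: a toy word counter).
    """
    system, history = messages[0], messages[1:]
    used = count_tokens(system)
    kept = []
    # Walk backwards so the newest messages are kept first.
    for msg in reversed(history):
        cost = count_tokens(msg)
        if used + cost > budget:
            break  # everything older than this is silently lost
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

msgs = ["system prompt", "older question", "older answer", "newest question"]
print(trim_to_window(msgs, budget=6, count_tokens=lambda m: len(m.split())))
# "older question" no longer fits and is silently dropped
```

Note what the caller sees: nothing. The conversation continues; the model just stops knowing.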
Memory: cross-session, persistent, curated#
Memory is a separate system that exists outside the context window. It is a structured record that the model writes to and reads from across different conversations.
When ChatGPT says "I will remember that", it is writing a small note into a memory store associated with your account. That note is loaded into the context window of every future conversation, usually as part of the system prompt.
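Mechanically, that flow can be sketched like this. It is a toy illustration: the `memory.json` file name and the prompt format are my assumptions, and real platforms run this server-side against your account rather than a local file.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memory.json")  # hypothetical per-account memory store

def remember(note: str) -> None:
    """Write a small note to the persistent store ("I will remember that")."""
    notes = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    notes.append(note)
    MEMORY_FILE.write_text(json.dumps(notes))

def build_system_prompt(base: str) -> str:
    """Prepend stored memories to the system prompt of a *new* conversation."""
    notes = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    if not notes:
        return base
    memory_block = "\n".join(f"- {n}" for n in notes)
    return f"{base}\n\nKnown about the user:\n{memory_block}"

remember("user prefers brief answers")
print(build_system_prompt("You are a helpful assistant."))
```

The key property is that `remember` survives the end of the conversation; nothing in the context window does.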
Not everything you say gets remembered. Most platforms have two modes:
- Automatic memory: the model decides what is worth keeping ("user prefers brief answers", "user is working on a SaaS launch in Q2")
- Explicit memory: you tell it to remember something specific
ChatGPT, Claude, and Gemini all have memory systems now, but they work differently. ChatGPT's is the most aggressive about remembering automatically. Claude's is more conservative and usually requires you to explicitly ask. Gemini's is tied to your Google account context and is less visible.
Memory has limits too, but they are far tighter than a context window's. We are talking about hundreds or low thousands of tokens of persistent memory, not millions.
Why confusing them leads to bad outcomes#
Here are the four most common problems I see, all of them caused by mixing up the two.
Problem 1: "Why did it forget what I told it 30 minutes ago?"#
If you said it in the same conversation, and the conversation is long enough that the context window filled up, the model lost it. This is not memory failing. It is the context window doing its job.
Fix: for anything you want preserved across a long conversation, save it yourself in a file or a persistent note, and re-paste it when relevant.
Problem 2: "Why does it remember things from last week?"#
Memory, not context. The model noted something worth keeping in a previous conversation, and it is surfacing in the current one. You can view and delete these memories in the settings of most modern tools.
Fix: audit your memory periodically. Old memories can pollute current conversations in ways you do not expect.
Problem 3: "I told it my brand guidelines three times and it keeps ignoring them"#
If you told it three times across three separate conversations, the model has not learned anything. Each conversation started fresh. What you want is either:
- A persistent system prompt (Custom Instructions in ChatGPT, Projects in Claude)
- Or memory entries that capture the guidelines
Fix: put durable context in the right place. Custom Instructions and Projects are the right answer for brand voice, role preferences, and recurring constraints.
Problem 4: "1 million token context means I can upload my entire codebase, right?"#
Yes, but with asterisks. Attention degrades across very long contexts. The model technically has access to all of it, but its ability to use information from the middle of a 900,000-token context is measurably worse than information at the start or end.
Fix: for very long contexts, structure matters. Put the most important information at the top, the question at the bottom, and explicitly reference specific sections when asking.
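That fix can be applied mechanically when you assemble a long prompt yourself. A sketch with illustrative section names (the labels are my invention, not any platform's convention):

```python
def build_long_prompt(key_facts, reference_docs, question):
    """Order a long prompt to fight middle-of-context degradation:
    critical material first, bulk reference in the middle, the question
    last, with labelled sections the question can point back to."""
    parts = ["## Key facts (read carefully)"]
    parts += key_facts
    for i, doc in enumerate(reference_docs, 1):
        parts.append(f"## Reference document {i}")
        parts.append(doc)
    parts.append("## Task")
    # Asking for section references nudges the model to re-read them.
    parts.append(f"{question} Cite the section headers you used.")
    return "\n\n".join(parts)

print(build_long_prompt(
    ["Deadline is Q2.", "Budget is fixed."],
    ["...long spec text...", "...long changelog..."],
    "Which spec items are at risk?",
))
```

The ordering is the point: start and end are prime real estate, so spend them on facts and the question, not on filler.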
How to actually use both#
The working mental model: context is for "what we are doing right now", memory is for "who I am and how I work".
Context window:
- The current task
- The files you need for this task
- The conversation history for this task
- Temporary instructions
Memory / Custom Instructions / Projects:
- Your role, preferences, and expertise
- Your writing style and tone
- Recurring constraints (always "Sie" form, never em-dashes)
- Stable project information (client names, ongoing initiatives)
If you keep that split clean, almost all the "why did it forget" problems disappear.
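The split can be made concrete as two data structures, one rebuilt per conversation and one that persists. Field names and defaults are illustrative, not any platform's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class PersistentProfile:
    """Belongs in memory / Custom Instructions / Projects: who I am, how I work."""
    role: str = "Marketing lead at a B2B SaaS company"
    style: str = "Brief and direct, no em-dashes"
    constraints: list = field(default_factory=lambda: ['Always use the formal "Sie" form in German copy'])

@dataclass
class TaskContext:
    """Belongs in the context window: what we are doing right now."""
    task: str = ""
    files: list = field(default_factory=list)
    temp_instructions: list = field(default_factory=list)

def assemble_prompt(profile: PersistentProfile, ctx: TaskContext) -> str:
    # The persistent half is identical in every conversation;
    # the task half is rebuilt from scratch each time.
    persistent = f"Role: {profile.role}\nStyle: {profile.style}\n" + "\n".join(profile.constraints)
    per_task = f"Task: {ctx.task}\n" + "\n".join(ctx.files + ctx.temp_instructions)
    return persistent + "\n---\n" + per_task
```

When something keeps getting "forgotten", ask which of the two structures it should live in. The answer usually resolves the problem.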
Where each tool actually stands#
Quick reference as of March 2026:
ChatGPT: 1M token context (Plus and above), aggressive automatic memory, Custom Instructions for persistent system prompts. Memory is the most visible of the three platforms.
Claude: 1M token context, conservative memory that requires explicit opt-in, Projects for persistent context per workspace. Projects are the best implementation of "persistent context" I have used. See the Claude guide.
Gemini: 2M token context, memory tied to Google account context, less granular control than the other two. The context window is biggest on paper, but practical usefulness beyond ~500K tokens drops noticeably.
Cursor and other coding tools: context is usually the IDE's view of your project, which is smaller than the raw token limits suggest, because these tools truncate aggressively to stay responsive.
For a deeper breakdown of how each tool handles this, our ChatGPT vs Claude comparison covers the practical differences in daily use.
The takeaway#
Treat the context window like short-term working memory in your own head: everything you are holding right now, limited, gets flushed when the task ends. Treat AI memory like your notes on the whiteboard: smaller, durable, but only useful if you curate it.
Confusing the two is the single most common source of AI tool frustration I see. Fixing it does not make the tool more capable, but it makes it feel dramatically more reliable.
