Mark Lubin


Hopping Context Windows

February 24, 2026

Every production LLM agent handles context exhaustion the same way: run until the window fills, stop, summarize, restart. It's called compaction. The user waits while the agent summarizes its own life story, and when it comes back there's a discontinuity—rich context replaced by a lossy sketch.

We'd been working on agent lifecycles for weeks—formal proofs about why bounded agents must divide and renew, a working Elixir prototype where agents split like cells under context pressure, deep dives into generational garbage collection theory. The GC literature kept pointing at one idea: concurrent collection without stopping the world. G1 and ZGC do it for heap memory. Nobody had done it for LLM context.

Our first attempt was overcomplicated: a "shadow agent" that overlaps with the parent, observes its work, and develops its own understanding. Convergence metrics, progressive handoff fractions, the works.

Then we realized the shadow doesn't need to process anything. It's not a second agent. It's just a second list.

The mechanism

At 70% capacity, summarize the conversation into a checkpoint. Start a back buffer seeded with that checkpoint. Keep working. Append every new message to both the active context and the back buffer. When the active context hits its limit, swap. Done.

T:     Summarize history into checkpoint.
       Back buffer = [checkpoint].

T..T': Agent keeps working.
       Each message also appended to back buffer.

T':    Agent hits limit.
       Swap to back buffer. Done.

This is double buffering (graphics, 1970s), checkpoint + WAL replay (databases, 1980s), and hopping windows (stream processing). The pieces are 40 years old. We searched over 50 sources—papers, frameworks, production systems. Nobody had composed them for LLM context.

Why it's free

Stop-the-world compaction already makes one summarization call—at the worst possible moment, when the model is under maximum attention pressure. This makes the same call earlier, at 70% capacity. The summary is cheaper and higher quality. The back buffer is just a list. ~30% memory overhead, zero compute until cutover.
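The ~30% figure follows from the checkpoint threshold. A rough estimate, with a hypothetical window size and summary size (neither is from the paper):

```python
# At swap time, the back buffer holds the checkpoint summary plus every
# message appended after the 70% threshold.
window = 200_000        # assumed context window, in tokens
checkpoint_at = 0.7     # the 70% threshold from the text
summary = 2_000         # assumed size of the checkpoint summary

back_buffer = summary + (1 - checkpoint_at) * window
overhead = back_buffer / window   # ≈ 0.31, i.e. ~30% memory overhead
```

The summary is small relative to the window, so the overhead is dominated by the post-checkpoint tail: roughly `1 - checkpoint_at` of the window.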

Worst case—a burst right after the checkpoint—degrades to exactly what everyone does today.

What this doesn't solve

This is one solution at one level. It handles context continuity—the agent doesn't pause, recent history stays at full fidelity. It doesn't handle external state: tool bindings, in-flight calls, side effects. It doesn't prevent compounding summary loss over many generations. It doesn't make agents smarter or give them better memory architecture. It just removes an unnecessary pause that every framework currently imposes.

Sometimes the useful contribution is the small, boring one.

The full paper is available on GitHub.