LangGraph memory footprint grows unbounded on long-running graphs
Long LangGraph runs allocate 2–3× more memory than the logical state size because deepcopy builds a complete parallel object graph on every checkpoint, and GC lags behind the allocation rate. Teams report peak RSS that doesn't match their mental model of state size. The fix is to stop allocating those intermediate objects at all, which is exactly what RustSQLiteCheckpointer does.
The symptom
You run your LangGraph workload. Peak memory on a fresh start is comfortable — maybe 200 MB. You let it run for an hour of production traffic and check again: 600 MB. Your logical state size hasn’t grown proportionally. You suspect a leak. You dig in with tracemalloc and gc.get_objects() and can’t find one.
There isn’t a leak, not in the traditional sense. The memory is being freed eventually. It’s just being allocated faster than Python’s garbage collector can reclaim it.
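The "no leak, just churn" pattern is easy to reproduce: allocate checkpoint-style garbage in a loop and compare tracemalloc's current and peak counters. A minimal sketch, using an invented state shape (not LangGraph's real schema):

```python
import copy
import tracemalloc

# Hypothetical stand-in for checkpointed agent state: many small containers.
state = {"messages": [{"role": "user", "content": f"msg {i}"} for i in range(50_000)]}

tracemalloc.start()
for _ in range(5):
    snapshot = copy.deepcopy(state)  # what each checkpoint effectively does
    del snapshot                     # becomes garbage immediately
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# "Live" memory returns to near baseline -- no leak -- but peak
# counted a full parallel copy of the state's containers.
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```

The gap between `current` and `peak` is exactly the transient allocation this article is about: it never shows up in a leak hunt, only in peak RSS.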
The root cause
Every time LangGraph checkpoints state, copy.deepcopy() walks the state structure and allocates a brand-new parallel object graph:
- New dict headers for each dict in the state
- New list buffers for each list
- New copies of all the nested strings, numbers, custom objects
- New memo dict to track already-seen references for cycle handling
All of this then gets serialized (another allocation pass) and the deep-copied object graph becomes garbage. The GC will clean it up, but not before it’s already been counted in your peak RSS.
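Each of those allocations is observable from Python. A small sketch with a toy state (not LangGraph's actual schema), passing our own memo dict so the cycle-handling bookkeeping is visible too:

```python
import copy

state = {"messages": [["hello", "world"]], "meta": {"step": 1}}
state["self_ref"] = state        # cycles are legal in Python state

memo = {}                        # the memo dict deepcopy uses for cycle handling
snapshot = copy.deepcopy(state, memo)

# Every container in the copy is a brand-new object:
assert snapshot is not state
assert snapshot["messages"] is not state["messages"]
assert snapshot["messages"][0] is not state["messages"][0]
assert snapshot["meta"] is not state["meta"]
# The cycle was reproduced via the memo instead of recursing forever:
assert snapshot["self_ref"] is snapshot
# The memo now carries an entry per already-seen object:
assert len(memo) > 0
```

Note that deepcopy leaves immutable atoms (strings, ints) shared; the cost is dominated by the new dict and list containers, which is precisely what the bullet list above enumerates.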
At a steady-state allocation rate of, say, 50 MB/sec (not unusual for non-trivial state checkpointed 20 times per second), you’ll see peak memory sit at 2–3× the actual logical state footprint. The difference is just “stuff waiting to be collected.”
Why GC can’t fix it
CPython frees most objects immediately through reference counting and runs a generational collector only for the rest. It’s efficient when given time. Under high allocation pressure — which is exactly what deepcopy on every checkpoint creates — it can’t keep up:
- Reference counts tick down immediately but only free objects when they hit zero
- Generational collection runs periodically, not on every allocation
- Bursts of temporary container objects linger in the youngest generation until the next collection pass frees them
The result is a sawtooth memory profile where peak RSS is substantially higher than mean “live” memory. Your monitoring dashboards show the peaks. Your OOM killer sees the peaks. Your bill is based on the peaks.
Who sees this most
- High-QPS agent services where checkpointing happens frequently
- Long-context conversations where state has accumulated significant size
- Workloads on constrained containers where a 2× memory headroom becomes the difference between stable and OOM-killed
- Multi-tenant deployments where several agents share a process and compete for RAM
What doesn’t help
- Calling gc.collect() manually — adds CPU without fixing the root issue
- Smaller heap sizing — invites OOMs instead of fixing them
- Per-tenant process isolation — masks the issue with more containers; doesn’t reduce per-tenant overhead
- Switching to MemorySaver — removes disk I/O but keeps the deepcopy allocation pattern
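The first item is worth quantifying: a manual gc.collect() must traverse every live tracked container, so its cost scales with heap size, while the deepcopy garbage it hoped to reclaim was mostly freed by reference counting already. A quick illustration (the heap shape is invented):

```python
import gc
import time

# A large *live* heap, as long-running accumulated agent state tends to be.
live = [{"i": i} for i in range(500_000)]

t0 = time.perf_counter()
freed = gc.collect()   # full pass over every live tracked container
dt = time.perf_counter() - t0

# CPU was spent proportional to live heap size; almost nothing here is
# cyclic garbage, so almost nothing comes back.
print(f"gc.collect() took {dt * 1000:.0f} ms, found {freed} unreachable objects")
```

Calling this on every checkpoint adds a heap-sized traversal to your hot path without lowering the allocation rate that creates the peaks.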
What actually helps
Stop allocating the intermediate object graph. The Rust checkpointer walks the state structure once, directly into a reusable byte buffer. The buffer gets reused across checkpoints (amortizing even its own allocation cost). No parallel object graph gets built. The allocation rate that was causing your peak memory problem simply doesn’t exist.
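The actual checkpointer is Rust, but the allocation argument can be sketched in pure Python: serialize straight into a reused buffer and compare peak traced memory against the deepcopy-then-serialize pattern. Here pickle is a stand-in serializer and the state shape is invented; neither is fast-langraph's real format.

```python
import copy
import io
import pickle
import tracemalloc

# Invented container-heavy state; deepcopy must clone every inner list.
state = {"rows": [[i, i + 1, i + 2] for i in range(100_000)]}

buf = io.BytesIO()  # reused across checkpoints, amortizing its own cost

def checkpoint_direct(state):
    buf.seek(0)
    buf.truncate()
    pickle.dump(state, buf)      # one walk over state, straight into bytes
    return buf.getvalue()

def checkpoint_deepcopy(state):
    return pickle.dumps(copy.deepcopy(state))  # parallel object graph first

tracemalloc.start()
blob_a = checkpoint_direct(state)
_, peak_direct = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
blob_b = checkpoint_deepcopy(state)
_, peak_copy = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert blob_a == blob_b  # identical snapshot bytes either way
print(f"peak direct: {peak_direct / 1e6:.1f} MB, "
      f"peak deepcopy-first: {peak_copy / 1e6:.1f} MB")
```

Both paths produce the same serialized bytes; the difference is entirely in the intermediate object graph that the direct path never builds.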
We’ve measured 20–40% peak memory reduction on realistic workloads after swapping in RustSQLiteCheckpointer. The savings are driven almost entirely by the eliminated intermediate allocations — the serialized bytes themselves are the same size regardless of who produces them.
Adoption
See the guide. One-line drop-in. Same on-disk format, no migration. Reversible.
Related
- Checkpoint serialization overhead — the related CPU pain point, same root cause
- Why deepcopy kills LangGraph — the architectural story
How fast-langraph addresses this
RustSQLiteCheckpointer serializes state into a reusable byte buffer instead of allocating a fresh Python object graph per checkpoint. Memory allocation rate drops sharply — we typically measure 20–40% lower peak memory on realistic workloads, driven almost entirely by skipping deepcopy's intermediate objects.