Migrating from vanilla LangGraph to fast-langgraph
Profile first. Enable the shim. Swap the checkpointer. Add @cached. Run your test suite. Measure before and after every step. A migration takes half a day and is fully reversible.
This is the checklist we give teams when they're ready to adopt fast-langgraph in production. It's intentionally cautious: every step ships with a rollback, and by the end you'll have numbers proving each change earned its place.
0. Prerequisites
- Python 3.9+
- Any LangGraph version (1.0.x tested; older versions generally work)
- A representative workload you can run repeatedly (staging deployment, captured trace, or benchmark harness)
- Permission to merge your own PRs
1. Baseline: measure current performance
Before touching anything, capture numbers. You want:
- p50 and p95 wall clock on a representative invocation
- Total memory footprint of a typical graph run
- LLM API cost per invocation (if applicable)
- Checkpoint size if you’re using persistence
```python
import time, tracemalloc

tracemalloc.start()
t0 = time.perf_counter()
result = graph.invoke(sample_input)
dt = (time.perf_counter() - t0) * 1000
peak = tracemalloc.get_traced_memory()[1] / 1024
print(f"{dt:.1f} ms, peak {peak:.0f} KB")
```
Write these down. You will compare against them after every step.
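The snippet above times a single run; p50 and p95 need repeated runs. A minimal sketch of the loop (the helper name is ours; pass it your own invocation, e.g. `lambda: graph.invoke(sample_input)`):

```python
import time
from statistics import quantiles

def latency_percentiles(run, n=30):
    """Time run() n times and return (p50, p95) in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        run()
        samples.append((time.perf_counter() - t0) * 1000)
    # quantiles(n=20) returns 19 cut points: index 9 is p50, index 18 is p95
    cuts = quantiles(samples, n=20)
    return cuts[9], cuts[18]
```

Thirty runs is a rough floor for a stable p95; use more if your workload is noisy.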
2. Profile: know where time goes
```shell
pip install fast-langgraph
```

This installs the library but enables nothing yet. Profile a representative run:
```python
from fast_langgraph.profiler import GraphProfiler

profiler = GraphProfiler()
with profiler.profile_run():
    graph.invoke(sample_input)
profiler.print_report()
```
Note the top 3 wall-clock contributors. This determines which components you’ll adopt and in what order. See profiling bottlenecks for interpretation.
3. Adopt the shim
```python
import fast_langgraph
fast_langgraph.shim.patch_langgraph()
```
Add this at the very top of your entry point, then run your test suite and your benchmark again. Compare wall clock against your baseline: you should see ~2.8× if your graph has meaningful executor and channel overhead.
Rollback: remove the two lines or call fast_langgraph.shim.unpatch_langgraph().
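A low-risk deployment pattern, and our convention rather than a library feature, is to gate the patch behind an environment variable so that rollback becomes a config change instead of a deploy:

```python
import os

def maybe_enable_fast_langgraph() -> bool:
    """Patch LangGraph only when FAST_LANGGRAPH=1; otherwise run vanilla.

    The env-var gate is our own convention, not part of fast-langgraph.
    """
    if os.environ.get("FAST_LANGGRAPH") != "1":
        return False
    import fast_langgraph  # lazy import keeps the dependency optional
    fast_langgraph.shim.patch_langgraph()
    return True
```

Call it once at the top of your entry point; flipping the variable off restores vanilla behavior on the next restart.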
4. Swap the checkpointer
If checkpointing showed up in your profile, do this next:
```python
from fast_langgraph import RustSQLiteCheckpointer

# before: checkpointer = SqliteSaver.from_conn_string("state.db")
checkpointer = RustSQLiteCheckpointer("state.db")
```
The on-disk format is compatible, so there is no data migration. Restart your service and re-run the benchmark. For graphs with large state, this step alone can be the biggest single improvement.
Rollback: restore the original checkpointer line.
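To track the checkpoint-size metric from step 1 across the swap, comparing the database file on disk before and after is enough; a tiny helper (name is ours):

```python
import os

def checkpoint_size_kb(path: str) -> float:
    """On-disk size of the checkpoint DB, for before/after comparison."""
    return os.path.getsize(path) / 1024
```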
5. Add @cached to LLM call sites
```python
from fast_langgraph import cached

@cached(max_size=2000)
def call_llm(prompt: str) -> str:
    return llm.invoke(prompt)
```
Then replace direct llm.invoke(...) calls in your nodes with call_llm(...). Run your test suite to confirm nothing depends on per-call freshness (e.g., streaming events, randomized output).
Check the cache stats after a benchmark run:
```python
print(call_llm.cache_stats())
# {'hits': 148, 'misses': 62, 'size': 62}
```
If the hit rate is under 20%, the cache isn’t helping and you can remove the decorator. If it’s over 50%, you’re seeing material cost and latency savings.
Rollback: remove the @cached decorator.
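The 20% and 50% thresholds above are ratios of hits to total lookups. A helper for computing the rate from a stats dict shaped like the example output (the function name is ours):

```python
def cache_hit_rate(stats: dict) -> float:
    """Fraction of lookups served from cache; 0.0 when the cache is cold."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0
```

For the example stats above, 148 / (148 + 62) ≈ 0.70, comfortably over the 50% bar.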
6. Validate against your test suite
Run your full integration test suite with fast-langgraph fully enabled. Pay attention to:
- Checkpoint round-trips (save state, restart, resume): should be bit-identical
- Streaming event ordering: the shim does not affect it, but verify
- Retry and error paths: Rust panics are not Python exceptions, though in practice fast-langgraph raises a Python RuntimeError at the boundary
If anything fails, bisect by selectively disabling components (unpatch the shim, revert the checkpointer). Open a GitHub issue with a reproduction — we treat compatibility failures as P0.
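"Bit-identical" is easy to assert mechanically. Here is a self-contained sketch using plain sqlite3 and pickle as a stand-in for your checkpointer's real save/resume path; the table name and schema are illustrative, not the library's:

```python
import pickle
import sqlite3

def roundtrip_identical(state: dict, db_path: str) -> bool:
    """Save a serialized state blob, read it back, compare bytes exactly."""
    blob = pickle.dumps(state)
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS ckpt (id INTEGER PRIMARY KEY, data BLOB)")
    conn.execute("INSERT INTO ckpt (data) VALUES (?)", (blob,))
    conn.commit()
    (loaded,) = conn.execute(
        "SELECT data FROM ckpt ORDER BY id DESC LIMIT 1"
    ).fetchone()
    conn.close()
    return bytes(loaded) == blob
```

Adapt the save and load calls to your actual checkpointer; the byte-equality assertion at the end is the part that matters.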
7. Measure everything and publish the numbers
After migration, re-capture the metrics from step 1:
- p50 / p95 wall clock
- Memory footprint
- LLM cost per invocation
- Any error rate changes
Put them in your release notes or internal doc. Future-you and your team will thank you for the receipts.
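When you write up the numbers, report both the speedup factor and the percentage reduction, since the two are easy to conflate. Two one-liners (naming is ours):

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster: 400 ms -> 100 ms is a 4.0x speedup."""
    return before_ms / after_ms

def pct_reduction(before: float, after: float) -> float:
    """Percentage drop: 400 -> 100 is a 75% reduction, not 300%."""
    return (before - after) / before * 100
```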
8. Monitor in production
Add the metrics to your dashboards. The shim exposes no runtime metrics of its own, but your existing wall-clock and LLM-cost metrics will tell the full story. If you see a regression, the rollback path is the same as it was in dev: remove the patch call and revert the checkpointer swap.
Typical results we’ve seen
Across our consulting engagements, a typical production LangGraph workload migrated through these steps lands somewhere around:
- 3–8× end-to-end latency improvement
- 30–60% LLM cost reduction (if the workload had redundant calls)
- 20–40% memory footprint reduction (driven mostly by skipping deepcopy allocations)
Your mileage will vary. That’s what the profile in step 2 is for.
Need a second pair of eyes? We run audits.