Migrating from vanilla LangGraph to fast-langgraph
Profile first. Enable the shim. Swap the checkpointer. Add @cached. Run your test suite. Measure before and after every step. A migration takes half a day and is fully reversible.
This is the checklist we give teams when they're ready to adopt fast-langgraph in production. It's intentionally cautious: every step ships with a rollback, and by the end you'll have numbers proving each change earned its place.
0. Prerequisites
- Python 3.9+
- Any LangGraph version (1.0.x tested; older versions generally work)
- A representative workload you can run repeatedly (staging deployment, captured trace, or benchmark harness)
- Permission to merge your own PRs
1. Baseline: measure current performance
Before touching anything, capture numbers. You want:
- p50 and p95 wall clock on a representative invocation
- Total memory footprint of a typical graph run
- LLM API cost per invocation (if applicable)
- Checkpoint size if you’re using persistence
```python
import time, tracemalloc

tracemalloc.start()
t0 = time.perf_counter()
result = graph.invoke(sample_input)
dt = (time.perf_counter() - t0) * 1000
peak = tracemalloc.get_traced_memory()[1] / 1024
print(f"{dt:.1f} ms, peak {peak:.0f} KB")
```
Write these down. You will compare against them after every step.
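The snippet above times a single run; p50 and p95 need repeated runs. A minimal sketch of the loop (the helper name is ours; pass it your own invocation, e.g. `lambda: graph.invoke(sample_input)`):

```python
import time
from statistics import quantiles

def latency_percentiles(run, n=30):
    """Time run() n times and return (p50, p95) in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        run()
        samples.append((time.perf_counter() - t0) * 1000)
    # quantiles(n=20) returns 19 cut points: index 9 is p50, index 18 is p95
    cuts = quantiles(samples, n=20)
    return cuts[9], cuts[18]
```

Thirty runs is a rough floor for a stable p95; use more if your workload is noisy.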
2. Profile: know where time goes
```shell
pip install fast-langgraph
```

This installs the library but enables nothing yet. Profile a representative run:
```python
from fast_langgraph.profiler import GraphProfiler

profiler = GraphProfiler()
with profiler.profile_run():
    graph.invoke(sample_input)
profiler.print_report()
```
Note the top 3 wall-clock contributors. This determines which components you’ll adopt and in what order. See profiling bottlenecks for interpretation.
3. Adopt the shim
```python
import fast_langgraph
fast_langgraph.shim.patch_langgraph()
```
Add this at the very top of your entry point, then run your test suite and your benchmark again. Compare wall clock against your baseline: you should see ~2.8× if your graph has meaningful executor and channel overhead.
Rollback: remove the two lines or call fast_langgraph.shim.unpatch_langgraph().
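A low-risk deployment pattern, and our convention rather than a library feature, is to gate the patch behind an environment variable so that rollback becomes a config change instead of a deploy:

```python
import os

def maybe_enable_fast_langgraph() -> bool:
    """Patch LangGraph only when FAST_LANGGRAPH=1; otherwise run vanilla.

    The env-var gate is our own convention, not part of fast-langgraph.
    """
    if os.environ.get("FAST_LANGGRAPH") != "1":
        return False
    import fast_langgraph  # lazy import keeps the dependency optional
    fast_langgraph.shim.patch_langgraph()
    return True
```

Call it once at the top of your entry point; flipping the variable off restores vanilla behavior on the next restart.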
4. Swap the checkpointer
If checkpointing showed up in your profile, do this next:
```python
from fast_langgraph import RustSQLiteCheckpointer

# before: checkpointer = SqliteSaver.from_conn_string("state.db")
checkpointer = RustSQLiteCheckpointer("state.db")
```
The on-disk format is compatible, so there is no data migration. Restart your service and re-run the benchmark. For graphs with large state, this step alone can be the biggest single improvement.
Rollback: restore the original checkpointer line.
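To track the checkpoint-size metric from step 1 across the swap, comparing the database file on disk before and after is enough; a tiny helper (name is ours):

```python
import os

def checkpoint_size_kb(path: str) -> float:
    """On-disk size of the checkpoint DB, for before/after comparison."""
    return os.path.getsize(path) / 1024
```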
5. Add @cached to LLM call sites
```python
from fast_langgraph import cached

@cached(max_size=2000)
def call_llm(prompt: str) -> str:
    return llm.invoke(prompt)
```
Then replace direct llm.invoke(...) calls in your nodes with call_llm(...). Run your test suite to confirm nothing depends on per-call freshness (e.g., streaming events, randomized output).
Check the cache stats after a benchmark run:
```python
print(call_llm.cache_stats())
# {'hits': 148, 'misses': 62, 'size': 62}
```
If the hit rate is under 20%, the cache isn’t helping and you can remove the decorator. If it’s over 50%, you’re seeing material cost and latency savings.
Rollback: remove the @cached decorator.
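The 20% and 50% thresholds above are ratios of hits to total lookups. A helper for computing the rate from a stats dict shaped like the example output (the function name is ours):

```python
def cache_hit_rate(stats: dict) -> float:
    """Fraction of lookups served from cache; 0.0 when the cache is cold."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0
```

For the example stats above, 148 / (148 + 62) ≈ 0.70, comfortably over the 50% bar.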
6. Validate against your test suite
Run your full integration test suite with fast-langgraph fully enabled. Pay attention to:
- Checkpoint round-trips (save state, restart, resume): should be bit-identical
- Streaming event ordering: the shim does not affect it, but verify
- Retry and error paths: Rust panics are not Python exceptions, though in practice fast-langgraph raises a Python RuntimeError at the boundary
If anything fails, bisect by selectively disabling components (unpatch the shim, revert the checkpointer). Open a GitHub issue with a reproduction — we treat compatibility failures as P0.
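"Bit-identical" is easy to assert mechanically. Here is a self-contained sketch using plain sqlite3 and pickle as a stand-in for your checkpointer's real save/resume path; the table name and schema are illustrative, not the library's:

```python
import pickle
import sqlite3

def roundtrip_identical(state: dict, db_path: str) -> bool:
    """Save a serialized state blob, read it back, compare bytes exactly."""
    blob = pickle.dumps(state)
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS ckpt (id INTEGER PRIMARY KEY, data BLOB)")
    conn.execute("INSERT INTO ckpt (data) VALUES (?)", (blob,))
    conn.commit()
    (loaded,) = conn.execute(
        "SELECT data FROM ckpt ORDER BY id DESC LIMIT 1"
    ).fetchone()
    conn.close()
    return bytes(loaded) == blob
```

Adapt the save and load calls to your actual checkpointer; the byte-equality assertion at the end is the part that matters.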
7. Measure everything and publish the numbers
After migration, re-capture the metrics from step 1:
- p50 / p95 wall clock
- Memory footprint
- LLM cost per invocation
- Any error rate changes
Put them in your release notes or internal doc. Future-you and your team will thank you for the receipts.
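When you write up the numbers, report both the speedup factor and the percentage reduction, since the two are easy to conflate. Two one-liners (naming is ours):

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster: 400 ms -> 100 ms is a 4.0x speedup."""
    return before_ms / after_ms

def pct_reduction(before: float, after: float) -> float:
    """Percentage drop: 400 -> 100 is a 75% reduction, not 300%."""
    return (before - after) / before * 100
```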
8. Monitor in production
Add the metrics to your dashboards. The shim exposes no runtime metrics of its own, but your existing wall-clock and LLM-cost metrics will tell the full story. If you see a regression, the rollback path is the same as it was in dev: remove the patch call and revert the checkpointer swap.
Typical results we’ve seen
Across our consulting engagements, a typical production LangGraph workload migrated through these steps lands somewhere around:
- 3–8× end-to-end latency improvement
- 30–60% LLM cost reduction (if the workload had redundant calls)
- 20–40% memory footprint reduction (driven mostly by skipping deepcopy allocations)
Your mileage will vary. That’s what the profile in step 2 is for.
Need a second pair of eyes? We run audits.