Quickstart: enable fast-langgraph in under a minute
`pip install fast-langgraph`, add two lines at your entry point, and rerun your graph. Typical production graphs see a ~2.8× end-to-end speedup with no other changes.
This is the fastest way to try fast-langgraph. You keep your existing LangGraph code, flip one switch, and measure before and after.
1. Install
```shell
pip install fast-langgraph
```
Requires Python 3.9+. Works with the LangGraph versions we’ve tested against (see compatibility: 85 of 88 upstream LangGraph tests pass against the shimmed implementation).
2. Enable the shim
At the top of your application entry point — before any other LangGraph imports:
```python
import fast_langgraph
fast_langgraph.shim.patch_langgraph()
```
That’s it. Everything downstream imports a patched LangGraph: cached executor, Rust-backed apply_writes, faster channel updates.
Prefer environment variables? Set `FAST_LANGGRAPH_AUTO_PATCH=1` and run your app normally. The library self-patches at import time without touching your code at all.
```shell
export FAST_LANGGRAPH_AUTO_PATCH=1
python your_app.py
```
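Some entry points prefer an explicit call over import-time auto-patching but still want to honor the same environment variable. A minimal sketch of that pattern (the `maybe_patch` helper is ours, not part of fast-langgraph; `patch_langgraph()` is the call shown above):

```python
import os


def maybe_patch() -> bool:
    """Patch LangGraph only when FAST_LANGGRAPH_AUTO_PATCH=1 is set.

    Returns True if the patch was applied, False otherwise.
    """
    if os.environ.get("FAST_LANGGRAPH_AUTO_PATCH") != "1":
        return False
    # Deferred import: the shim is only loaded when the flag is on.
    import fast_langgraph
    fast_langgraph.shim.patch_langgraph()
    return True
```

Call `maybe_patch()` at the very top of your entry point, before any LangGraph imports.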
3. Confirm it’s enabled
```python
import fast_langgraph
fast_langgraph.shim.print_status()
```
You’ll see which hot paths are currently patched. If you don’t see ✓ next to the ones you expect, the shim is either running too late (after LangGraph has already been imported and cached) or an incompatible version is pinned.
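The most common cause of missing checkmarks is import order: some module imported LangGraph before the patch ran. A stdlib-only sanity check you can drop next to the patch call (the helper name is ours, for illustration):

```python
import sys


def patched_early_enough() -> bool:
    """True if LangGraph has not been imported yet in this process.

    Run this just before patch_langgraph(); if it returns False, some
    earlier import already loaded and cached the unpatched LangGraph.
    """
    return "langgraph" not in sys.modules
```

If it returns `False`, move the patch call (or the auto-patch env var) ahead of every module that imports LangGraph, directly or transitively.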
4. Measure
Don’t take our word for it. Time a real invocation before and after:
```python
import time

# Placeholders: build_your_graph() and sample_input come from your application.
graph = build_your_graph().compile()

t0 = time.perf_counter()
for _ in range(50):
    graph.invoke(sample_input)
print(f"50 invocations: {(time.perf_counter() - t0) * 1000:.1f} ms")
```
Run it first without the shim, then with. On realistic workloads with checkpointing enabled, you should see a ~2.8× improvement just from the automatic path. If your state is large, the gap widens further because apply_writes and checkpoint serialization dominate more of the wall clock.
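To make the before/after comparison repeatable, you can wrap the loop above in a small harness. A sketch using only the standard library (the `bench` and `speedup` helper names are ours; `fn` stands in for your compiled graph's `invoke` and `payload` for `sample_input`):

```python
import time


def bench(fn, payload, n=50):
    """Return total wall-clock milliseconds for n calls of fn(payload)."""
    t0 = time.perf_counter()
    for _ in range(n):
        fn(payload)
    return (time.perf_counter() - t0) * 1000.0


def speedup(before_ms, after_ms):
    """Ratio of unpatched to patched time: >1.0 means the shim helped."""
    return before_ms / after_ms
```

Run `bench(graph.invoke, sample_input)` once in a process without the shim, once in a process with it, and compare the two numbers with `speedup`.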
What the shim does
The automatic path delivers two wins:
| Component | Speedup | What it does |
|---|---|---|
| Executor caching | 2.3× | Reuses a `ThreadPoolExecutor` across invocations instead of rebuilding one per call |
| Rust `apply_writes` | 1.2× | Batches channel updates in native code |
Combined, that’s ~2.8× (2.3 × 1.2 ≈ 2.76) for typical graph invocations.
Next steps
The shim is the onramp. The manual-mode components deliver the headline numbers:
- Rust SQLite checkpointer — 5–6× checkpoint speedup, or up to 737× on large state
- LLM response caching — 9.78× at 90% hit rate
- Profiling bottlenecks — find what’s actually slow before adopting anything
Stuck? Read why fast-langgraph exists or email us.