Quickstart: enable fast-langgraph in under a minute
`pip install fast-langgraph`, add two lines at your entry point, and rerun your graph. Typical production graphs see a ~2.8× end-to-end speedup with no other changes.
This is the fastest way to try fast-langgraph. You keep your existing LangGraph code, flip one switch, and measure before and after.
1. Install
```shell
pip install fast-langgraph
```
Requires Python 3.9+. Works with the LangGraph versions we’ve tested against (see compatibility: 85 of 88 upstream LangGraph tests pass against the shimmed implementation).
2. Enable the shim
At the top of your application entry point — before any other LangGraph imports:
```python
import fast_langgraph
fast_langgraph.shim.patch_langgraph()
```
That’s it. Everything downstream imports a patched LangGraph: cached executor, Rust-backed apply_writes, faster channel updates.
Prefer environment variables? Set `FAST_LANGGRAPH_AUTO_PATCH=1` and run your app normally. The library self-patches at import time without touching your code at all.
```shell
export FAST_LANGGRAPH_AUTO_PATCH=1
python your_app.py
```
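Some entry points prefer an explicit call over import-time auto-patching but still want to honor the same environment variable. A minimal sketch of that pattern (the `maybe_patch` helper is ours, not part of fast-langgraph; `patch_langgraph()` is the call shown above):

```python
import os


def maybe_patch() -> bool:
    """Patch LangGraph only when FAST_LANGGRAPH_AUTO_PATCH=1 is set.

    Returns True if the patch was applied, False otherwise.
    """
    if os.environ.get("FAST_LANGGRAPH_AUTO_PATCH") != "1":
        return False
    # Deferred import: the shim is only loaded when the flag is on.
    import fast_langgraph
    fast_langgraph.shim.patch_langgraph()
    return True
```

Call `maybe_patch()` at the very top of your entry point, before any LangGraph imports.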
3. Confirm it’s enabled
```python
import fast_langgraph
fast_langgraph.shim.print_status()
```
You’ll see which hot paths are currently patched. If you don’t see ✓ next to the ones you expect, the shim is either running too late (after LangGraph has already been imported and cached) or an incompatible version is pinned.
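The most common cause of missing checkmarks is import order: some module imported LangGraph before the patch ran. A stdlib-only sanity check you can drop next to the patch call (the helper name is ours, for illustration):

```python
import sys


def patched_early_enough() -> bool:
    """True if LangGraph has not been imported yet in this process.

    Run this just before patch_langgraph(); if it returns False, some
    earlier import already loaded and cached the unpatched LangGraph.
    """
    return "langgraph" not in sys.modules
```

If it returns `False`, move the patch call (or the auto-patch env var) ahead of every module that imports LangGraph, directly or transitively.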
4. Measure
Don’t take our word for it. Time a real invocation before and after:
```python
import time

# Placeholders: build_your_graph() and sample_input come from your application.
graph = build_your_graph().compile()

t0 = time.perf_counter()
for _ in range(50):
    graph.invoke(sample_input)
print(f"50 invocations: {(time.perf_counter() - t0) * 1000:.1f} ms")
```
Run it first without the shim, then with. On realistic workloads with checkpointing enabled, you should see a ~2.8× improvement just from the automatic path. If your state is large, the gap widens further because apply_writes and checkpoint serialization dominate more of the wall clock.
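To make the before/after comparison repeatable, you can wrap the loop above in a small harness. A sketch using only the standard library (the `bench` and `speedup` helper names are ours; `fn` stands in for your compiled graph's `invoke` and `payload` for `sample_input`):

```python
import time


def bench(fn, payload, n=50):
    """Return total wall-clock milliseconds for n calls of fn(payload)."""
    t0 = time.perf_counter()
    for _ in range(n):
        fn(payload)
    return (time.perf_counter() - t0) * 1000.0


def speedup(before_ms, after_ms):
    """Ratio of unpatched to patched time: >1.0 means the shim helped."""
    return before_ms / after_ms
```

Run `bench(graph.invoke, sample_input)` once in a process without the shim, once in a process with it, and compare the two numbers with `speedup`.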
What the shim does
The automatic path delivers two wins:
| Component | Speedup | What it does |
|---|---|---|
| Executor caching | 2.3× | Reuses a `ThreadPoolExecutor` across invocations instead of rebuilding one per call |
| Rust `apply_writes` | 1.2× | Batches channel updates in native code |
Combined, that’s ~2.8× (2.3 × 1.2 ≈ 2.76) for typical graph invocations.
Next steps
The shim is the onramp. The manual-mode components deliver the headline numbers:
- Rust SQLite checkpointer — 5–6× checkpoint speedup, or up to 737× on large state
- LLM response caching — 9.78× at 90% hit rate
- Profiling bottlenecks — find what’s actually slow before adopting anything
Stuck? Read why fast-langgraph exists or email us.