Why fast-langraph exists
LangGraph is a great framework for building AI agents, but it is not tuned for the shape of production workloads. Three bottlenecks show up in every non-trivial deployment, and Python alone can't close them.
1. Python's deepcopy eats checkpoint latency
LangGraph checkpoints state between every super-step. With any non-trivial graph — messages, tool outputs, intermediate scratchpads — that state balloons fast. By the time it's 250 KB, Python's deepcopy takes over 200 ms per checkpoint. On a graph that checkpoints 50 times, you've burned 10 seconds on pure serialization overhead.
RustSQLiteCheckpointer serializes the same state in 0.28 ms — a 737× speedup that scales with state size. The bigger your state, the bigger the win.
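You can reproduce the shape of this bottleneck with nothing but the standard library. The state layout below is invented for illustration — any dict with a few hundred KB of nested messages and scratchpad data behaves the same way:

```python
import copy
import time

# Hypothetical stand-in for LangGraph channel state: messages plus tool
# scratchpads, sized to roughly mimic a non-trivial agent run.
state = {
    "messages": [{"role": "user", "content": "x" * 500} for _ in range(200)],
    "scratchpad": {f"step_{i}": list(range(50)) for i in range(100)},
}

start = time.perf_counter()
snapshot = copy.deepcopy(state)  # the cost a pure-Python checkpointer pays per super-step
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"deepcopy of ~{len(str(state)) // 1024} KB state: {elapsed_ms:.2f} ms")
```

Multiply the printed number by your checkpoint count and you have your serialization floor before a single token is generated.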
2. Thread pool creation dominates per-invocation cost
LangGraph builds a fresh ThreadPoolExecutor every time you invoke a graph. In our measurements, executor setup accounts for 58% of per-invocation wall-clock time on short graphs — before the graph has even started running.
Our shim caches executors across invocations. You see a 2.3× speedup on the shim path alone, combining with faster apply_writes for an average ~2.8× end-to-end improvement on realistic workloads.
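The caching idea itself is simple; this is a pure-Python sketch of it, not fast-langraph's actual shim, which hooks into LangGraph's invocation path and also handles shutdown and configuration changes:

```python
from concurrent.futures import ThreadPoolExecutor

_EXECUTOR = None

def get_executor(max_workers: int = 8) -> ThreadPoolExecutor:
    """Return one process-wide executor instead of building a pool per call."""
    global _EXECUTOR
    if _EXECUTOR is None:
        _EXECUTOR = ThreadPoolExecutor(max_workers=max_workers)
    return _EXECUTOR

# Every "invocation" now reuses the same pool:
pool = get_executor()
results = list(pool.map(lambda x: x * x, range(4)))
```

Thread creation is cheap relative to an LLM call but not relative to a short graph — which is exactly the workload where the 2.3× shows up.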
3. Graphs make the same LLM call over and over
Retry loops, branching, tool routing — graphs re-issue identical prompts constantly. On a RAG-style workload with a 90% cache hit rate, a naive implementation spends ~10× more money and time than necessary.
Our @cached decorator is a one-line wrapper. Sub-microsecond lookup, content-addressed by arguments, configurable eviction. It drops onto your LLM call site and reduces both latency and spend.
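Here is a minimal sketch of the content-addressed idea — the cache key is a hash of the call arguments, so identical prompts resolve to one stored result. This is illustrative only; fast-langraph's @cached adds the configurable eviction mentioned above:

```python
import functools
import hashlib
import json

def cached(fn):
    """Memoize by a hash of the arguments (content-addressed lookup)."""
    store = {}

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            json.dumps([args, kwargs], sort_keys=True, default=str).encode()
        ).hexdigest()
        if key not in store:
            store[key] = fn(*args, **kwargs)
        return store[key]

    wrapper.cache = store
    return wrapper

calls = []

@cached
def fake_llm(prompt):
    calls.append(prompt)  # stands in for an expensive API call
    return f"echo: {prompt}"

fake_llm("summarize the doc")
fake_llm("summarize the doc")  # identical prompt: cache hit, no second call
print(len(calls))  # -> 1
```

The decorator never touches the function body, which is why it drops onto an existing call site without restructuring your graph.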
The principle
We do not rewrite LangGraph. We identify hot paths that are already well-isolated behind interfaces — checkpointer, apply_writes, state merge — and swap them out for Rust implementations using PyO3. Everything else stays vanilla LangGraph. Your graph code, your tools, your retrievers, your prompts: untouched.
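The swap works because graph code depends only on an interface, never on a concrete implementation. A pure-Python sketch of that principle — the class and method names here are illustrative, not fast-langraph's or LangGraph's real API:

```python
class PySaver:
    """Stand-in for a pure-Python checkpointer."""
    def put(self, state: dict) -> dict:
        return dict(state)  # imagine deepcopy + serialization here

class RustBackedSaver:
    """Stand-in for a Rust drop-in honoring the same contract."""
    def put(self, state: dict) -> dict:
        return dict(state)  # same interface, faster machinery underneath

def run_graph(saver, state: dict) -> dict:
    # Graph code calls the interface, never inspects the type, so the
    # implementation is a one-line choice at construction time.
    return saver.put(state)

result = run_graph(RustBackedSaver(), {"messages": ["hi"]})
```

Because both savers satisfy the same contract, swapping one for the other changes nothing observable in graph behavior — only its cost.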
That's why fast-langraph passes 85 of 88 LangGraph tests out of the box, and why you can enable it with a single line and turn it off just as easily.
When not to use fast-langraph
- Your graphs are tiny and run once per request. Python overhead won't be your bottleneck.
- You're prototyping. Adopt later, when you hit the wall.
- Your state is small and simple. The 737× gain is a function of state complexity — small dicts are already fast.
fast-langraph is for teams running LangGraph in production with real scale. If that's you, start here.
Hit a LangGraph scaling wall?
We help production teams squeeze every bottleneck out of LangGraph — checkpoints, state, LLM costs, memory. Honest audits. Measurable fixes.