Why fast-langraph exists
LangGraph is a great framework for building AI agents, but it is not tuned for the shape of production workloads. Three bottlenecks show up in every non-trivial deployment, and Python alone can't close them.
1. Python's deepcopy eats checkpoint latency
LangGraph checkpoints state between every super-step. With any non-trivial graph — messages, tool outputs, intermediate scratchpads — that state balloons fast. By the time it's 250 KB, Python's deepcopy takes over 200 ms per checkpoint. On a graph that checkpoints 50 times, you've burned 10 seconds on pure serialization overhead.
RustSQLiteCheckpointer serializes the same state in 0.28 ms — a 737× speedup that scales with state size. The bigger your state, the bigger the win.
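You can reproduce the shape of this bottleneck with nothing but the standard library. The state layout below is invented for illustration — any dict with a few hundred KB of nested messages and scratchpad data behaves the same way:

```python
import copy
import time

# Hypothetical stand-in for LangGraph channel state: messages plus tool
# scratchpads, sized to roughly mimic a non-trivial agent run.
state = {
    "messages": [{"role": "user", "content": "x" * 500} for _ in range(200)],
    "scratchpad": {f"step_{i}": list(range(50)) for i in range(100)},
}

start = time.perf_counter()
snapshot = copy.deepcopy(state)  # the cost a pure-Python checkpointer pays per super-step
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"deepcopy of ~{len(str(state)) // 1024} KB state: {elapsed_ms:.2f} ms")
```

Multiply the printed number by your checkpoint count and you have your serialization floor before a single token is generated.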
2. Thread pool creation dominates per-invocation cost
LangGraph builds a fresh ThreadPoolExecutor every time you invoke a graph. In our measurements, executor setup accounts for 58% of per-invocation wall-clock time on short graphs — before the graph has even started running.
Our shim caches executors across invocations. You see a 2.3× speedup on the shim path alone, combining with faster apply_writes for an average ~2.8× end-to-end improvement on realistic workloads.
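The caching idea itself is simple; this is a pure-Python sketch of it, not fast-langraph's actual shim, which hooks into LangGraph's invocation path and also handles shutdown and configuration changes:

```python
from concurrent.futures import ThreadPoolExecutor

_EXECUTOR = None

def get_executor(max_workers: int = 8) -> ThreadPoolExecutor:
    """Return one process-wide executor instead of building a pool per call."""
    global _EXECUTOR
    if _EXECUTOR is None:
        _EXECUTOR = ThreadPoolExecutor(max_workers=max_workers)
    return _EXECUTOR

# Every "invocation" now reuses the same pool:
pool = get_executor()
results = list(pool.map(lambda x: x * x, range(4)))
```

Thread creation is cheap relative to an LLM call but not relative to a short graph — which is exactly the workload where the 2.3× shows up.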
3. Graphs make the same LLM call over and over
Retry loops, branching, tool routing — graphs re-issue identical prompts constantly. On a RAG-style workload with a 90% cache hit rate, a naive implementation spends ~10× more money and time than necessary.
Our @cached decorator is a one-line wrapper. Sub-microsecond lookup, content-addressed by arguments, configurable eviction. It drops onto your LLM call site and reduces both latency and spend.
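Here is a minimal sketch of the content-addressed idea — the cache key is a hash of the call arguments, so identical prompts resolve to one stored result. This is illustrative only; fast-langraph's @cached adds the configurable eviction mentioned above:

```python
import functools
import hashlib
import json

def cached(fn):
    """Memoize by a hash of the arguments (content-addressed lookup)."""
    store = {}

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            json.dumps([args, kwargs], sort_keys=True, default=str).encode()
        ).hexdigest()
        if key not in store:
            store[key] = fn(*args, **kwargs)
        return store[key]

    wrapper.cache = store
    return wrapper

calls = []

@cached
def fake_llm(prompt):
    calls.append(prompt)  # stands in for an expensive API call
    return f"echo: {prompt}"

fake_llm("summarize the doc")
fake_llm("summarize the doc")  # identical prompt: cache hit, no second call
print(len(calls))  # -> 1
```

The decorator never touches the function body, which is why it drops onto an existing call site without restructuring your graph.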
The principle
We do not rewrite LangGraph. We identify hot paths that are already well-isolated behind interfaces — checkpointer, apply_writes, state merge — and swap them out for Rust implementations using PyO3. Everything else stays vanilla LangGraph. Your graph code, your tools, your retrievers, your prompts: untouched.
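The swap works because graph code depends only on an interface, never on a concrete implementation. A pure-Python sketch of that principle — the class and method names here are illustrative, not fast-langraph's or LangGraph's real API:

```python
class PySaver:
    """Stand-in for a pure-Python checkpointer."""
    def put(self, state: dict) -> dict:
        return dict(state)  # imagine deepcopy + serialization here

class RustBackedSaver:
    """Stand-in for a Rust drop-in honoring the same contract."""
    def put(self, state: dict) -> dict:
        return dict(state)  # same interface, faster machinery underneath

def run_graph(saver, state: dict) -> dict:
    # Graph code calls the interface, never inspects the type, so the
    # implementation is a one-line choice at construction time.
    return saver.put(state)

result = run_graph(RustBackedSaver(), {"messages": ["hi"]})
```

Because both savers satisfy the same contract, swapping one for the other changes nothing observable in graph behavior — only its cost.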
That's why fast-langraph passes 85 of 88 LangGraph tests out of the box, and why you can enable it with a single line and turn it off just as easily.
When not to use fast-langraph
- Your graphs are tiny and run once per request. Python overhead won't be your bottleneck.
- You're prototyping. Adopt later, when you hit the wall.
- Your state is small and simple. The 737× gain is a function of state complexity — small dicts are already fast.
fast-langraph is for teams running LangGraph in production with real scale. If that's you, start here.
Hit a LangGraph scaling wall?
We help production teams squeeze every bottleneck out of LangGraph — checkpoints, state, LLM costs, memory. Honest audits. Measurable fixes.