The Rust performance layer for LangGraph
Drop-in accelerators that make production LangGraph graphs up to 737× faster at checkpoint serialization and 2.8× faster end-to-end. Full API compatibility. One line to enable.
Measured performance
Benchmarks run on Python 3.12 / Linux x86_64. See /benchmarks for the full report and reproduction instructions.
Production LangGraph has three real bottlenecks
LangGraph is great for building agents. But production workloads hit the same walls — every time. We profiled them, isolated them, and rewrote the hot paths in Rust.
Python deepcopy collapses on large state
Serializing 235 KB of graph state through Python's deepcopy takes 206 ms. Through RustSQLiteCheckpointer, the same operation takes 0.28 ms.
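You can see the deepcopy cost for yourself. The sketch below times a deep copy of a stand-in agent state (the state shape and sizes are illustrative, not the benchmark's actual payload):

```python
import copy
import time

# Stand-in for large agent state: ~400 messages of ~500 bytes each,
# roughly 200 KB of payload (illustrative, not the benchmark fixture).
state = {
    "messages": [{"role": "user", "content": "x" * 500} for _ in range(400)],
    "step": 0,
}

start = time.perf_counter()
snapshot = copy.deepcopy(state)  # what a pure-Python checkpointer does per step
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"deepcopy of ~200 KB state: {elapsed_ms:.2f} ms")
```

Every checkpoint pays this walk over every object in the state; the Rust path serializes the same bytes without touching the Python object graph.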
58% of time spent recreating thread pools
LangGraph builds a new ThreadPoolExecutor per invocation. Our shim caches them, eliminating the largest single source of per-invocation overhead.
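The caching strategy is simple to sketch. The names below (`_EXECUTORS`, `get_executor`) are illustrative, not the shim's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

# One pool per configuration, created lazily and reused across invocations,
# instead of constructing a fresh ThreadPoolExecutor on every call.
_EXECUTORS: dict[int, ThreadPoolExecutor] = {}

def get_executor(max_workers: int = 8) -> ThreadPoolExecutor:
    if max_workers not in _EXECUTORS:
        _EXECUTORS[max_workers] = ThreadPoolExecutor(max_workers=max_workers)
    return _EXECUTORS[max_workers]

# Every invocation now reuses the same cached pool:
pool_a = get_executor(4)
pool_b = get_executor(4)
```

Constructing a pool means spawning OS threads; amortizing that across invocations is where the per-call overhead disappears.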
Repeated prompts waste API spend
Graphs loop, retry, and branch. The same prompts get answered multiple times. Our @cached decorator delivers a 10× speedup at a 90% hit rate — and a direct cost reduction.
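The effect is easy to demonstrate with a plain memoization cache standing in for @cached (the stand-in below uses functools.lru_cache and a fake LLM call; it is not fast-langgraph's implementation):

```python
from functools import lru_cache

calls = {"n": 0}

# Stand-in for an LLM call site; identical prompts hit the cache,
# not the API.
@lru_cache(maxsize=1024)
def ask_llm(prompt: str) -> str:
    calls["n"] += 1  # count real (uncached) API calls
    return f"answer to: {prompt}"

# A looping or retrying graph re-issues the same prompt repeatedly:
for _ in range(10):
    ask_llm("Summarize the ticket")
```

Ten identical invocations, one real call — the other nine are cache hits, which is the redundancy the decorator targets.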
One line. Zero code changes.
Automatic mode patches LangGraph transparently at import time.
# 1. install
$ pip install fast-langgraph
# 2. enable (anywhere before you build your graph)
import fast_langgraph
fast_langgraph.shim.patch_langgraph()
# 3. use LangGraph exactly as you already do
from langgraph.graph import StateGraph
# → ~2.8x end-to-end speedup, zero API changes
Guides
Production-tested walkthroughs from quickstart to deep optimization.
Cache LLM calls with @cached for a 10× speedup
LangGraph graphs re-issue the same LLM prompts constantly. The fast-langgraph @cached decorator drops onto your LLM call sites and eliminates redundant API spend.
Find LangGraph bottlenecks with GraphProfiler
Before you adopt fast-langgraph or any other optimization, measure. The GraphProfiler adds ~1.6 μs of overhead per operation and tells you exactly where your wall clock goes.
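The idea behind per-node attribution can be sketched in a few lines. This is a minimal illustration of the technique, not GraphProfiler's API (`timings` and `profiled` are made-up names):

```python
import time
from collections import defaultdict

# Accumulate wall-clock time per named node across invocations.
timings: dict[str, float] = defaultdict(float)

def profiled(name: str):
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[name] += time.perf_counter() - start
        return inner
    return wrap

@profiled("slow_node")
def slow_node(state: dict) -> dict:
    time.sleep(0.01)  # stand-in for real node work
    return state

slow_node({})
```

After a few runs, sorting `timings` tells you which node to optimize first — measure before you patch.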
Quickstart: enable fast-langgraph in under a minute
Install fast-langgraph, flip the shim on, and measure your first speedup. No code changes to your existing LangGraph application.
Technical articles
Deep-dives on what's slow in LangGraph and how we fixed it.
Why Python's deepcopy kills LangGraph at scale
The 737× speedup on checkpoint serialization isn't magic. It's the direct consequence of what Python's deepcopy actually does — and doesn't do — to your agent state.
Scaling LangGraph in production: the three real bottlenecks
Production LangGraph workloads hit three predictable bottlenecks: checkpointing, executor churn, and LLM redundancy. An architect-level look at the cost math at each stage.
Executor churn: the 58% problem in LangGraph invocations
Most LangGraph invocation overhead isn't in your nodes or your channels. It's in ThreadPoolExecutor construction on every single call — 58% of wall clock on short graphs.
Hit a LangGraph scaling wall?
We help production teams squeeze every bottleneck out of LangGraph — checkpoints, state, LLM costs, memory. Honest audits. Measurable fixes.