
chatroutes-autobranch
Intelligent branch exploration for LLM-powered applications, with conversation analysis and routing
Controlled branching generation for LLM applications
Modern LLM applications often need to explore multiple reasoning paths (tree-of-thought, beam search, multi-agent systems) while staying usable and affordable. chatroutes-autobranch provides clean, standalone primitives for controlled branch exploration.

Key Features:

- Composite scoring (confidence, relevance, novelty, intent, reward)
- Beam selection with deterministic tie-breaking
- Novelty filtering via cosine similarity or MMR
- Entropy-based stopping when diversity collapses
- Budget management for nodes, tokens, and latency
- Pluggable, Protocol-based components
- Branch-point extraction for pre-generation analysis

Interactive examples:

- Perfect for first-time users! Learn the fundamentals in 5 minutes; no setup required, it runs entirely in your browser.
- Analyze text for decision points with interactive branch detection.
- See it in action with a real LLM: a complete creative writing assistant.
Install:
```bash
pip install chatroutes-autobranch
```
Basic Usage:
```python
from chatroutes_autobranch import BranchSelector, Candidate
from chatroutes_autobranch.config import load_config

# Load config (or use dict/env vars)
selector = BranchSelector.from_config(load_config("config.yaml"))

# Define parent and candidate branches
parent = Candidate(id="root", text="Explain photosynthesis simply")
candidates = [
    Candidate(id="c1", text="Start with sunlight absorption"),
    Candidate(id="c2", text="Begin with glucose production"),
    Candidate(id="c3", text="Explain chlorophyll's role"),
]

# Select best branches (applies beam → novelty → entropy pipeline)
result = selector.step(parent, candidates)

print(f"Kept: {[c.id for c in result.kept]}")
print(f"Entropy: {result.metrics['entropy']['value']:.2f}")
print(f"Should continue: {result.metrics['entropy']['continue']}")
```
Config (config.yaml):
```yaml
beam:
  k: 3                  # Keep top 3 by score
  weights: {confidence: 0.4, relevance: 0.3, novelty_parent: 0.2}
novelty:
  method: cosine        # or 'mmr' for Maximal Marginal Relevance
  threshold: 0.85
entropy:
  min_entropy: 0.6      # Stop if diversity drops below 60%
embeddings:
  provider: openai
  model: text-embedding-3-large
```
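To make the weights concrete: the beam score combines the per-signal scores. A simple linear combination is assumed here for illustration; the library's exact formula may differ:

```python
# Illustrative composite scoring under the weights above (assumed to be
# a linear combination; the library's actual formula may differ)
weights = {"confidence": 0.4, "relevance": 0.3, "novelty_parent": 0.2}
signals = {"confidence": 0.9, "relevance": 0.5, "novelty_parent": 0.8}

score = sum(weights[k] * signals[k] for k in weights)
print(f"{score:.2f}")  # 0.4*0.9 + 0.3*0.5 + 0.2*0.8 = 0.67
```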
Analyze text to identify decision points before generating branches:
```python
from chatroutes_autobranch import BranchExtractor

# Analyze LLM response for branch points
text = """
Backend options:
1. Flask - lightweight
2. FastAPI - modern
3. Django - full-featured
Database: Postgres or MySQL
"""

extractor = BranchExtractor()
branch_points = extractor.extract(text)

print(f"Found {len(branch_points)} decision points")
# Output: Found 2 decision points

print(f"Max paths: {extractor.count_max_leaves(branch_points)}")
# Output: Max paths: 6 (3 backends × 2 databases)
```
Features:
Use Cases:
Problem: Exploring multiple LLM reasoning paths (e.g., tree-of-thought) quickly becomes expensive and hard to control: candidate counts grow exponentially with depth, near-duplicate branches waste tokens, and there is no principled signal for when to stop.

Solution: chatroutes-autobranch gives you composite scoring, beam selection, novelty filtering, entropy-based stopping, and budget enforcement as small, swappable components.

Result: Controlled, efficient tree exploration with predictable costs.
| Scenario | Configuration | Benefit |
|---|---|---|
| Branch Analysis | BranchExtractor only | Analyze text for decision points, count paths (no generation) |
| Tree-of-Thought Reasoning | K=5, cosine novelty, entropy stopping | Explore diverse reasoning paths without explosion |
| Multi-Agent Debate | K=3, MMR novelty (λ=0.3) | Select diverse agent perspectives, avoid redundancy |
| Code Generation | K=4, high relevance weight | Generate varied solutions, prune duplicates |
| Creative Writing | K=8, low novelty threshold | High diversity, explore creative space |
| Factual Q&A | K=2, strict budget | Focus on accuracy, minimal branching |
Two-Phase Workflow:
```
Phase 1: Branch Detection (Optional, Pre-Analysis)
──────────────────────────────────────────────────
Text → BranchExtractor → Branch Points → Count Max Paths
                                       → Statistics
                                       → Decision: Generate or Skip?

Phase 2: Branch Selection (Core Pipeline)
─────────────────────────────────────────
Raw Candidates (N)
        ↓
1. Scoring (composite: confidence + relevance + novelty + intent + reward)
        ↓
2. Beam Selection (top K by score, deterministic tie-breaking)
        ↓
3. Novelty Filtering (prune similar via cosine/MMR)
        ↓
4. Entropy Check (compute diversity, decide if should continue)
        ↓
5. Result (kept + pruned + metrics)
```
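A minimal sketch of the two phases chained together, reusing the classes shown above (`generate_candidates` stands in for a user-supplied LLM call and is hypothetical):

```python
from chatroutes_autobranch import BranchExtractor, BranchSelector, Candidate
from chatroutes_autobranch.config import load_config

extractor = BranchExtractor()
selector = BranchSelector.from_config(load_config("config.yaml"))

llm_text = "Backend options: 1. Flask 2. FastAPI 3. Django"

# Phase 1: only fan out when the text actually contains decision points
branch_points = extractor.extract(llm_text)
if branch_points:
    parent = Candidate(id="root", text=llm_text)
    children = generate_candidates(parent, n=5)  # hypothetical LLM call
    # Phase 2: scoring -> beam -> novelty -> entropy
    result = selector.step(parent, children)
```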
Pluggable Components:
All components use Protocol (duck typing), so you can swap any part without touching the others.
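As a sketch of what that buys you (the protocol shape is inferred from the custom-scorer example further down, not taken from the library's source):

```python
from typing import Protocol

from chatroutes_autobranch import Candidate, ScoredCandidate

# Inferred structural interface; the library's actual protocols may differ
class ScorerLike(Protocol):
    def score(
        self, parent: Candidate, candidates: list[Candidate]
    ) -> list[ScoredCandidate]: ...

# No inheritance required: any class with a matching .score() satisfies it
class LengthScorer:
    def score(self, parent, candidates):
        return [
            ScoredCandidate(id=c.id, text=c.text, score=min(len(c.text) / 500, 1.0))
            for c in candidates
        ]
```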
Minimal:

```bash
pip install chatroutes-autobranch
```

With extras:

```bash
# FastAPI service (for TypeScript/other languages)
pip install chatroutes-autobranch[service]

# HuggingFace local embeddings
pip install chatroutes-autobranch[hf]

# FAISS for large-scale similarity (1000+ candidates)
pip install chatroutes-autobranch[faiss]

# All features
pip install chatroutes-autobranch[all]
```
Full Specification → complete API reference, algorithms, examples, and troubleshooting
Key Sections:
Tree exploration with budget control:

```python
from collections import deque

from chatroutes_autobranch import BranchSelector, Budget, BudgetManager, Candidate
from chatroutes_autobranch.config import load_config

# User provides the LLM generation function
def my_llm_generate(parent: Candidate, n: int) -> list[Candidate]:
    # Your LLM call here (OpenAI, Anthropic, etc.)
    responses = llm_api.generate(parent.text, n=n)
    return [Candidate(id=f"{parent.id}_{i}", text=r) for i, r in enumerate(responses)]

# Setup
selector = BranchSelector.from_config(load_config("config.yaml"))
budget_manager = BudgetManager(Budget(max_nodes=50, max_tokens=20000))

# Breadth-first tree exploration
queue = deque([root_candidate])
while queue:
    current = queue.popleft()
    children = my_llm_generate(current, n=5)

    # Check budget before selection
    if not budget_manager.admit(n_new=5, est_tokens=1000, est_ms=2000):
        break

    # Select best branches
    result = selector.step(current, children)
    budget_manager.update(actual_tokens=1200, actual_ms=1800)

    # Continue with kept candidates
    queue.extend(result.kept)

    # Stop if entropy is low (converged)
    if not result.metrics["entropy"]["continue"]:
        break
```
Custom scorer:

```python
from chatroutes_autobranch import (
    BeamSelector,
    BranchSelector,
    Candidate,
    ScoredCandidate,
    Scorer,
)

class DomainScorer(Scorer):
    def score(self, parent: Candidate, candidates: list[Candidate]) -> list[ScoredCandidate]:
        scored = []
        for c in candidates:
            # Custom logic: prefer longer, more detailed responses
            detail_score = min(len(c.text) / 1000, 1.0)
            scored.append(ScoredCandidate(id=c.id, text=c.text, score=detail_score))
        return scored

# Use in pipeline (novelty, entropy, budget constructed as in the config example)
beam = BeamSelector(k=3, scorer=DomainScorer())
selector = BranchSelector(beam, novelty, entropy, budget)
```
```python
# server.py
from fastapi import FastAPI

from chatroutes_autobranch import BranchSelector, Candidate
from chatroutes_autobranch.config import load_config_from_file

app = FastAPI()
_config = load_config_from_file("config.yaml")

@app.post("/select")
async def select(parent: dict, candidates: list[dict]):
    # Create a fresh selector per request (thread-safe)
    selector = BranchSelector.from_config(_config)
    result = selector.step(
        Candidate(**parent),
        [Candidate(**c) for c in candidates],
    )
    return {
        "kept": [{"id": c.id, "score": c.score} for c in result.kept],
        "metrics": result.metrics,
    }

# Run: uvicorn server:app
```
TypeScript client:

```typescript
const response = await fetch('http://localhost:8000/select', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },  // required for FastAPI to parse JSON
  body: JSON.stringify({ parent, candidates })
});
const { kept, metrics } = await response.json();
```
LangChain:

```python
from langchain.chains import LLMChain

from chatroutes_autobranch import BranchSelector, Candidate

def generate_and_select(query: str, chain: LLMChain, selector: BranchSelector):
    # Generate N candidates via LangChain (one inner generation list per input)
    responses = chain.generate([{"query": query}] * 5)
    candidates = [
        Candidate(id=f"c{i}", text=g[0].text)
        for i, g in enumerate(responses.generations)
    ]

    # Select best
    parent = Candidate(id="root", text=query)
    result = selector.step(parent, candidates)
    return result.kept
```
LlamaIndex: Similar pattern using QueryEngine.query() for generation
Raw APIs (OpenAI, Anthropic): See multi-generation example
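For the raw-API pattern, a single request with n completions yields the candidate pool. A sketch using the official openai client (model name and prompt are placeholders):

```python
from openai import OpenAI

from chatroutes_autobranch import Candidate

client = OpenAI()

# One request, n completions -> one Candidate per returned choice
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Explain photosynthesis simply"}],
    n=5,
)
candidates = [
    Candidate(id=f"c{i}", text=choice.message.content)
    for i, choice in enumerate(resp.choices)
]
```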
Benchmarks (M1 Max, OpenAI embeddings):
| Candidates | Beam K | Latency (p50) | Bottleneck |
|---|---|---|---|
| 10 | 3 | 240ms | Embedding API |
| 50 | 5 | 520ms | Embedding API |
| 100 | 10 | 1.1s | Novelty O(N²) |
| 500 | 10 | 4.2s | Use FAISS |
Optimization tips:
- Batch embedding requests (set batch_size: 64 in config)
- For 500+ candidates, install the faiss extra for large-scale similarity search

Setup:
```bash
git clone https://github.com/chatroutes/chatroutes-autobranch
cd chatroutes-autobranch
pip install -e .[dev]
```

Run tests:

```bash
pytest tests/
pytest tests/ -v --cov=chatroutes_autobranch  # With coverage
```

Type checking:

```bash
mypy src/
```

Formatting:

```bash
black src/ tests/
ruff check src/ tests/
```

Benchmarks:

```bash
pytest bench/ --benchmark-only
```
We welcome contributions! Please see our contributing guidelines.
Areas we'd love help with:
How to contribute:
- Create a feature branch (`git checkout -b feature/amazing-feature`)

FAQ:

Q: Do I need ChatRoutes cloud to use this? A: No. This library is standalone and has zero cloud dependencies. Use it with any LLM provider.
Q: Can I use this with TypeScript/JavaScript? A: Yes. Run the FastAPI service and call via HTTP. Native TS SDK planned for v2.0.0.
Q: How do I choose beam width K?
A: Start with K=3-5. Use the budget formula: K ≈ (budget/tokens_per_branch)^(1/depth). For example, a 20,000-token budget at ~500 tokens per branch and depth 3 gives K ≈ 40^(1/3) ≈ 3.4, so K=3. See tuning guide.
Q: What if all candidates get pruned by novelty? A: Lower threshold (e.g., 0.75) or switch to MMR. See troubleshooting.
Q: Is this deterministic? A: Yes, with fixed random seeds and deterministic tie-breaking. See tests.
MIT License - see LICENSE file for details.
Inspired by research in beam search, diverse selection (MMR, DPP), and LLM orchestration patterns. Built to be practical, swappable, and friendly for contributors.
Special thanks to the open-source community for tools and inspiration: LangChain, LlamaIndex, HuggingFace Transformers, FAISS, and the broader LLM ecosystem.
Built with ❤️ by the ChatRoutes team. Open to the community.