If you had to describe alphabench in one sentence, it would be this: a research platform that spawns specialized AI agents to do quantitative analysis on Indian markets. This post explains what that means — the agents that exist, who runs when, and why a multi-agent system beats a single LLM call for serious quant work.
## Why multi-agent, not single-prompt
A naive approach to "AI for trading" is: stuff a 32k-token system prompt with every tool, hand it to a single LLM, hope it picks the right one. This breaks in three places.
- Tool count exceeds context utility. alphabench has 50+ tools across 9 domains — market data, equity backtesting, derivatives backtesting, futures, options utilities, discovery, fundamentals, trade forensics, deployment. A single agent has to reason about all of them on every turn. Latency and accuracy both suffer.
- Phases need different prompts. Research-phase reasoning ("what universe should I pick?") and execution-phase reasoning ("what's the right backtest tool?") are different cognitive jobs. Cramming them into one prompt blunts both.
- Prompt caching prefers stable prefixes. A small, stable agent prompt with a tight tool list caches well across turns. A monolithic prompt that mutates per-task does not.
So we run specialized agents with focused tool surfaces, coordinated by a top-level Planner that decides who runs and when.
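The routing idea can be sketched in a few lines. This is an illustrative toy, not alphabench's actual API — the `Agent` class, the registry, and the `delegate_to` signature here are assumptions; only the tool names come from the post.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    tools: list[str] = field(default_factory=list)


# Focused tool surfaces instead of one 50-tool monolith. Each specialist's
# prompt lists only its own tools, so the prefix stays small and stable —
# which is what makes prompt caching effective across turns.
REGISTRY = {
    "researcher": Agent("researcher", ["find_instruments", "scan_universe"]),
    "quant": Agent("quant", ["backtest_basket", "backtest_pairs"]),
}


def delegate_to(agent_name: str, task: str) -> str:
    """The Planner's only job: pick a specialist and hand over the task."""
    agent = REGISTRY[agent_name]
    return f"{agent.name} handles {task!r} with tools {agent.tools}"


print(delegate_to("researcher", "screen large-caps by ROE"))
```

The point of the sketch is the shape of the boundary: the Planner never touches a data or backtest tool itself, so its prompt never has to describe them.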
## The agents
- Planner. The orchestrator. It reads the conversation, decides which specialist to invoke, and delegates via the `delegate_to` tool. The Planner itself does no execution and no data fetching — its only job is routing.
- Researcher. Read-only discovery. Fundamentals (`get_instrument_fundamentals`, `find_instruments`, `scan_universe`), market quotes, regime detection, financial research, strategy memory recall. About 10 tools.
- Quant. Execution. The largest tool surface (~30 tools): equity, basket, options, pairs, futures, multi-instrument, intraday options, adaptive and rolling strategies, parameter sweeps, walk-forward, Monte Carlo. This is the agent that calls RaptorBT.
- Diagnostician (a.k.a. trade doctor). Forensic analysis of losing trades — per-trade indicator context, MAE/MFE curves, stop-loss simulation across the portfolio, filter ranking. One tool, deeply specialized: `diagnose_trades`.
- Critic. Validation and screening. Catches obviously broken strategies before they reach the user.
- Reporter. Narrative summarization of long results.
All agents are powered by Gemini 2.5 Flash Lite, with prompt caching tuned to keep the per-agent prefix stable across turns. They share message history but each agent only sees the tools it owns.
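The "shared history, private tools" split is easy to picture as a per-turn request builder. A minimal sketch, with hypothetical names — the real system's message and tool plumbing is not shown in this post:

```python
# Every agent reads and appends to one conversation; each turn exposes
# only that agent's own tool surface to the model.
SHARED_HISTORY: list[dict] = []

AGENT_TOOLS = {
    "planner": ["delegate_to"],
    "researcher": ["find_instruments", "scan_universe", "get_instrument_fundamentals"],
    "quant": ["backtest_basket"],
    "diagnostician": ["diagnose_trades"],
}


def build_turn(agent: str) -> dict:
    """Assemble one model call: full shared context, private tool list."""
    return {
        "messages": list(SHARED_HISTORY),  # everyone sees the conversation
        "tools": AGENT_TOOLS[agent],       # but only their own tools
    }


SHARED_HISTORY.append({"role": "user", "content": "screen large-caps by ROE"})
turn = build_turn("diagnostician")
# The Diagnostician sees the user's request but can only call diagnose_trades.
```

Because `AGENT_TOOLS[agent]` never changes between turns, the tool portion of each agent's prompt is a stable prefix — exactly what a prompt cache wants.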
## A typical conversation
You type:
"Find Indian large-caps with ROE > 18, backtest a momentum strategy on them, then tell me why the worst trades lost."
The trace looks like this:
- Planner receives the message. It plans three subtasks: screen, backtest, diagnose.
- Planner → Researcher via
delegate_to. The Researcher callsfind_instrumentswith ROE > 18 filter, returns a 47-stock universe. - Planner → Quant via
delegate_to, with the screened universe in context. The Quant agent callsbacktest_basketwith a momentum DSL, RaptorBT runs the backtest in 8 seconds, returns trades + metrics. - Planner → Diagnostician via
delegate_to. The Diagnostician callsdiagnose_tradesagainst the just-completed trade log, returns a per-trade forensic panel. - Planner synthesizes: a one-paragraph summary, the equity curve, the worst-trade analysis, and the suggested filters.
You see one coherent answer. Behind the scenes, four agents collaborated, each running on its own tight prompt with its own tools.
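The trace above is a sequential pipeline with context threaded between agents. Here is a toy version with stubbed specialists — the function bodies are placeholders, and the real agents would make LLM and tool calls at each step:

```python
def researcher(context: dict) -> dict:
    # Real version: call find_instruments with the ROE filter.
    return {**context, "universe": ["STOCK_A", "STOCK_B", "STOCK_C"]}


def quant(context: dict) -> dict:
    # Real version: call backtest_basket on context["universe"] via RaptorBT.
    return {**context, "trades": [{"pnl": -1.2}, {"pnl": 3.4}]}


def diagnostician(context: dict) -> dict:
    # Real version: call diagnose_trades on the completed trade log.
    worst = min(context["trades"], key=lambda t: t["pnl"])
    return {**context, "worst_trade": worst}


def planner(message: str) -> dict:
    """Run the subtasks in order, handing each agent the accumulated context."""
    context = {"message": message}
    for agent in (researcher, quant, diagnostician):
        context = agent(context)
    return context


result = planner("screen, backtest, diagnose")
```

Each specialist only reads the keys it needs and adds its own output, which is why the Planner can synthesize one coherent answer at the end.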
## Why this matters for users
Three concrete payoffs:
- Truthful answers. Each agent's tool list maps cleanly to what it can actually do. Hallucinated tools and made-up symbol formats are largely impossible.
- Faster turns. Smaller tool lists mean smaller prompts, which means better cache hits and lower per-token latency.
- Composable workflows. New capabilities ship as either new tools (added to an existing agent) or new agents (added to the Planner's roster). The system grows without degrading.
## Where the AI ends and the deterministic engine begins
A subtle but important detail: AI agents don't simulate trades. They translate your intent into a strategy DSL, hand it to RaptorBT (our open-source Rust backtest engine), and return what RaptorBT computed. The same DSL runs in production paper and live trading. There is no LLM in the execution path. This is why backtests are reproducible, deterministic, and trustworthy at the millisecond level.
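The boundary can be made concrete with a toy: the model's only output is a declarative strategy spec, and everything downstream is a pure function of that spec. The DSL fields and function names below are invented for illustration; only the division of labor reflects the post.

```python
import hashlib
import json


def llm_translate(intent: str) -> dict:
    """Stand-in for the agent side: natural language in, strategy DSL out.
    (Hardcoded here; the real agents produce this from your message.)"""
    return {"strategy": "momentum", "lookback": 20, "universe": ["TCS", "INFY"]}


def run_backtest(dsl: dict) -> str:
    """Stand-in for the engine side: a pure function of the DSL (and, in
    reality, the historical data). No model anywhere in this path, so the
    same DSL always yields the same result."""
    blob = json.dumps(dsl, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()  # deterministic placeholder result


dsl = llm_translate("backtest momentum on Indian large-caps")
assert run_backtest(dsl) == run_backtest(dsl)  # reproducible by construction
```

Determinism falls out of the structure: once the LLM has emitted the DSL, rerunning the backtest cannot produce a different answer, which is also what lets the same spec move unchanged from research to paper and live trading.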
RaptorBT is open-source on PyPI and GitHub →
## Try it
Want to see the agents work end-to-end? Pick any of these guides — each walks through a real workflow on real Indian market data:
- Diagnose why your trading strategy lost money
- Screen Indian stocks by ROE, ROCE, revenue growth
- Backtest multi-leg options spreads
- Build a pairs trading strategy with dynamic hedging
- Walk-forward and Monte Carlo validation
Or just open a chat and describe a trading idea. The Planner takes it from there.