A clean in-sample equity curve means almost nothing. Two tests separate real strategies from curve-fits: walk-forward analysis and Monte Carlo simulation. alphabench's Quant agent can run both on any completed backtest in one prompt.
1. Why these two tests
- Walk-forward answers: "would this strategy have worked if I'd traded it out-of-sample, retraining periodically as new data arrives?"
- Monte Carlo answers: "how much of the equity curve is luck? What does the distribution of outcomes look like under random trade reordering and bootstrap resampling?"
A strategy that survives both is a candidate for paper trading. A strategy that fails either should be rejected, not "improved" with more parameters.
2. Walk-forward analysis
After any backtest, ask:
"Run a walk-forward test with 6 folds, 70/30 train/test split, on this strategy."
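For intuition about what "6 folds, 70/30" can mean, here is a minimal sketch of one common layout. It assumes a rolling window whose train and test segments are sized roughly 70:30. The window steps forward by one test segment, so the test segments sit back-to-back. This is an assumption, not alphabench's fold logic: `walk_forward_splits` is a hypothetical helper, and `walk_forward_test` may slice the data differently. Anchored (expanding) windows are another common choice.

```python
def walk_forward_splits(n_bars: int, n_folds: int = 6, train_pct: int = 70):
    """Rolling walk-forward splits over bar indices [0, n_bars).

    Each fold trains on train_len bars and tests on the next test_len bars,
    with train_len:test_len approximately train_pct:(100 - train_pct). The
    window then rolls forward by test_len, so consecutive test segments are
    back-to-back and can be stitched into one out-of-sample curve.
    Returns a list of (train_range, test_range) pairs.
    """
    if n_folds < 1 or not 0 < train_pct < 100:
        raise ValueError("need n_folds >= 1 and 0 < train_pct < 100")
    test_pct = 100 - train_pct
    # Pick a test_len that keeps train_len + n_folds * test_len within n_bars.
    test_len = n_bars * test_pct // (train_pct + n_folds * test_pct)
    train_len = test_len * train_pct // test_pct
    if test_len == 0 or train_len == 0:
        raise ValueError("not enough bars for this many folds")
    # Drop leftover bars from the start so the most recent data gets tested.
    offset = n_bars - (train_len + n_folds * test_len)
    splits = []
    for k in range(n_folds):
        train_start = offset + k * test_len
        test_start = train_start + train_len
        splits.append((range(train_start, test_start),
                       range(test_start, test_start + test_len)))
    return splits


for train, test in walk_forward_splits(1000):
    print(f"train {train.start}-{train.stop - 1}  test {test.start}-{test.stop - 1}")
# first fold: train 0-279  test 280-399
```

Integer arithmetic avoids floating-point rounding when sizing the segments.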
The Quant agent calls walk_forward_test. You'll get:
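- In-sample vs out-of-sample Sharpe per fold — out-of-sample Sharpe should fall no more than 30% below in-sample (e.g. an in-sample Sharpe of 1.5 should hold at 1.05 or better).
- Stability of optimal parameters across folds — if the best parameters change wildly from fold to fold, the optimizer is fitting each window's noise rather than a persistent edge.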
- Stitched out-of-sample equity curve — the only equity curve a serious researcher should ever look at.
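The per-fold gap and the stitched curve take only a few lines of arithmetic. Below is a minimal standard-library sketch. It assumes per-bar returns, a zero risk-free rate and 252 bars per year. It also reads "gap" as the relative drop from in-sample to out-of-sample Sharpe. `sharpe`, `sharpe_gap` and `stitch_oos_equity` are hypothetical names, not part of alphabench.

```python
import math
import statistics


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-bar returns, assuming a zero risk-free rate."""
    sd = statistics.stdev(returns)  # sample standard deviation; needs >= 2 returns
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def sharpe_gap(is_sharpe, oos_sharpe):
    """Relative drop from in-sample to out-of-sample Sharpe (0.2 means 20% lower)."""
    if is_sharpe <= 0:
        return math.inf  # no in-sample edge to degrade from; treat as a failed fold
    return (is_sharpe - oos_sharpe) / is_sharpe


def stitch_oos_equity(oos_returns_per_fold, start_equity=1.0):
    """Compound each fold's out-of-sample returns, in fold order, into one curve."""
    equity = [start_equity]
    for fold_returns in oos_returns_per_fold:
        for r in fold_returns:
            equity.append(equity[-1] * (1.0 + r))
    return equity


# Hypothetical per-fold (in-sample, out-of-sample) Sharpe pairs.
for is_s, oos_s in [(1.8, 1.4), (1.5, 0.6)]:
    print(f"IS {is_s:.2f}  OOS {oos_s:.2f}  gap {sharpe_gap(is_s, oos_s):.0%}")
```

`stitch_oos_equity` uses only test-segment returns. No bar in the resulting curve was seen by the optimizer that chose its parameters.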
If out-of-sample Sharpe collapses below 0.5, throw the strategy out. No amount of re-tuning saves it.
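The rules above can be folded into one pass/fail check. This sketch makes four assumptions:

- the 30% gap applies to every fold;
- the 0.5 floor applies to the stitched out-of-sample Sharpe;
- parameter stability is the coefficient of variation of one tuned parameter across folds;
- the 0.25 stability threshold is hypothetical, since the article gives no number for it.

```python
import statistics


def walk_forward_verdict(fold_sharpes, stitched_oos_sharpe, fold_params,
                         max_gap=0.30, min_oos_sharpe=0.5, max_param_cv=0.25):
    """Apply the walk-forward rules of thumb to per-fold results.

    fold_sharpes: (in_sample_sharpe, out_of_sample_sharpe) for each fold.
    stitched_oos_sharpe: Sharpe of the stitched out-of-sample equity curve.
    fold_params: the optimizer's best value of one tuned parameter in each fold
        (needs at least two folds).
    max_param_cv is a hypothetical threshold, not one stated by alphabench.
    """
    if stitched_oos_sharpe < min_oos_sharpe:
        return "reject: out-of-sample Sharpe below floor"
    for is_s, oos_s in fold_sharpes:
        # Relative gap; a non-positive in-sample Sharpe fails outright.
        if is_s <= 0 or (is_s - oos_s) / is_s >= max_gap:
            return "reject: in-sample/out-of-sample gap too large"
    # Coefficient of variation: spread of the optimum relative to its average.
    mean_p = statistics.mean(fold_params)
    cv = statistics.stdev(fold_params) / abs(mean_p) if mean_p else float("inf")
    if cv > max_param_cv:
        return "reject: optimal parameters unstable across folds"
    return "pass"
```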
3. Monte Carlo simulation
Run alongside walk-forward:
"Run a Monte Carlo simulation with 1000 paths using trade reshuffling and bootstrap resampling on the original trade log."
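The two resampling schemes in that prompt randomize different things. Reshuffling keeps the same trades in a new order. Bootstrap resampling draws the same number of trades with replacement. Below is a minimal sketch, assuming the trade log is a list of per-trade fractional returns compounded on equity. `monte_carlo_paths` and its helpers are hypothetical, not the `monte_carlo_test` API.

```python
import random


def equity_curve(trade_returns, start=1.0):
    """Compound per-trade fractional returns into an equity curve."""
    curve = [start]
    for r in trade_returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def max_drawdown(curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = curve[0], 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst


def monte_carlo_paths(trade_returns, n_paths=1000, method="bootstrap", seed=0):
    """Simulate equity curves from a trade log.

    method="shuffle": same trades in a random order (permutation).
    method="bootstrap": n trades drawn with replacement from the log.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    n = len(trade_returns)
    paths = []
    for _ in range(n_paths):
        if method == "shuffle":
            sample = rng.sample(trade_returns, n)
        elif method == "bootstrap":
            sample = rng.choices(trade_returns, k=n)
        else:
            raise ValueError(f"unknown method: {method!r}")
        paths.append(equity_curve(sample))
    return paths


trades = [0.04, -0.02, 0.03, -0.05, 0.06, -0.01, 0.02, -0.03]  # hypothetical log
shuffled = monte_carlo_paths(trades, method="shuffle")
boot = monte_carlo_paths(trades, method="bootstrap")
drawdowns = [max_drawdown(p) for p in shuffled]
finals = [p[-1] for p in boot]
```

Under this compounding assumption, reshuffling leaves final equity unchanged because multiplication commutes. It changes only the path, so it mainly informs the drawdown distribution. The spread of final equity comes from the bootstrap paths.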
The Quant agent calls monte_carlo_test and returns:
- Distribution of final equity — the original strategy's final equity should sit near the median of the simulated paths, not in the top 5%.
- Probability of a max drawdown > X% — useful for capital allocation.
- Confidence interval on Sharpe — a wide interval that spans zero (e.g. [-0.3, 1.5]) means you cannot rule out a true Sharpe of zero or below, so the backtest Sharpe is largely noise.
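Each of those outputs is a simple statistic over simulated results. Below is a hedged standard-library sketch of three of them:

- the percentile rank of the original final equity among simulated finals (0.95 or above puts it in the top 5%);
- the share of paths whose max drawdown exceeds a threshold;
- a percentile-bootstrap interval on per-trade Sharpe, defined here as mean over standard deviation and not annualized.

The function names are hypothetical, and `monte_carlo_test` may compute these differently.

```python
import random
import statistics


def percentile_rank(value, samples):
    """Fraction of samples strictly below value; 0.95 or more = top 5%."""
    return sum(s < value for s in samples) / len(samples)


def prob_drawdown_exceeds(drawdowns, threshold):
    """Share of simulated paths whose max drawdown exceeds threshold (0.20 = 20%)."""
    return sum(d > threshold for d in drawdowns) / len(drawdowns)


def bootstrap_sharpe_ci(trade_returns, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) interval for per-trade Sharpe.

    Sharpe here is mean / sample stdev of trade returns, not annualized.
    """
    rng = random.Random(seed)
    n = len(trade_returns)
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(trade_returns, k=n)  # resample trades with replacement
        sd = statistics.stdev(sample)
        # All draws identical: Sharpe is undefined, recorded as 0.
        stats.append(statistics.mean(sample) / sd if sd > 0 else 0.0)
    stats.sort()
    lo = stats[round(alpha / 2 * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


trades = [0.04, -0.02, 0.03, -0.05, 0.06, -0.01, 0.02, -0.03]  # hypothetical log
lo, hi = bootstrap_sharpe_ci(trades)
print(f"95% CI on per-trade Sharpe: [{lo:.2f}, {hi:.2f}]")
```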
4. Reading both together
| Walk-forward | Monte Carlo | Verdict |
|---|---|---|
| OOS Sharpe ≈ IS Sharpe | Original near median | Real strategy. Deploy to paper. |
| OOS Sharpe much lower than IS | Original near top tail | Curve-fit. Reject. |
| OOS Sharpe ≈ IS Sharpe | Original near top tail | Lucky. Re-test on more data. |
| OOS Sharpe much lower than IS | Original near median | Underlying edge decayed. Reject. |
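Once each column is reduced to yes/no, the table becomes a two-by-two lookup. This sketch reads "OOS ≈ IS" as a relative Sharpe gap under the 30% used in section 2. It reads "top tail" as the original final equity ranking at or above the 95th percentile of simulated paths. Both readings are interpretations, and `combined_verdict` is a hypothetical helper.

```python
VERDICTS = {
    # (out-of-sample holds up, original in top tail) -> verdict from the table
    (True, False): "Real strategy. Deploy to paper.",
    (False, True): "Curve-fit. Reject.",
    (True, True): "Lucky. Re-test on more data.",
    (False, False): "Underlying edge decayed. Reject.",
}


def combined_verdict(is_sharpe, oos_sharpe, mc_percentile,
                     max_gap=0.30, top_tail=0.95):
    """Map walk-forward and Monte Carlo results onto the verdict table.

    mc_percentile: fraction of simulated final equities below the original.
    """
    oos_holds = is_sharpe > 0 and (is_sharpe - oos_sharpe) / is_sharpe < max_gap
    in_top_tail = mc_percentile >= top_tail
    return VERDICTS[(oos_holds, in_top_tail)]
```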
5. What the agents do behind the scenes
The Planner agent recognizes "validate" and "robustness" prompts and routes them to the Quant agent with the existing strategy DSL. No re-coding required — the same strategy runs through both tests on the same RaptorBT execution path.
6. Final gate before paper trading
A strategy that passes both tests must still survive at least 2 weeks in the paper trading engine, against live ticks, before it trades real capital. The agents do their job; market microstructure does the rest.