Evidence

Validation Suite

Eighteen tests. Six markets. Three leverage levels. Multiple data sources. Every major critic concern addressed with documented results.

Capstone Result
🔥 Worst-case scenario

When leveraged NDX lost 88%, the classifier compounded +31.5% per year.

The worst 14-year window for leveraged Nasdaq exposure in the post-WWII era was January 2000 through December 2013. Buy-and-hold on the UOPIX 2× NDX mutual fund lost 87.6% over that period. The JEDI classifier, running on the exact same real mutual fund data, produced +31.5% CAGR with Sharpe 1.71 and a max drawdown of 17% — and was positive in every single one of the worst bear years.

JEDI CAGR
+31.5%
Buy & Hold CAGR
−13.9%
Sharpe Ratio
1.71
Profit Factor
1.85

This is the most hostile environment for leveraged equity strategies.

Positive in 2000 (+27%), 2001 (+11%), 2002 (+9%), and 2008 (+1%). $100k of starting capital ended at $4.59M on the JEDI path; the same $100k in UOPIX buy-and-hold ended at $12.4k. Profit factor in this sideways period (1.85) is higher than in the 2010–2026 bull period (1.62), which is direct evidence against the “bull market gravity” hypothesis.
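The dollar figures and CAGRs above can be cross-checked with the standard compound-growth formula. The following is a minimal sketch, assuming a 14-year window (January 2000 through December 2013) and annual compounding; the function names are illustrative and not part of the JEDI system.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def compound(start_value: float, annual_rate: float, years: float) -> float:
    """Ending value after compounding at a constant annual rate."""
    return start_value * (1.0 + annual_rate) ** years


YEARS = 14  # assumption: January 2000 through December 2013

# Buy-and-hold: $100k fell to $12.4k (an 87.6% loss).
bh_cagr = cagr(100_000, 12_400, YEARS)

# JEDI path: $100k compounding at the quoted 31.5% per year.
jedi_end = compound(100_000, 0.315, YEARS)

print(f"Buy-and-hold CAGR: {bh_cagr:.1%}")
print(f"JEDI ending value: ${jedi_end / 1e6:.2f}M")
```

Under these assumptions the buy-and-hold figure reproduces the quoted −13.9%, and the JEDI path lands at roughly $4.6M, in line with the quoted $4.59M once rounding of the 31.5% rate is allowed for.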

  • Works across 6 markets and 3 leverage levels
  • Holds through worst-case bear environments
  • Robust across parameter perturbations
  • Validated on strictly unseen data

Convinced by the capstone?

Skip the deep dive — start free with the same system behind these results.


Validation Framework

Each section tests a different failure mode — across markets, crises, and unseen data.

Cross-Asset Validation

8 tests

Does the system generalize beyond a single market?

  • Validated across 6 global markets (NDX, SPX, DJIA, Russell, Nikkei, DAX)
  • Same parameter set applied across leverage levels — no per-market retuning
  • Consistent risk-adjusted returns — positive alpha on every test
Cross-asset SPX (unlevered) ★ Key validation
Does the same classifier work on S&P 500 via SPY / SH?
Same system parameters, same logic, SPX data instead of NDX. 20-year window (2006–2026) including the 2008 GFC.
21.2% CAGR
2.43 Sharpe
−8.7% Max DD
Cross-asset SPX (3× leveraged)
Does it work on 3× S&P 500 via UPRO / SPXU?
2009–2026. Profit factor 1.65 — slightly higher than NDX's 1.63. Parameters sit in a robust region of the space, not an NDX-specific peak.
45.5% CAGR
2.08 Sharpe
−22.1% Max DD
NDX 1× (QQQ / PSQ)
Does the system work on unleveraged Nasdaq with the same parameters?
2006–2026. First validation of the system on 1× Nasdaq exposure. Sharpe 2.67, with a max drawdown of 8.0%, roughly 15% of buy-and-hold's 53%.
29.0% CAGR
2.67 Sharpe
−8.0% Max DD
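The drawdown comparison in this card rests on two small calculations: the maximum peak-to-trough decline of an equity curve, and the ratio of the strategy's drawdown to buy-and-hold's. The sketch below uses the conventional definition of max drawdown and a made-up equity curve; it is an illustration, not the system's own code.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, returned as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


# Toy equity curve (made-up numbers, not JEDI output).
curve = [100.0, 110.0, 99.0, 105.0, 120.0, 114.0]
dd = max_drawdown(curve)  # the 110 -> 99 drop is the deepest decline

# Figures from the card: 8.0% strategy max DD vs 53% for buy-and-hold.
dd_ratio = 8.0 / 53.0

print(f"toy max drawdown: {dd:.1%}")
print(f"strategy DD as share of buy-and-hold DD: {dd_ratio:.0%}")
```

With the card's figures the ratio comes out at about 15%, matching the text.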
DJIA 1× (DIA / DOG)
Does the system work on the Dow Jones?
2006–2026. Same parameters applied to DIA / DOG. Beats buy-and-hold on both CAGR (+5.8pp) and drawdown (15% of B&H DD). Profit factor 1.10 — passes Sharpe and DD criteria but per-trade economics are at the threshold.
15.7% CAGR
1.72 Sharpe
−8.0% Max DD
DJIA 3× (UDOW / SDOW)
Does it work on 3× Dow Jones?
2010–2026, 16 years. Profit factor 1.30, a clean pass on all four criteria. Max drawdown of 22.5% is 28% of buy-and-hold's 80% drawdown, the largest absolute outperformance in the new test suite.
33.4% CAGR
1.65 Sharpe
−22.5% Max DD
Russell 2000 1× (IWM / RWM)
Does it work on small caps?
2007–2026. Weakest US test: Sharpe 1.01 just clears the threshold, and profit factor 1.03 is borderline. Honest framing: the classifier uses VIX (SPX-derived volatility), which correlates only weakly with small-cap-specific volatility. Still beats B&H on CAGR and drawdown.
12.9% CAGR
1.01 Sharpe
−12.4% Max DD
Russell 2000 3× (TNA / TZA)
Does it work on 3× small caps?
2008–2026. Sharpe 0.98 is just below 1.0, but profit factor 1.56 is strong: high per-trade economics, high annualized volatility. Same small-cap challenge as the 1× test, but 3× leverage amplifies the per-trade edge. Max drawdown of 26.4% is 30% of buy-and-hold's 88%.
24.4% CAGR
0.98 Sharpe
−26.4% Max DD
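Several cards contrast Sharpe ratio with profit factor, so it helps to see how the two differ. The page does not state its exact formulas; the sketch below uses the conventional definitions and assumes a zero risk-free rate and 252 trading periods per year. All return values are made up for illustration.

```python
import math
import statistics


def profit_factor(trade_returns):
    """Gross profit divided by gross loss across closed trades."""
    gains = sum(r for r in trade_returns if r > 0)
    losses = -sum(r for r in trade_returns if r < 0)
    return math.inf if losses == 0 else gains / losses


def annualized_sharpe(period_returns, periods_per_year=252, risk_free_annual=0.0):
    """Mean excess return over its sample standard deviation, annualized."""
    per_period_rf = risk_free_annual / periods_per_year
    excess = [r - per_period_rf for r in period_returns]
    ratio = statistics.mean(excess) / statistics.stdev(excess)
    return ratio * math.sqrt(periods_per_year)


# Made-up per-trade and daily returns, for illustration only.
trades = [0.04, -0.02, 0.03, -0.01, 0.05, -0.03]
daily = [0.010, -0.004, 0.006, -0.002, 0.003, 0.001]

pf = profit_factor(trades)         # gross profit 0.12 over gross loss 0.06
sharpe = annualized_sharpe(daily)  # depends on the volatility of the return series

print(f"profit factor = {pf:.2f}")
print(f"annualized Sharpe = {sharpe:.2f}")
```

Profit factor looks only at the balance of winning and losing trades, while Sharpe penalizes volatility of the return stream, which is why a test can score well on one and sit near the threshold on the other.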
Non-US markets (Nikkei + DAX)
Does it work in Japan and Germany with US-calibrated parameters?
1996–2026, 30 years on Nikkei/EWJ and DAX/EWG. Lower Sharpe than US tests — parameters are US-calibrated — but both produce positive alpha. Not the “collapses on a different asset” pattern of overfitting.
10.5% / 13.6% CAGR (Nikkei / DAX)
+9.4 / +10.0 pp Alpha vs B&H

Stress Testing

3 tests + 2008 cross-section

Real-world crisis validation — not simulated scenarios.

  • Positive through dot-com (2000–2002) and the GFC (2008); capital preserved in the 2022 bear (−8.8% vs TQQQ's −79%)
  • 27 years of real mutual fund data (UOPIX) — not synthetic
  • 2008 case study: 8 of 8 tests positive or near-flat
2022 isolated (the worst year for TQQQ)
What happened in 2022, when TQQQ lost 79%?
The classifier recognized the bear regime and stayed mostly in cash. Capital preservation: strong pass. Alpha-in-bear: weak (profit factor 0.51). Honest framing: “regime-aware capital preservation,” not “bear-market alpha.”
−8.8% JEDI return
−79% TQQQ B&H
−13.2% Max DD
Real 2× S&P MF (ULPIX / URPIX) · 28 years
Does the system work on real leveraged mutual fund data back to 1997?
1997–2026. Real, not synthetic, 2× leveraged S&P 500 mutual fund data. Positive in 2000, 2001, 2002, and 2008. Max DD 22% vs buy-and-hold's 90% drawdown.
24.1% CAGR
1.52 Sharpe
−22% Max DD
Real 2× NDX MF (UOPIX / USPIX) · 27 years 🔥 Real-data stress
Does it work on real 2× Nasdaq mutual fund data through the dot-com bust?
1998–2026. 27 years of real data, including dot-com (2000–2002) and the GFC. Positive in each of the worst bear years (2000, 2001, 2002, and 2008). Strongest full-period claim: Sharpe 2.01 at 2× leverage.
38.8% CAGR
2.01 Sharpe
−17% Max DD
Case study

2008 Financial Crisis · 8 of 8 positive or near-flat

Year-2008 results across all 8 cross-asset / leverage tests: cross-sectional evidence that the GFC outcome wasn't accidental.

Instrument                   Data Source         2008 Return
SPY unlevered                Real -1× ETF        +1.9%
NDX 3×                       Synthetic           +1.3%
ULPIX 2× S&P                 Real mutual fund    +6.6%
UOPIX 2× NDX                 Real mutual fund    +1.1%
Nikkei / EWJ                 Real ETF            −3.3%
DAX / EWG                    Real ETF            −2.7%
ULPIX 2× S&P (2000–2013)     Real mutual fund    +6.6%
UOPIX 2× NDX (2000–2013)     Real mutual fund    +1.1%

Benchmark comparison: unlevered SPY −36.8% · UOPIX buy-and-hold −85% · TQQQ (backfilled synthetic) −90%+.

Methodology

2 tests

Ensuring results are not overfit.

  • 95% of alpha from the original 7 strategies (S8–S12 not load-bearing)
  • Parameters flat across ±30% perturbation on 3 of 4 dimensions
  • No jagged-peak signature — not curve-fit
Strategy ablation S1–S7
Do the later strategies (S8–S12) carry real weight, or are they data-mining additions?
Ran with only the original 7 strategies enabled. 95% of the alpha is in the original 7. If S8–S12 were overfit, removing them would hurt more.
73.6% CAGR
2.52 Sharpe
1.65 Profit Factor
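The 95% figure can be reproduced as a simple ratio of CAGRs. This sketch assumes the comparison baseline is the 77.5% CAGR of the full strategy set on real ETF data (2010–2026); the page does not say exactly which run the ablation was measured against, so treat that pairing as an assumption.

```python
# Assumed baseline: full strategy set (S1-S12) on real ETF data, 2010-2026.
full_cagr = 0.775
# Ablation run with only the original strategies S1-S7 enabled.
ablated_cagr = 0.736

share_retained = ablated_cagr / full_cagr
print(f"share of CAGR retained by S1-S7: {share_retained:.0%}")
```

Under that assumption the original seven strategies retain about 95% of the full-system CAGR.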
Parameter sensitivity (“jiggle”)
Are the parameters hill-climbed to a jagged peak?
4 parameters × 5 values each = 20 runs. 3 of 4 parameters are essentially flat across ±30% perturbation. The 4th shows a one-sided structural cliff with a wide stable plateau — not a jagged peak signature.
3 of 4 Params flat
0 of 4 Show overfit signature
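A count of 4 parameters × 5 values = 20 runs implies a one-at-a-time sweep (a full grid would need 5⁴ = 625 runs). The sketch below shows one way such a sweep could be set up. The parameter names, baseline values, multiplier spacing, and flatness tolerance are all hypothetical, since the page does not name them.

```python
# Hypothetical parameter names and baseline values; the page does not name them.
BASELINE = {
    "lookback_days": 20.0,
    "vol_threshold": 25.0,
    "exit_buffer": 0.02,
    "cooldown_days": 5.0,
}
# Assumed spacing: five evenly spaced multipliers spanning -30% to +30%.
MULTIPLIERS = (0.70, 0.85, 1.00, 1.15, 1.30)


def one_at_a_time_grid(baseline, multipliers):
    """Vary one parameter per run while holding the others at baseline."""
    runs = []
    for name, base_value in baseline.items():
        for m in multipliers:
            params = dict(baseline)
            params[name] = base_value * m
            runs.append({"varied": name, "multiplier": m, "params": params})
    return runs


def is_flat(metric_by_multiplier, tolerance=0.10):
    """Hypothetical flatness test: every result within 10% of the baseline run."""
    base = metric_by_multiplier[1.00]
    return all(
        abs(v - base) <= tolerance * abs(base)
        for v in metric_by_multiplier.values()
    )


runs = one_at_a_time_grid(BASELINE, MULTIPLIERS)
print(len(runs))  # 4 parameters x 5 values = 20 runs
```

A jagged peak would show up as a metric that drops sharply on both sides of the baseline multiplier; a flat parameter passes a check like `is_flat` across the whole range.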

Data Integrity

3 tests

Validating assumptions and inputs.

  • CAGR rises when synthetic pre-2010 data is stripped (not falls)
  • Robust across realistic transaction costs (5–200 bps slippage)
  • 80–120× safety margin on slippage breaking point
NDX real-only (strip synthetic pre-2010)
Is the 58.2% CAGR inflated by synthetic pre-2010 data?
The opposite is true. Stripping synthetic data raised CAGR from 58% to 77%. The synthetic pre-2010 period includes bear markets that drag the backtest, not inflate it.
77.5% CAGR
2.56 Sharpe
Real ETF only, 2010–2026
Friction sensitivity (5 / 50 / 100 / 200 bps)
Is the CAGR robust to realistic transaction costs?
CAGR stays in the 77–80% band across all four slippage levels, but profit factor collapses from 1.63 to 0.42. We report both, honestly.
77–80% CAGR range
1.63 → 0.42 Profit Factor
Slippage breaking point
At what slippage does the strategy finally break?
The strategy only falls behind buy-and-hold at 4000–6000 bps of slippage (40–60% per trade), far above any realistic cost. Realistic friction has an 80–120× safety margin.
4000–6000 bps Breaking point
80–120× Safety margin
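The safety margin is the breaking-point slippage divided by a realistic slippage level. The page does not state the reference level; 50 bps is assumed here because it is the value that makes 4000–6000 bps correspond to 80–120×.

```python
def bps_to_fraction(bps: float) -> float:
    """Convert basis points to a fraction (100 bps = 1%)."""
    return bps / 10_000


# Assumption: 50 bps reference slippage, inferred from 4000/80 and 6000/120.
REFERENCE_BPS = 50

breaking_low_bps, breaking_high_bps = 4000, 6000

margin_low = breaking_low_bps / REFERENCE_BPS
margin_high = breaking_high_bps / REFERENCE_BPS

print(f"breaking point: {bps_to_fraction(breaking_low_bps):.0%} "
      f"to {bps_to_fraction(breaking_high_bps):.0%} per trade")
print(f"safety margin: {margin_low:.0f}x to {margin_high:.0f}x")
```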
Second strongest evidence

Out-of-Sample Validation

2 tests

Does performance hold on unseen data?

  • Strictly post-parameter-lock window: +79% return in 12 months
  • Period 1 (1985–2015, 31 years standalone): Sharpe 1.81, profit factor 2.51
  • Sharpe 3.01 on the strictly unseen post-lock data
Period decomposition (1985–2015 vs 2016–2026)
Are the results driven by one favorable decade?
Period 1 (31 years, includes 3 bear markets): 51.1% CAGR, Sharpe 1.81, profit factor 2.51. Period 2 (10 years bull): 78.5% CAGR. The earlier period, standalone, is already a strong institutional result.
51.1% P1 CAGR
78.5% P2 CAGR
2.51 P1 Profit Factor
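The two period CAGRs can be chained geometrically to see what overall growth rate they imply together. This is a rough consistency check, assuming period lengths of exactly 31 and 10 years and annual compounding; it is not a figure reported on this page.

```python
def chained_cagr(segments):
    """Combine (annual_rate, years) segments into one overall CAGR."""
    growth = 1.0
    total_years = 0.0
    for rate, years in segments:
        growth *= (1.0 + rate) ** years  # compound each segment in turn
        total_years += years
    return growth ** (1.0 / total_years) - 1.0


# Period 1: 51.1% CAGR over 31 years; Period 2: 78.5% CAGR over 10 years.
combined = chained_cagr([(0.511, 31), (0.785, 10)])
print(f"implied combined CAGR: {combined:.1%}")
```

Under these assumptions the implied combined rate is roughly 57% per year, so neither period alone accounts for the overall result.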
True out-of-sample (post-parameter-lock) 🧪 Out-of-sample proof
On strictly post-lock data, does the system still work?
Apr 2025 → Mar 2026, the only strictly out-of-sample window. 79% total return in 12 months with Sharpe 3.01. Sample size is small (35 trades) but direction is strongly positive.
+79% Total return
3.01 Sharpe
12 months Period

Access the system behind these validated results

See today's system positioning across markets and leverage levels — powered by the same framework validated across all 18 tests above.


All figures above are hypothetical and derived from historical simulation on the data sources named in each test. Past performance, whether simulated or real, is not a guarantee of future results. Leveraged ETFs and leveraged mutual funds involve material risk including volatility decay, tracking error, and potential total loss of principal. Review each product's prospectus before investing. JEDI AI is not a registered investment advisor; this page is for informational purposes only.

JEDI AI provides algorithmic signals and user-authorized trade instructions. It does not provide investment advice or manage client funds.