The first week of Alpha Arena’s live crypto-trading experiment has exposed how differently the world's leading AIs handle risk.
Some have traded like seasoned quant desks; others have frozen in uncertainty.
Machines meet markets
China-developed DeepSeek Chat V3.1 holds the lead with a gain of nearly 14%, followed by Claude Sonnet 4.5 at about +12% and Elon Musk’s Grok 4, up 4.7%. At the opposite end, Gemini 2.5 Pro and GPT-5 have both slumped by around 40%, while Qwen3 Max sits mid-pack at roughly -13%.
P&L summary as of 21 Oct:
- DeepSeek Chat V3.1: +13.74%
- Claude Sonnet 4.5: +11.53%
- Grok 4: +4.67%
- Qwen3 Max: -12.82%
- Gemini 2.5 Pro: -38.47%
- GPT-5: -40.43%
Conviction vs caution
Grok and DeepSeek are leaning into directional conviction, pivoting fast across major cryptocurrencies such as Bitcoin (BTC), Solana (SOL), and Dogecoin (DOGE). Claude’s steadier gains reflect a disciplined risk model that keeps exposure contained during volatile swings.
Gemini’s sharp drawdown stems from regime-switching – rapid strategy flips that compounded losses in choppy markets. GPT-5, meanwhile, played it too safe: its conservative posture limited large errors but left it stranded when momentum returned.

How the test works
Each AI began with a $10,000 account on Hyperliquid, trading crypto perpetuals with identical prompts and full transparency on executions. No human overrides are allowed. The single objective: maximize risk-adjusted returns between 17 Oct and 3 Nov.
The design aims to test whether today's leading language models can act as autonomous trading agents – sourcing ideas, timing entries, sizing risk, and managing exposure without human intuition.
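For a sense of scale, the reported percentages can be converted into approximate account balances. The sketch below is illustrative only: it assumes each P&L figure is measured against the $10,000 starting account and ignores anything the article does not state (fees, funding payments, open-position marks). The function names are made up for this example.

```python
# Illustrative arithmetic only: assumes each reported P&L percentage is
# measured against the $10,000 starting balance mentioned in the article.
STARTING_EQUITY = 10_000.0  # USD

# P&L as of 21 Oct, in percent, copied from the summary in the article.
PNL_PCT = {
    "DeepSeek Chat V3.1": 13.74,
    "Claude Sonnet 4.5": 11.53,
    "Grok 4": 4.67,
    "Qwen3 Max": -12.82,
    "Gemini 2.5 Pro": -38.47,
    "GPT-5": -40.43,
}


def equity_from_pnl(pnl_pct: float, start: float = STARTING_EQUITY) -> float:
    """Return the implied account balance for a percentage P&L."""
    return round(start * (1 + pnl_pct / 100), 2)


def standings(pnl: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, implied balance) pairs, highest balance first."""
    rows = [(name, equity_from_pnl(pct)) for name, pct in pnl.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, equity in standings(PNL_PCT):
        print(f"{name:<20} ${equity:>10,.2f}")
```

On these assumptions the gap between first and last place is roughly $5,400 on a $10,000 stake. This is a measure of raw return only; the competition's stated objective is risk-adjusted return, which the published percentages alone cannot show.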
Two paths to AI Alpha
If current standings hold, the results suggest two competing routes to machine-driven market success. One is the finance-tuned specialist, exemplified by DeepSeek, which blends domain-specific training, proprietary datasets and built-in heuristics. The other is the generalist opportunist: models like Grok that rely on adaptability, fast reactions, and a clear thesis rather than deep market priors.
The next fortnight will decide which approach scales, and which self-destructs under live-market pressure.