DeepSeek Pulls Ahead in AI Trading Trial as GPT-5, Gemini Stumble

The first week of Alpha Arena’s live crypto-trading experiment has exposed how differently the world's leading AIs handle risk.

Some have traded like seasoned quant desks; others have frozen in uncertainty.

Machines meet markets

China-developed DeepSeek Chat V3.1 leads with a 14% gain, followed by Claude Sonnet 4.5 at +12% and Elon Musk’s Grok 4 at +4.7%. At the opposite end, Gemini 2.5 Pro and GPT-5 have both slumped by around 40%, while Qwen3 Max sits mid-pack at -13%.

P&L summary as of 21 Oct:

  1. DeepSeek Chat V3.1: +13.74%
  2. Claude Sonnet 4.5: +11.53%
  3. Grok 4: +4.67%
  4. Qwen3 Max: -12.82%
  5. Gemini 2.5 Pro: -38.47%
  6. GPT-5: -40.43%

Conviction vs caution

Grok and DeepSeek are leaning into directional conviction, pivoting quickly between major cryptocurrencies such as Bitcoin (BTC), Solana (SOL), and Dogecoin (DOGE). Claude’s steadier gains reflect a disciplined risk model that kept exposure contained during volatile swings.

Gemini’s sharp drawdown stems from regime-switching, a pattern of rapid strategy flips that compounded losses in choppy markets. GPT-5, meanwhile, played it too safe: its conservative posture limited large errors but left it stranded when momentum returned.


How the test works

Each model began with a $10,000 account on Hyperliquid and trades crypto perpetual futures under identical prompts, with every execution publicly visible. No human overrides are allowed. The single objective: maximize risk-adjusted returns between 17 Oct and 3 Nov.

The design aims to test whether today's leading language models can act as autonomous trading agents: sourcing ideas, timing entries, sizing risk, and managing exposure without human intuition.
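Because every model started from the same $10,000 balance, the 21 Oct percentages translate directly into approximate account values. The short Python sketch that follows does that conversion. It assumes each reported figure is a simple return on the starting balance, with fees and funding payments already included; the names `PNL_PCT` and `implied_balance` are illustrative, not part of Alpha Arena or Hyperliquid.

```python
# Illustrative sketch: convert reported P&L percentages into implied balances.
# Assumption: each percentage is a simple return on the $10,000 starting balance.
STARTING_BALANCE = 10_000.0

# P&L percentages as reported on 21 Oct (hypothetical variable name).
PNL_PCT = {
    "DeepSeek Chat V3.1": 13.74,
    "Claude Sonnet 4.5": 11.53,
    "Grok 4": 4.67,
    "Qwen3 Max": -12.82,
    "Gemini 2.5 Pro": -38.47,
    "GPT-5": -40.43,
}


def implied_balance(pct: float, start: float = STARTING_BALANCE) -> float:
    """Return the account value implied by a percentage return, rounded to cents."""
    return round(start * (1 + pct / 100), 2)


# Map each model to its implied account value.
balances = {model: implied_balance(pct) for model, pct in PNL_PCT.items()}

for model, value in balances.items():
    print(f"{model:<20} ${value:>10,.2f}")
```

On these assumptions, DeepSeek's account would stand at roughly $11,374 and GPT-5's at roughly $5,957. These are derived from the percentages, not balances published by the organizers.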

Two paths to AI Alpha 

If current standings hold, the results suggest two competing routes to machine-driven market success. One is the finance-tuned specialist, exemplified by DeepSeek, which blends domain-specific training, proprietary datasets, and built-in heuristics. The other is the generalist opportunist: models like Grok that rely on adaptability, fast reactions, and a clear thesis rather than deep market priors.

The next fortnight will decide which approach scales, and which self-destructs under live-market pressure.
