LLM Leaderboard

Comprehensive benchmark scores for top Large Language Models. Compare performance across Coding, Reasoning, and Creative tasks.

| Rank | Model | Provider | Context | Platform Price (per 1M tokens) | Official Price (per 1M tokens) | Code Arena | Chat Arena | GPQA | AIME 2025 | SWE-Bench | ARC-AGI v2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | 1.0M | $4.00 / $20.00 | $5.00 / $25.00 | 2,003 | 1,476 | 91.3% | 99.8% | 80.8% | 68.8% |
| 2 | | | 1.0M | $2.00 / $10.00 | $2.00 / $12.00 | 1,859 | 1,222 | 94.3% | - | 80.6% | 77.1% |
| 3 | GLM-5 | Zhipu AI | 200K | $0.95 / $2.85 | $1.00 / $3.20 | 1,594 | 1,179 | - | - | 77.8% | - |
| 4 | | | 200K | $4.00 / $20.00 | $5.00 / $25.00 | 1,580 | 1,345 | 87.0% | - | 80.9% | 37.6% |
| 5 | | | - | $2.00 / $10.00 | $2.00 / $12.00 | 1,579 | 1,045 | 91.9% | 100.0% | 76.2% | 31.1% |
| 6 | | | 1.0M | $0.40 / $2.50 | $0.50 / $3.00 | 1,578 | 1,172 | 90.4% | 99.7% | 78.0% | 33.6% |
| 7 | GPT-5.2 | OpenAI | 400K | - | $1.75 / $14.00 | 1,505 | 1,172 | 92.4% | 100.0% | 80.0% | 52.9% |
| 8 | Kimi K2.5 | Moonshot AI | 262K | $0.50 / $2.80 | $0.60 / $3.00 | 1,448 | 988 | 87.6% | 96.1% | 76.8% | - |
| 9 | | | 1.0M | $2.00 / $10.00 | $2.50 / $15.00 | 1,437 | 1,146 | 92.8% | - | - | 73.3% |
| 10 | | | 200K | $2.40 / $12.00 | $3.00 / $15.00 | 1,380 | 941 | 89.9% | - | 79.6% | 58.3% |
| 11 | GPT-5 High | OpenAI | 400K | - | $1.25 / $10.00 | 1,301 | 1,037 | 87.3% | 94.6% | - | - |
| 12 | Qwen3.5-397B-A17B | Alibaba Cloud / Qwen Team | 262K | - | $0.60 / $3.60 | 1,214 | 1,067 | 88.4% | - | 76.4% | - |
| 13 | | | 131K | $0.45 / $1.80 | $0.55 / $2.19 | 1,143 | 1,079 | 81.0% | 93.9% | 68.0% | - |
| 14 | | | 400K | $1.20 / $9.60 | $1.75 / $14.00 | 1,139 | 802 | - | - | - | - |
| 15 | GPT-5.1 High | OpenAI | 400K | - | $1.25 / $10.00 | 1,117 | 1,132 | 88.1% | 99.6% | - | - |
| 16 | | | 200K | $2.40 / $12.00 | $3.00 / $15.00 | 1,103 | 1,294 | 83.4% | 87.0% | - | - |
| 17 | GPT-5 Medium | OpenAI | 400K | - | $1.25 / $10.00 | 1,098 | 1,026 | 88.1% | 88.9% | - | - |
| 18 | | | 400K | $1.20 / $9.60 | $1.75 / $14.00 | 1,089 | 650 | - | - | - | - |
| 19 | GPT-5.1 | OpenAI | 400K | - | $1.25 / $10.00 | 1,079 | 1,010 | 88.1% | 94.0% | 76.3% | - |
| 20 | | | 205K | $0.60 / $2.20 | $0.60 / $2.20 | 1,030 | 1,017 | 85.7% | 95.7% | 73.8% | - |
Showing 1 to 20 of 275 models
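
The listing above mixes several value formats: price cells holding two dollar amounts, scores with thousands separators, percentages, and "-" for missing values. The following is a minimal Python sketch of how such cells could be turned into numbers and how a platform price could be compared with the official price. The function names are hypothetical, and the page does not label the two amounts inside a price cell, so the sketch simply pairs them by position.

```python
import re
from typing import List, Optional


def parse_prices(cell: str) -> List[float]:
    """Return every dollar amount found in a price cell, in order.

    A cell of "-" (no price listed) yields an empty list.
    """
    return [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", cell)]


def parse_score(cell: str) -> Optional[float]:
    """Convert a score cell such as "1,476", "91.3%" or "-" to a float.

    Missing values ("-") become None so they are not mistaken for zero.
    """
    cell = cell.strip()
    if cell == "-":
        return None
    return float(cell.replace(",", "").rstrip("%"))


def discount_percent(platform_cell: str, official_cell: str) -> Optional[List[float]]:
    """Percent saved on the platform price relative to the official price.

    Amounts are paired by position. Returns None when either cell has no
    prices or the two cells hold a different number of amounts.
    """
    platform = parse_prices(platform_cell)
    official = parse_prices(official_cell)
    if not platform or len(platform) != len(official):
        return None
    return [round((o - p) / o * 100, 1) for p, o in zip(platform, official)]


# GLM-5 row: platform price vs. official price
print(discount_percent("$0.95 / 1M Tokens$2.85 / 1M Tokens",
                       "$1.00 / 1M Tokens $3.20 / 1M Tokens"))  # [5.0, 10.9]
```

Returning `None` for "-" rather than 0 keeps models with unreported benchmarks from sorting below models that actually scored poorly.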

Metric Definitions

LLM

Code Arena: Average score across coding arenas, based on human votes.
Chat Arena: Human preference score from blind comparisons.
GPQA: Graduate-level science questions requiring expert knowledge.
AIME 2025: Problems from the 2025 American Invitational Mathematics Examination (AIME), a recent math competition.
SWE-Bench: Real GitHub issues requiring code changes.
ARC-AGI v2: Abstract reasoning problems.
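
The arena scores are described as human preference scores from votes and blind comparisons. The page does not say how votes become a score, so the following is only an illustrative sketch of one common approach, an Elo-style update applied after each pairwise vote. The function names, the starting rating of 1000, and the K-factor of 32 are all assumptions chosen for the example, not values taken from this leaderboard.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool,
                   k: float = 32.0) -> Tuple[float, float]:
    """Apply one blind-comparison vote and return the new (A, B) ratings.

    k is an assumed K-factor controlling how far a single vote moves a rating.
    """
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two hypothetical models start level; a voter prefers model A.
print(update_ratings(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

Because the update is zero-sum, a win against a higher-rated model moves both ratings more than a win against a lower-rated one.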

Image

IMAGE GEN: Human preference score for text-to-image generation.
IMAGE EDIT: Human preference score for image editing and transformation.

Video

Text to Video: Human preference score for text-to-video generation.
Image to Video: Human preference score for image-to-video generation.
Video to Video: Human preference score for video editing capabilities.

TTS

TTS: Human preference score for text-to-speech quality.

STT

STT: Human preference score for speech-to-text transcription accuracy.