The AI Intelligence Leaderboard

Estimating the intelligence of every major AI model

AI Models by IQ
Each model's estimated IQ plotted on a standard normal IQ distribution
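
To read a position on this distribution as a percentile, one option is the sketch below. It assumes the conventional IQ scale (mean 100, standard deviation 15); the page does not state its parameters, and the function names are illustrative only.

```python
from statistics import NormalDist

# Assumption: conventional IQ scale (mean 100, SD 15); not stated on the page.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_to_z(iq: float) -> float:
    """Number of standard deviations `iq` sits above the mean."""
    return (iq - IQ_SCALE.mean) / IQ_SCALE.stdev


def iq_to_percentile(iq: float) -> float:
    """Percent of the reference distribution at or below `iq`."""
    return 100 * IQ_SCALE.cdf(iq)
```

Under that assumption, an estimated IQ of 130 sits two standard deviations above the mean, at roughly the 97.7th percentile.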

How AI IQ estimates model intelligence

  1. We archive source captures from public benchmark leaderboards and extract only source-backed values
  2. We map each benchmark score to an implied IQ using calibrated difficulty curves
  3. We group scored benchmarks into seven dimensions: abstract, mathematical, scientific, app building, production engineering, computer use, and reliability
  4. We conservatively fill missing benchmark and dimension estimates only inside the scoring pipeline
  5. Every derived IQ averages all seven scored dimensions, so missing coverage cannot make a model look better by omission
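
Steps 2 through 5 can be sketched roughly as follows. This is a minimal illustration, not the site's actual pipeline: the benchmark names, the calibration anchors, the piecewise-linear interpolation, and the fill rule (reuse the model's lowest observed dimension) are all assumptions made for the example.

```python
from statistics import mean

DIMENSIONS = (
    "abstract", "mathematical", "scientific", "app_building",
    "production_engineering", "computer_use", "reliability",
)

# Hypothetical calibration anchors: (benchmark score, implied IQ), sorted by score.
CURVES = {
    "example_math_bench": [(0.0, 70.0), (0.5, 100.0), (1.0, 145.0)],
    "example_abstract_bench": [(0.0, 80.0), (0.2, 100.0), (1.0, 150.0)],
}
# Hypothetical mapping from benchmark to scored dimension.
BENCH_DIMENSION = {
    "example_math_bench": "mathematical",
    "example_abstract_bench": "abstract",
}


def implied_iq(benchmark: str, score: float) -> float:
    """Step 2: interpolate linearly between the benchmark's calibration anchors."""
    points = CURVES[benchmark]
    score = min(max(score, points[0][0]), points[-1][0])  # clamp to the curve's range
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if score <= x1:
            return y0 + (y1 - y0) * (score - x0) / (x1 - x0)
    return points[-1][1]


def composite_iq(scores: dict[str, float]) -> float:
    """Steps 3-5: group by dimension, fill gaps conservatively, average all seven."""
    by_dimension: dict[str, list[float]] = {}
    for benchmark, score in scores.items():
        by_dimension.setdefault(BENCH_DIMENSION[benchmark], []).append(
            implied_iq(benchmark, score)
        )
    if not by_dimension:
        raise ValueError("at least one scored benchmark is required")
    dimension_iq = {dim: mean(values) for dim, values in by_dimension.items()}
    # Assumed fill rule: a missing dimension gets the model's lowest observed
    # dimension IQ, so omitted coverage can never raise the average.
    floor = min(dimension_iq.values())
    return mean(dimension_iq.get(dim, floor) for dim in DIMENSIONS)
```

With only two dimensions scored at 145 and 100, this sketch returns about 106.4 rather than the 122.5 a plain average of the observed dimensions would give, which is the behavior step 5 describes.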

IQ vs Effective Cost
Each model's estimated IQ plotted against effective cost per 1M I/O Tokens (sticker price × blended usage multiplier).

Effective cost & iso-curves

Effective cost on the X-axis is the sticker price for 1M I/O Tokens multiplied by the blended token usage multiplier. "1M I/O Tokens" means 1M input tokens plus 1M output tokens, each priced at the model's published rate.
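
As a worked example of that arithmetic, here is a short sketch. The prices and the multiplier in the usage note are made-up numbers, and the function names are illustrative.

```python
def sticker_price(input_per_1m: float, output_per_1m: float) -> float:
    """Price of 1M input tokens plus 1M output tokens at published rates."""
    return input_per_1m + output_per_1m


def effective_cost(input_per_1m: float, output_per_1m: float,
                   usage_multiplier: float) -> float:
    """Sticker price scaled by the token usage multiplier."""
    return sticker_price(input_per_1m, output_per_1m) * usage_multiplier
```

For a hypothetical model priced at $3 per 1M input tokens and $15 per 1M output tokens with a usage multiplier of 1.5, `effective_cost(3.0, 15.0, 1.5)` gives 27.0.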

Each iso-curve connects points of equal preference between IQ and cost. The slider sets how quality is weighted against cost: the center is 1:1; drag toward Cost to make cost matter more, or toward IQ to make quality matter more. Models above and to the right of a curve are strictly better than those on it.
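
The page does not give the formula behind the curves. The sketch below shows one plausible form, purely as an assumption: a weighted score in which one IQ standard deviation (taken as 15 points) trades against a tenfold change in cost when the slider is centered. `iq_weight` stands in for the slider position.

```python
import math


def preference(iq: float, cost: float, iq_weight: float = 0.5) -> float:
    """Assumed preference score: higher IQ raises it, higher cost lowers it."""
    if not 0.0 < iq_weight < 1.0:
        raise ValueError("iq_weight must be strictly between 0 and 1")
    if cost <= 0:
        raise ValueError("cost must be positive")
    return iq_weight * (iq - 100.0) / 15.0 - (1.0 - iq_weight) * math.log10(cost)


def iso_curve_iq(cost: float, level: float, iq_weight: float = 0.5) -> float:
    """IQ needed at `cost` to stay on the iso-curve with preference `level`."""
    if not 0.0 < iq_weight < 1.0:
        raise ValueError("iq_weight must be strictly between 0 and 1")
    if cost <= 0:
        raise ValueError("cost must be positive")
    return 100.0 + 15.0 * (level + (1.0 - iq_weight) * math.log10(cost)) / iq_weight
```

In this assumed form, a model at IQ 100 and cost 1 lies on the same centered curve as a model at IQ 115 and cost 10. Moving `iq_weight` toward 1 flattens the curve, so extra cost needs less extra IQ to justify it.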

Frontier IQ Over Time
X = release date. Y = estimated IQ. Provider step-lines connect each provider's flagship frontier checkpoints over time.

Tracking frontier progress

Each dot is a model with a known release date and a derived IQ estimate. Models are positioned left-to-right by release date, so the chart shows how the frontier changes over time rather than just where models rank today.

Provider-colored lines connect each lab's flagship frontier checkpoints. Codex, mini, nano, flash, coder, and smaller open-weight variants are omitted so the chart tracks each lab's main offering rather than every SKU.
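
A rough sketch of how such lines could be assembled is shown below. The keyword filter and the record layout are assumptions for illustration; in particular, "smaller open-weight variants" cannot be detected from a name and would need separate handling.

```python
import re
from collections import defaultdict
from datetime import date

# Variant markers named in the text above; matched as whole name tokens.
EXCLUDED_TOKENS = {"codex", "mini", "nano", "flash", "coder"}


def is_flagship(name: str) -> bool:
    """True if no excluded marker appears as a whole token in the model name."""
    # Token matching avoids dropping names that merely contain "mini" as a substring.
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return not EXCLUDED_TOKENS.intersection(tokens)


def provider_lines(models):
    """models: iterable of (provider, name, release_date, iq) tuples.

    Returns {provider: [(release_date, iq), ...]} ordered by release date.
    """
    lines = defaultdict(list)
    for provider, name, released, iq in models:
        if is_flagship(name):
            lines[provider].append((released, iq))
    return {provider: sorted(points) for provider, points in lines.items()}
```

The points are sorted by date only and not forced upward, so a release that scores below its predecessor shows up as a downward step.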

This view is most useful for spotting whether a new release is actually ahead of its direct predecessor, or whether source coverage and conservative imputations are shaping the comparison.

Mathematical Reasoning IQ
Each model's Mathematical Reasoning IQ plotted on a standard normal IQ distribution

What it measures

Multi-step quantitative reasoning, from competition problems to research-level proofs.

Scientific Reasoning IQ
Each model's Scientific Reasoning IQ plotted on a standard normal IQ distribution

What it measures

Graduate-level reasoning across the natural sciences and applying scientific knowledge to hard problems.

Abstract Reasoning IQ
Each model's Abstract Reasoning IQ plotted on a standard normal IQ distribution

What it measures

Fluid problem-solving on novel puzzles a model cannot have memorized — abstracting patterns from just a few examples.

App Building IQ
Each model's App Building IQ plotted on a standard normal IQ distribution

What it measures

Turning product and design prompts into usable apps, front-end experiences, and full-stack prototypes.

Production Engineering IQ
Each model's Production Engineering IQ plotted on a standard normal IQ distribution

What it measures

Coding fluency, repository repair, debugging, testing, and long-horizon engineering execution.

Computer Use IQ
Each model's Computer Use IQ plotted on a standard normal IQ distribution

What it measures

Agentic operation of real tools and environments — terminals, browsers, and desktop apps.

Reliability IQ
Each model's Reliability IQ plotted on a standard normal IQ distribution

What it measures

Following instructions precisely and recognizing the limits of the model's own knowledge rather than guessing.

Emotional Reasoning (EQ)
Diagnostic Emotional Reasoning scores, excluded from Composite IQ

What it measures

A diagnostic view of emotional and interpersonal behavior. This is excluded from Composite IQ until the benchmark base becomes more rigorous.

IQ vs Speed vs Cost in 3D
3D scatter: X = response time (log, faster to the right), Y = IQ, Z = effective cost (log). Color = provider. Drag to rotate.

Three tradeoffs at once

Most charts pit two qualities against each other. This view holds all three of the practical tradeoffs in one space: how smart a model is, how fast it answers, and what it costs to run.

IQ rises on the vertical axis, faster models sit to the right, and effective cost runs back along the depth axis on a log scale. The ideal model sits up, right, and toward the front: high intelligence, quick responses, and low cost. Drag to rotate and find where each provider clusters.
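
One way to make "up, right, and toward the front" precise is Pareto dominance across the three axes. The page does not describe this computation; the sketch below is an illustration with a hypothetical `Model` record, and it works the same on raw or log-scaled values because a log scale preserves ordering.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    iq: float          # higher is better
    response_s: float  # response time in seconds, lower is better
    cost: float        # effective cost, lower is better


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is at least as good as `b` on every axis and better on one."""
    no_worse = a.iq >= b.iq and a.response_s <= b.response_s and a.cost <= b.cost
    better = a.iq > b.iq or a.response_s < b.response_s or a.cost < b.cost
    return no_worse and better


def pareto_front(models: list[Model]) -> list[Model]:
    """Models that no other model dominates, in their original order."""
    return [m for m in models if not any(dominates(o, m) for o in models)]
```

Models off the front are beaten on all three tradeoffs at once by some other model. Choosing among models on the front still depends on how IQ, speed, and cost are weighted.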

IQ Methodology