How smart is your AI model, really?

AI IQ intelligently estimates the IQs of popular AI models

AI Models by IQ
Each model's estimated IQ plotted on a standard normal IQ distribution
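
To read a position on that curve as a percentile, a minimal sketch follows. It assumes the conventional human IQ scale (mean 100, standard deviation 15); the page does not state the parameters it uses, and `iq_percentile` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumption: conventional IQ scale with mean 100 and SD 15.
# The page does not state which parameters its chart uses.
IQ_SCALE = NormalDist(mu=100.0, sigma=15.0)


def iq_percentile(iq: float) -> float:
    """Percent of the reference distribution at or below the given IQ."""
    return 100.0 * IQ_SCALE.cdf(iq)


if __name__ == "__main__":
    for iq in (85, 100, 115, 130):
        print(iq, round(iq_percentile(iq), 1))
```

Under these assumptions, each 15-point step is one standard deviation along the plotted curve.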

How AI IQ estimates model intelligence

  1. We archive source captures from public benchmark leaderboards and extract only source-backed values
  2. We map each benchmark score to an implied IQ using calibrated difficulty curves
  3. We group 17 benchmarks into five reasoning dimensions: fluid abstraction, mathematical, programmatic, critical, and agentic
  4. We conservatively fill missing benchmark and dimension estimates only inside the scoring pipeline
  5. Every derived IQ averages all five dimensions, so missing coverage cannot make a model look better by omission
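
Steps 4 and 5 above can be sketched as follows. This is an illustration under stated assumptions, not the site's pipeline: the fill rule (a missing dimension takes the model's lowest known dimension) is a hypothetical stand-in for "conservative" imputation, and the names `DIMENSIONS` and `overall_iq` are invented for the example.

```python
from statistics import mean

# The five reasoning dimensions named in step 3.
DIMENSIONS = (
    "fluid_abstraction",
    "mathematical",
    "programmatic",
    "critical",
    "agentic",
)


def overall_iq(dimension_iqs: dict) -> float:
    """Average all five dimensions, filling gaps conservatively.

    Assumed fill rule (not confirmed by the page): a missing dimension
    is set to the lowest dimension IQ the model actually has, so a gap
    can never raise the average.
    """
    known = [dimension_iqs[d] for d in DIMENSIONS if dimension_iqs.get(d) is not None]
    if not known:
        raise ValueError("no source-backed dimension estimates")
    floor = min(known)
    filled = [
        dimension_iqs[d] if dimension_iqs.get(d) is not None else floor
        for d in DIMENSIONS
    ]
    # Always divide by five, never by the number of covered dimensions.
    return mean(filled)
```

The design point is the fixed denominator: averaging only the covered dimensions would let a model with one strong score and four gaps outrank a fully measured model.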

IQ vs Effective Cost
Each model's estimated IQ plotted against its per-task effective cost (sticker price × usage multiplier)

Effective cost & iso-curves

Effective cost on the X-axis is token cost (the price of 2M input + 1M output tokens) × token usage multiplier (this model's AA token usage ÷ the median model's). It is what each model spends on a task that the median model completes with that 2:1 token mix.
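
As a worked sketch of that formula: the function below assumes list prices are quoted per million tokens, and its name and parameter names are hypothetical.

```python
def effective_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    model_token_usage: float,
    median_token_usage: float,
) -> float:
    """Effective cost = token cost x token usage multiplier.

    Assumption: prices are per one million tokens, in one currency.
    """
    # Sticker price of the reference workload: 2M input + 1M output tokens.
    token_cost = 2 * input_price_per_m + 1 * output_price_per_m
    # How many tokens this model uses relative to the median model.
    usage_multiplier = model_token_usage / median_token_usage
    return token_cost * usage_multiplier
```

For example, a model priced at 3.00 per million input tokens and 15.00 per million output tokens that uses 1.5 times the median token count has an effective cost of (6 + 15) × 1.5 = 31.5.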

Iso-curves trace lines of equal preference. The dropdown picks the Y-axis metric (overall IQ, the five dimension IQs, or an individual benchmark). The 1:1 ratio control weights quality vs cost — at 1:1, one IQ point is worth one halving of cost; click right (1:2, 1:5…) to make cost matter more, left (2:1, 5:1…) to make quality matter more. Models above and to the right of a curve are strictly better.
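
One way to express that preference is a score that is linear in IQ and in the base-2 logarithm of cost. The sketch below assumes this form and reads the ratio control as quality weight : cost weight; both are interpretations of the text, and the function names are hypothetical.

```python
import math


def preference_score(
    iq: float, cost: float, quality_weight: float = 1.0, cost_weight: float = 1.0
) -> float:
    """Higher is better. At 1:1, +1 IQ offsets exactly one doubling of cost."""
    return quality_weight * iq - cost_weight * math.log2(cost)


def iso_curve_iq(
    score: float, cost: float, quality_weight: float = 1.0, cost_weight: float = 1.0
) -> float:
    """IQ needed at a given cost to stay on the iso-curve with this score."""
    return (score + cost_weight * math.log2(cost)) / quality_weight
```

At 1:1, a model with IQ 120 at cost 8 and a model with IQ 119 at cost 4 get the same score, so they sit on the same iso-curve. Raising `cost_weight` (1:2, 1:5) favors the cheaper model; raising `quality_weight` (2:1, 5:1) favors the higher-IQ one.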

Frontier IQ Over Time
X = release date. Y = estimated IQ. Provider step-lines connect each provider's flagship frontier checkpoints over time.

Tracking frontier progress

Each dot is a model with a known release date and a derived IQ estimate. Models are positioned left-to-right by release date, so the chart shows how the frontier changes over time rather than just where models rank today.

Provider-colored lines connect each lab's flagship frontier checkpoints. Codex, mini, nano, flash, coder, and smaller open-weight variants are omitted so the chart tracks each lab's main offering rather than every SKU.
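
A minimal sketch of that filtering, assuming variants can be recognized from tokens in the model name. The token list mirrors the variants named above; the matching rule, the function names, and the example model names in the usage note are hypothetical, and "smaller open-weight variants" would need metadata that a name match cannot supply.

```python
import re
from datetime import date

# Name tokens for the variants the chart omits (from the text above).
EXCLUDED_TOKENS = {"codex", "mini", "nano", "flash", "coder"}


def is_flagship(name: str) -> bool:
    """True when no excluded variant token appears in the model name."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return not EXCLUDED_TOKENS.intersection(tokens)


def step_line(models):
    """models: iterable of (name, release_date, iq) for one provider.

    Returns the flagship points as (release_date, iq), ordered by date.
    """
    kept = [(released, iq, name) for name, released, iq in models if is_flagship(name)]
    return [(released, iq) for released, iq, _ in sorted(kept)]
```

Given `("example-1", date(2024, 12, 1), 115)`, `("example-1-mini", date(2025, 1, 1), 105)`, and `("example-2", date(2025, 6, 1), 125)`, the line keeps only the first and third points.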

This view is most useful for spotting whether a new release is actually ahead of its direct predecessor, or whether source coverage and conservative imputations are shaping the comparison.

AI Models by EQ
Each model's estimated EQ plotted on the same standard normal distribution used for IQ

How AI IQ estimates emotional intelligence

  1. We pull in each model's Arena Elo score, IFBench score, and EQ-Bench 3 Elo score
  2. We map each source score to an estimated EQ using calibrated piecewise-linear scales
  3. EQ-Bench 3 is retained as the dedicated emotional/social reasoning signal, but treated as style-sensitive because it is judged by Claude
  4. Anthropic models receive a modest 100-point Elo adjustment on EQ-Bench before mapping
  5. The composite EQ requires at least two source-backed components, then averages the available Arena, IFBench, and EQ-Bench signals
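
The five steps above can be sketched as follows. The anchor points passed in as `scales` are placeholders, not the site's calibration; the sign of the Anthropic adjustment (downward, because the judge is Claude) is an assumption, since the text gives only its size; and all function and parameter names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def piecewise_linear(x: float, anchors) -> float:
    """Map x through sorted (source_score, eq) anchors; clamp outside the range.

    Assumes anchor source scores are strictly increasing.
    """
    xs = [a for a, _ in anchors]
    if x <= xs[0]:
        return anchors[0][1]
    if x >= xs[-1]:
        return anchors[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def composite_eq(
    arena_elo, ifbench, eqbench_elo, provider, scales, anthropic_adjustment=-100.0
):
    """Average the available mapped signals; None if fewer than two exist.

    Assumption: the 100-point EQ-Bench adjustment for Anthropic models is
    a subtraction applied before mapping.
    """
    if eqbench_elo is not None and provider == "Anthropic":
        eqbench_elo = eqbench_elo + anthropic_adjustment
    raw = {"arena": arena_elo, "ifbench": ifbench, "eqbench": eqbench_elo}
    mapped = [piecewise_linear(v, scales[k]) for k, v in raw.items() if v is not None]
    if len(mapped) < 2:
        return None
    return mean(mapped)
```

Returning `None` for a single-signal model reflects step 5: one source alone is not enough to publish a composite.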

IQ vs EQ
X = composite EQ. Y = IQ. Color = model provider.

IQ and EQ tradeoffs

IQ summarizes benchmark-based reasoning ability across fluid abstraction, mathematical reasoning, programmatic reasoning, critical reasoning, and agentic reasoning dimensions.

EQ estimates interaction quality from Arena, IFBench, and EQ-Bench 3 signals, then maps those scores onto the same kind of normalized scale so models can be compared directly.

Iso-curves trace lines of equal preference between IQ and EQ. The 1:1 ratio control weights the two — at 1:1, one IQ point is worth one EQ point; click right (1:2, 1:5…) to make EQ matter more, left (2:1, 5:1…) to make IQ matter more. Models above and to the right of a curve are strictly better at that preference.
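
Because both axes share one scale, the preference can be sketched as a plain weighted sum. The linear form and the reading of the ratio as IQ weight : EQ weight are assumptions, and `iq_eq_score` is a hypothetical name.

```python
def iq_eq_score(
    iq: float, eq: float, iq_weight: float = 1.0, eq_weight: float = 1.0
) -> float:
    """Higher is better. Points with equal scores lie on the same iso-curve."""
    return iq_weight * iq + eq_weight * eq
```

At 1:1, a model at IQ 120 and EQ 100 ties with one at IQ 119 and EQ 101. At 2:1 the first model scores higher, and at 1:2 the second does.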

IQ vs EQ vs Cost in 3D
3D scatter: X = EQ, Y = IQ, Z = effective cost (log). Color = provider. Drag to rotate.

Three dimensions, one view

Most charts on this page reduce model comparison to two axes. This one keeps all three: EQ (X), IQ (Y), and effective cost (Z, log-scaled — the depth axis). Effective cost is sticker price for a 2M-input + 1M-output workload multiplied by the blended usage multiplier.

Drag to rotate the cloud. The dashed line is the central tradeoff axis: it is perpendicular to the isoquant surface at the middle of the cube and points toward higher IQ, higher EQ, and lower effective cost. Models nearer the green end are stronger all-around deals; models nearer the red end give up capability, cost efficiency, or both.
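
A model's position along that axis can be sketched as a projection onto the isoquant's normal vector. The sketch assumes a planar isoquant in (EQ, IQ, log2 cost) space with adjustable weights; the log base, the weights, the choice of center, and the function name are all assumptions, and the chart's own axis scaling may differ.

```python
import math


def tradeoff_position(eq, iq, cost, center, weights=(1.0, 1.0, 1.0)):
    """Signed distance from the center along the tradeoff axis.

    center: (eq, iq, cost) at the middle of the cube.
    Positive values point toward higher IQ, higher EQ, and lower cost.
    """
    w_eq, w_iq, w_cost = weights
    # Normal of the plane w_eq*EQ + w_iq*IQ - w_cost*log2(cost) = const.
    normal = (w_eq, w_iq, -w_cost)
    length = math.sqrt(sum(c * c for c in normal))
    point = (eq, iq, math.log2(cost))
    origin = (center[0], center[1], math.log2(center[2]))
    return sum(n * (p - o) for n, p, o in zip(normal, point, origin)) / length
```

In this sketch, positive values correspond to the green end of the dashed line and negative values to the red end.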

Color = provider, matching the legend below.

IQ Methodology
EQ Methodology