How smart is your AI model, really?
AI IQ intelligently estimates the IQs of popular AI models
How AI IQ estimates model intelligence
- We archive source captures from public benchmark leaderboards and extract only source-backed values
- We map each benchmark score to an implied IQ using calibrated difficulty curves
- We group 18 benchmarks into five reasoning dimensions: fluid abstraction, mathematical, programmatic, critical, and agentic
- We conservatively fill missing benchmark and dimension estimates only inside the scoring pipeline
- Every derived IQ averages all five dimensions, so missing coverage cannot make a model look better by omission
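
The steps above can be sketched roughly as follows. This is a minimal illustration, not the AI IQ pipeline itself. The benchmark names, the logistic shape of the difficulty curve, the curve parameters, and the fill rule (a missing dimension takes the lowest observed dimension estimate) are all assumptions made for the example.

```python
import math
from statistics import mean

DIMENSIONS = ("fluid_abstraction", "mathematical", "programmatic", "critical", "agentic")

# Hypothetical benchmarks: (dimension, IQ at a 50% score, IQ points per logit).
HYPOTHETICAL_CURVES = {
    "bench_math": ("mathematical", 120.0, 10.0),
    "bench_code": ("programmatic", 100.0, 10.0),
}


def implied_iq(benchmark, score):
    """Map a 0..1 benchmark score to an implied IQ on an assumed logistic curve."""
    _, midpoint_iq, points_per_logit = HYPOTHETICAL_CURVES[benchmark]
    p = min(max(score, 0.01), 0.99)  # clamp so the logit stays finite
    return midpoint_iq + points_per_logit * math.log(p / (1.0 - p))


def derived_iq(scores):
    """Average all five dimensions, filling gaps with the lowest observed one."""
    per_dimension = {}
    for benchmark, score in scores.items():
        dimension = HYPOTHETICAL_CURVES[benchmark][0]
        per_dimension.setdefault(dimension, []).append(implied_iq(benchmark, score))
    if not per_dimension:
        raise ValueError("need at least one source-backed benchmark score")
    observed = {d: mean(v) for d, v in per_dimension.items()}
    floor = min(observed.values())  # assumed conservative fill value
    return mean(observed.get(d, floor) for d in DIMENSIONS)
```

Because the three uncovered dimensions are filled at the floor, a model with scores in only two dimensions ends up below the plain average of those two, which is the "no benefit from omission" property described above.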
Effective cost & iso-curves
Effective cost on the X-axis is token cost (the price of 2M input + 1M output tokens) × token usage multiplier (this model's AA token usage ÷ the median across models). It estimates what each model spends on a task that the median model handles with that 2:1 token mix.
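
As a worked example of that formula, here is a minimal sketch. It assumes prices are quoted in dollars per million tokens and that the median is taken over the token usage of all charted models. The function and parameter names are hypothetical.

```python
from statistics import median


def token_cost(input_price_per_m, output_price_per_m):
    """Price of 2M input + 1M output tokens, given prices per million tokens."""
    return 2 * input_price_per_m + 1 * output_price_per_m


def effective_cost(input_price_per_m, output_price_per_m, token_usage, all_token_usage):
    """Token cost scaled by this model's token usage relative to the median."""
    multiplier = token_usage / median(all_token_usage)
    return token_cost(input_price_per_m, output_price_per_m) * multiplier


# A model priced at $1 in / $4 out that uses 1.5x the median number of tokens:
# token cost is 2*1 + 1*4 = 6, so effective cost is 6 * 1.5 = 9.
print(effective_cost(1.0, 4.0, token_usage=30, all_token_usage=[10, 20, 30]))
```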
Iso-curves trace lines of equal preference for IQ versus cost. The slider weights quality vs cost: center is 1:1, drag toward Cost to make cost matter more, or toward IQ to make quality matter more. Models above and to the right of a curve are strictly better at that preference.
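
One way to picture an iso-curve is as the set of (cost, IQ) points that share the same preference score. The sketch below assumes a linear tradeoff between IQ and log10 of effective cost. The scoring formula, the `iq_weight` slider value, and the `iq_per_decade` exchange rate are all hypothetical, chosen only to illustrate the idea.

```python
import math


def preference_score(iq, cost, iq_weight=0.5, iq_per_decade=15.0):
    """Higher is better. iq_weight=0.5 stands in for the 1:1 slider center."""
    cost_weight = 1.0 - iq_weight
    return iq_weight * iq - cost_weight * iq_per_decade * math.log10(cost)


def iso_curve_iq(cost, level, iq_weight=0.5, iq_per_decade=15.0):
    """IQ needed at `cost` to sit on the iso-curve whose score is `level`.

    Requires iq_weight > 0, otherwise IQ does not affect the score at all.
    """
    cost_weight = 1.0 - iq_weight
    return (level + cost_weight * iq_per_decade * math.log10(cost)) / iq_weight


level = preference_score(iq=120.0, cost=10.0)
# Under these assumptions, a model that costs 10x more needs 15 more IQ points
# to stay on the same curve at the 1:1 setting.
print(iso_curve_iq(cost=100.0, level=level))
```

Raising `iq_weight` flattens the curves (cost matters less), and lowering it steepens them.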
Tracking frontier progress
Each dot is a model with a known release date and a derived IQ estimate. Models are positioned left-to-right by release date, so the chart shows how the frontier changes over time rather than just where models rank today.
Provider-colored lines connect each lab's flagship frontier checkpoints. Codex, mini, nano, flash, coder, and smaller open-weight variants are omitted so the chart tracks each lab's main offering rather than every SKU.
This view is most useful for spotting whether a new release is actually ahead of its direct predecessor, or whether source coverage and conservative imputations are shaping the comparison.
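
The predecessor comparison can be sketched as below. This is an illustration under assumptions, not the chart's actual logic. The record fields and model names are hypothetical, and variants are filtered by name token only, so the "smaller open-weight variants" mentioned above would need a separate rule.

```python
import re
from datetime import date

EXCLUDED_TOKENS = {"codex", "mini", "nano", "flash", "coder"}


def is_flagship(name):
    """Match whole name tokens so that 'mini' does not hit longer words."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return not EXCLUDED_TOKENS.intersection(tokens)


def flagship_steps(models):
    """Return (provider, predecessor, successor, iq_change) per consecutive pair."""
    by_provider = {}
    for m in models:
        if is_flagship(m["name"]):
            by_provider.setdefault(m["provider"], []).append(m)
    steps = []
    for provider, group in sorted(by_provider.items()):
        group.sort(key=lambda m: m["release"])
        for prev, cur in zip(group, group[1:]):
            steps.append((provider, prev["name"], cur["name"], cur["iq"] - prev["iq"]))
    return steps


models = [
    {"name": "lab-a-1", "provider": "LabA", "release": date(2024, 1, 1), "iq": 100},
    {"name": "lab-a-1-mini", "provider": "LabA", "release": date(2024, 3, 1), "iq": 95},
    {"name": "lab-a-2", "provider": "LabA", "release": date(2024, 6, 1), "iq": 112},
]
print(flagship_steps(models))
```

A negative `iq_change` flags a release that sits below its direct predecessor, which is the case worth checking against source coverage and imputations.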
How AI IQ estimates emotional intelligence
- We pull in each model's Text Arena Elo score and EQ-Bench 3 Elo score
- We map each source score to an estimated EQ using calibrated piecewise-linear scales
- EQ-Bench 3 is retained as the dedicated emotional/social reasoning signal, but treated as style-sensitive because it is judged by Claude
- Anthropic models receive a 300-point Elo adjustment on EQ-Bench before mapping
- The composite EQ requires both source-backed components and is the average of the Text Arena and EQ-Bench estimates
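
The EQ steps above can be sketched as follows. This is a minimal illustration, not the production mapping. The anchor points are made up, and the list above does not state the direction of the 300-point adjustment, so the sketch assumes it is subtracted.

```python
from bisect import bisect_right

# Assumption: the adjustment lowers the EQ-Bench Elo of Anthropic models.
ANTHROPIC_EQBENCH_ADJUSTMENT = -300.0

# Hypothetical (Elo, EQ) anchor points for each source.
ARENA_ANCHORS = [(1000.0, 85.0), (1400.0, 125.0)]
EQBENCH_ANCHORS = [(500.0, 80.0), (1500.0, 130.0)]


def piecewise_linear(x, anchors):
    """Interpolate linearly between sorted anchors, clamping at both ends."""
    xs = [a[0] for a in anchors]
    ys = [a[1] for a in anchors]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (x - xs[i - 1]) / (xs[i] - xs[i - 1])


def composite_eq(arena_elo, eqbench_elo, provider):
    """Return the averaged EQ, or None unless both source scores are present."""
    if arena_elo is None or eqbench_elo is None:
        return None
    if provider == "Anthropic":
        eqbench_elo += ANTHROPIC_EQBENCH_ADJUSTMENT
    arena_eq = piecewise_linear(arena_elo, ARENA_ANCHORS)
    eqbench_eq = piecewise_linear(eqbench_elo, EQBENCH_ANCHORS)
    return (arena_eq + eqbench_eq) / 2
```

With these made-up anchors, the same pair of Elo scores yields a lower composite for an Anthropic model than for any other provider, because the adjustment is applied before the EQ-Bench score is mapped.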
IQ and EQ tradeoffs
IQ summarizes benchmark-based reasoning ability across fluid abstraction, mathematical reasoning, programmatic reasoning, critical reasoning, and agentic reasoning dimensions.
EQ estimates interaction quality from Text Arena and EQ-Bench 3 signals, then maps those scores onto the same kind of normalized scale so models can be compared directly.
Iso-curves trace lines of equal preference between IQ and EQ. The slider weights the two: center is 1:1, drag toward EQ to make EQ matter more, or toward IQ to make IQ matter more. Models above and to the right of a curve are strictly better at that preference.
Three dimensions, one view
Most charts on this page reduce model comparison to two axes. This one keeps all three: EQ (X), IQ (Y), and effective cost (Z, the log-scaled depth axis). Effective cost is the sticker price for a 2M-input + 1M-output workload multiplied by the blended usage multiplier.
Drag to rotate the cloud. The dashed line is the central tradeoff axis: it is perpendicular to the isoquant surface at the middle of the cube and points toward higher IQ, higher EQ, and lower effective cost. Models nearer the green end are stronger all-around deals; models nearer the red end give up capability, cost efficiency, or both.
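
To make the geometry concrete, here is a minimal sketch of where a model falls along such an axis. It assumes each axis is rescaled to a unit cube (cost on a log scale) and that the isoquant surface is a plane, so its perpendicular is the same everywhere. The axis bounds, weights, and function names are hypothetical.

```python
import math


def unit_scale(value, lo, hi, log=False):
    """Rescale value to 0..1 between lo and hi, optionally on a log10 scale."""
    if log:
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)


def axis_position(eq, iq, cost, bounds, weights=(1.0, 1.0, 1.0)):
    """Signed position along the tradeoff axis; positive means the better end."""
    point = (
        unit_scale(eq, *bounds["eq"]),
        unit_scale(iq, *bounds["iq"]),
        unit_scale(cost, *bounds["cost"], log=True),
    )
    # Direction of improvement: higher EQ, higher IQ, lower cost.
    direction = (weights[0], weights[1], -weights[2])
    norm = math.sqrt(sum(d * d for d in direction))
    # Project the offset from the cube center (0.5, 0.5, 0.5) onto the axis.
    return sum((p - 0.5) * d / norm for p, d in zip(point, direction))


bounds = {"eq": (80, 120), "iq": (80, 140), "cost": (1, 100)}
print(axis_position(eq=120, iq=140, cost=1, bounds=bounds))   # best corner
print(axis_position(eq=100, iq=110, cost=10, bounds=bounds))  # cube center
```

In this sketch a model at the cube center projects to zero, and positive values correspond to the green end of the dashed line.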
Color = provider, matching the legend below.