BullshitBench v2 Scores

Clear Pushback rate: share of attempts where a model clearly challenges a false premise instead of accepting nonsense. Color = provider.
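As a rough illustration of the metric defined above, here is a minimal sketch of how a Clear Pushback rate could be computed from per-attempt grades. The label names (`clear_pushback`, `accepted`) and the function `clear_pushback_rate` are hypothetical, and expressing the result as a percentage is an assumption. The benchmark's actual grading scheme is not described here.

```python
# Hypothetical label for an attempt where the model clearly challenged
# the false premise. The benchmark's real label names are not given here.
CLEAR_PUSHBACK = "clear_pushback"


def clear_pushback_rate(labels):
    """Return the share of attempts labeled as clear pushback, as a percentage."""
    if not labels:
        raise ValueError("at least one graded attempt is required")
    hits = sum(1 for label in labels if label == CLEAR_PUSHBACK)
    return 100 * hits / len(labels)


# Illustrative data only: three of four attempts pushed back clearly.
attempts = ["clear_pushback", "clear_pushback", "accepted", "clear_pushback"]
print(clear_pushback_rate(attempts))  # 75.0
```

Under this sketch, any attempt that is not a clear pushback counts against the model.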

How To Read This Chart

Each row in this chart comes from a sourced benchmark result and is mapped to a public AI IQ model profile.

Top Models

Rank  Model          Provider   Score
1     opus-4.8       Anthropic  94
2     sonnet-4.6     Anthropic  91
3     opus-4.5       Anthropic  90
4     opus-4.6       Anthropic  87
5     qwen3.5-397b   Alibaba    78
6     haiku-4.5      Anthropic  77
7     opus-4.7       Anthropic  74
8     minimax-m3     MiniMax    63
9     mimo-v2.5-pro  Xiaomi     62
10    qwen3.6-plus   Alibaba    59
11    fable-5        Anthropic  56
12    qwen3.7-max    Alibaba    56
