Combination Models

Fusion, mixture-of-agents, and router-orchestrated systems that behave like one model surface

Tracked separately

Model-like systems, not ordinary checkpoints

Combination models route, fan out, debate, verify, or synthesize across multiple model calls while exposing a single endpoint, preset, or product identity. They belong on a separate page until their costs, latency, benchmark coverage, and component recipes are source-backed enough to compare directly with single models in the main leaderboard.

Page rule: Include systems only when the public source describes a multi-model or multi-agent recipe presented as one model-like surface.
| System | Surface | Combination Pattern | Public Evidence | AI IQ Treatment |
| --- | --- | --- | --- | --- |
| OpenRouter Fusion | API model/tool/plugin/chatroom | Parallel panel plus judge/synthesizer | DRACO deep-research scores for named panels | Track here; do not derive Composite IQ from one benchmark family |
| Sakana Fugu | OpenAI-compatible model API | Learned orchestration over a model pool | Multi-benchmark table for Fugu and Fugu Ultra | Candidate for future sparse benchmark rows if source definitions align |
| Devin Fusion | Devin harness behavior | Main agent plus sidekick delegation | Cost/quality examples and average cost-reduction claims | Track as product-system evidence, not a standalone model row |
| Hermes MoA | Virtual model provider / preset | Reference models feeding an aggregator | Implementation documentation, no comparable scorecard | List without ranking until source-backed evals exist |
| vLLM VSR | Router-level model alias | Task-shaped router recipes and micro-agents | Scorecard rows for VSR Closed and VSR Hybrid | Track here; benchmark fields need recipe/source disambiguation first |
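To make the "parallel panel plus judge/synthesizer" and "reference models feeding an aggregator" patterns concrete, here is a minimal sketch. It does not reproduce any listed system's implementation: the stub models, `make_stub`, `synthesize`, and `combined_endpoint` are hypothetical names, and the string-joining aggregator stands in for what would be a real judge or synthesizer model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-in for a real model call: maps a prompt to a reply.
ModelFn = Callable[[str], str]


def make_stub(name: str) -> ModelFn:
    """Return a fake model that labels its answer with its own name."""
    return lambda prompt: f"{name} answer to: {prompt}"


def synthesize(prompt: str, drafts: Dict[str, str]) -> str:
    """Toy aggregator; a real system would call a judge or synthesizer model here."""
    joined = " | ".join(drafts[name] for name in sorted(drafts))
    return f"synthesis of {len(drafts)} drafts: {joined}"


def combined_endpoint(prompt: str, panel: Dict[str, ModelFn]) -> str:
    """Fan the prompt out to every panel member, then merge the drafts into one reply."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in panel.items()}
        drafts = {name: fut.result() for name, fut in futures.items()}
    return synthesize(prompt, drafts)


# Assumed three-member panel; the caller only ever sees `combined_endpoint`.
panel = {name: make_stub(name) for name in ("model_a", "model_b", "model_c")}
reply = combined_endpoint("What is 2 + 2?", panel)
print(reply)
```

The caller makes one request and gets one reply, yet the cost and latency of that request depend on the panel width and the synthesizer, neither of which is visible from outside.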

Why this is not in the main model table yet

AI IQ's main model rankings assume a model row represents a reasonably stable model identity with source-backed benchmark fields and documented pricing. Combination systems can change recipes, fan-out width, synthesizer choice, and routing policy without changing the outward name. This page keeps the category visible while avoiding accidental apples-to-oranges IQ scoring.
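One way to picture the identity problem is a fingerprint over the recipe rather than the name. The sketch below is illustrative only: the `Recipe` fields and `recipe_fingerprint` are hypothetical, and no listed system is known to publish its recipe in this form.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Recipe:
    """Hypothetical record of what sits behind one public name."""
    public_name: str
    panel: Tuple[str, ...]
    synthesizer: str
    routing_policy: str


def recipe_fingerprint(recipe: Recipe) -> str:
    """Hash everything except the public name, so recipe drift shows up as a new value."""
    fields = asdict(recipe)
    fields.pop("public_name")
    payload = json.dumps(fields, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Same outward name, wider fan-out: the name is unchanged but the fingerprint is not.
before = Recipe("example-fusion", ("model_a", "model_b"), "judge_x", "fan-out")
after = Recipe("example-fusion", ("model_a", "model_b", "model_c"), "judge_x", "fan-out")
print(before.public_name == after.public_name)
print(recipe_fingerprint(before) == recipe_fingerprint(after))
```

A benchmark score recorded against `before` would not necessarily describe `after`, even though both answer to the same name.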

Future promotion into the dataset should require a stable public model identifier, published pricing or hosted cost basis, source-backed benchmark rows that map to existing fields, and clear notes about whether benchmark numbers include tools, web access, multi-turn orchestration, or hidden routing.
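The four requirements above can be read as a checklist. The following sketch encodes them under assumed field names; `CandidateSystem` and `promotion_blockers` are hypothetical and do not reflect an actual AI IQ schema or review process.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CandidateSystem:
    """Hypothetical summary of the public evidence for one combination system."""
    name: str
    stable_model_id: Optional[str] = None      # stable public model identifier
    pricing_source: Optional[str] = None       # published pricing or hosted cost basis
    mapped_benchmark_fields: List[str] = field(default_factory=list)
    benchmark_conditions_noted: bool = False   # tools, web access, orchestration, routing


def promotion_blockers(system: CandidateSystem) -> List[str]:
    """Return the unmet requirements; an empty list means the system could be promoted."""
    blockers = []
    if not system.stable_model_id:
        blockers.append("no stable public model identifier")
    if not system.pricing_source:
        blockers.append("no published pricing or hosted cost basis")
    if not system.mapped_benchmark_fields:
        blockers.append("no source-backed benchmark rows mapped to existing fields")
    if not system.benchmark_conditions_noted:
        blockers.append("benchmark conditions (tools, web, orchestration, routing) not noted")
    return blockers


# Invented example values, not data about any listed system.
undocumented = CandidateSystem(name="example-moa")
documented = CandidateSystem(
    name="example-router",
    stable_model_id="example-router-v1",
    pricing_source="vendor pricing page",
    mapped_benchmark_fields=["example_benchmark"],
    benchmark_conditions_noted=True,
)
print(promotion_blockers(undocumented))
print(promotion_blockers(documented))
```

Returning the list of blockers rather than a single boolean keeps the reason for exclusion visible, which matches the page's aim of tracking these systems without ranking them.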