
Guide · February 2026

Every Developer Contributes: Crowdsourced AI Rankings

When you use Polydev to get AI perspectives, you're automatically contributing to the world's first crowdsourced coding model leaderboard. Here's how it works and how to make the most of it.

You're Already Contributing

If you use get_perspectives in any IDE—Claude Code, Cursor, Windsurf, Cline, or Codex—you're already generating ranking data. No extra setup, no additional tools, no opt-in required.

Every call to get_perspectives does two things: (1) returns AI responses to help you right now, and (2) records anonymous comparisons that improve the leaderboard for everyone.

What's recorded: Only the classification of your prompt (e.g., "Python debugging, moderate complexity") and the comparison outcomes (which model produced the better response). Your actual prompts and the model responses are never stored.
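
For concreteness, a single recorded comparison might look roughly like the record below. The field names are illustrative, not Polydev's actual schema; the point is that only the classification and the outcome are stored, never your prompt or the responses.

# Hypothetical shape of one stored comparison record (field names illustrative)
comparison_record = {
    "classification": {
        "language": "python",
        "task_type": "debugging_runtime",
        "complexity": "moderate",
    },
    "winner": "claude-opus-4-6",
    "loser": "gpt-5.3-codex",
    "judge_method": "automatic",   # or "explicit" via rank_perspectives
    # No prompt text and no model responses are stored
}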

Two Levels of Contribution

Level 1: Passive (Automatic)

When you call get_perspectives, multiple models respond simultaneously. The system automatically compares their outputs using quality heuristics (content length, code blocks, structured formatting) and records which model produced the better response.

# This single call generates ranking data automatically
# No additional code needed — just use Polydev normally

polydev.get_perspectives(
  "How should I structure the state management "
  "in this React app?"
)

# Behind the scenes:
# 1. Claude, GPT, Gemini, and Grok all respond
# 2. Prompt classified as: react / architecture / moderate
# 3. All C(4,2) = 6 pairwise comparisons recorded
# 4. Elo ratings updated for all models
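
To make the automatic comparison concrete, here is a minimal sketch of what a length-and-structure heuristic could look like. The function names and weights are hypothetical, not Polydev's actual scoring logic.

import re

def score_response(text: str) -> float:
    """Hypothetical quality heuristic: rewards substance and structure."""
    code_blocks = len(re.findall(r"```", text)) // 2            # fenced code blocks
    structure = len(re.findall(r"^#+\s|^\d+\.\s", text, re.M))  # headings / numbered steps
    length_score = min(len(text) / 2000, 1.0)                   # capped content length
    return length_score + 0.5 * code_blocks + 0.25 * structure

def compare(model_a: str, resp_a: str, model_b: str, resp_b: str) -> str:
    """The higher-scoring model wins the pairwise comparison."""
    return model_a if score_response(resp_a) >= score_response(resp_b) else model_b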

Level 2: Active (Explicit Ranking)

After receiving perspectives, your coding agent can explicitly rank the responses from best to worst using rank_perspectives. This carries 1.5x the weight of automatic comparisons because it reflects an informed judgment from the model that actually read and evaluated all the responses.

# After reviewing the perspectives, rank them
polydev.rank_perspectives(
  ranked_models=[
    "gemini-3-pro",        # Best response
    "claude-opus-4-6",     # Second best
    "gpt-5.3-codex",       # Third
    "grok-4-1-fast-reasoning"  # Fourth
  ],
  base_model="claude-opus-4-6",  # Self-identify
  feedback_text="Gemini provided the most complete
    architecture with clear tradeoffs"
)
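
As a rough illustration of how that 1.5x weight can enter the math, here is a standard Elo update step with a judge-method multiplier. The K-factor and the exact placement of the weight are assumptions, not Polydev's published parameters.

def elo_update(rating_winner: float, rating_loser: float,
               k: float = 32, weight: float = 1.0) -> tuple[float, float]:
    """One Elo step; weight is 1.0 for automatic comparisons, 1.5 for explicit ranks."""
    expected_win = 1 / (1 + 10 ** ((rating_loser - rating_winner) / 400))
    delta = k * weight * (1 - expected_win)
    return rating_winner + delta, rating_loser - delta

# An explicit ranking moves ratings roughly 1.5x as far as an automatic comparison
new_gemini, new_gpt = elo_update(1520, 1495, weight=1.5)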

Automatic in Claude Code: When using Claude Code with Polydev, the agent automatically calls rank_perspectives after receiving responses. You don't need to configure anything—the ranking happens as part of the normal workflow.

IDE and Base Model Detection

The leaderboard tracks not just which models are being compared, but also which AI agent is doing the judging. This is important because different base models may have different biases when evaluating responses.

Detection happens through three layers, in priority order:

1. Self-identification: The AI model passes its own identity via the base_model parameter in rank_perspectives. For example, Claude Opus 4.6 sends claude-opus-4-6. This is the most accurate method.

2. MCP client info: During the MCP handshake, IDEs send their identity (e.g., claude-code, cursor, windsurf). The stdio-wrapper injects this as ide and ide_version.

3. User-agent fallback: If neither of the above is available, the server inspects the HTTP User-Agent header for known patterns like claude-code or cursor.

IDE             | MCP Client Name    | Inferred Base Model
Claude Code     | claude-code        | claude
Claude Desktop  | Claude Desktop     | claude
Cursor          | cursor             | cursor
Windsurf        | windsurf / codeium | windsurf
Cline           | cline              | cline
Codex CLI       | codex              | codex
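
A minimal sketch of this three-layer fallback, using the mapping from the table above, might look like the following. The function and parameter names are illustrative, not Polydev's actual server code.

# Hypothetical mapping mirroring the table above
CLIENT_TO_BASE_MODEL = {
    "claude-code": "claude",
    "Claude Desktop": "claude",
    "cursor": "cursor",
    "windsurf": "windsurf",
    "codeium": "windsurf",
    "cline": "cline",
    "codex": "codex",
}

def detect_base_model(base_model: str | None,
                      client_name: str | None,
                      user_agent: str | None) -> str:
    """Apply the three detection layers in priority order."""
    # 1. Self-identification via the base_model parameter (most accurate)
    if base_model:
        return base_model
    # 2. MCP client info from the handshake
    if client_name and client_name in CLIENT_TO_BASE_MODEL:
        return CLIENT_TO_BASE_MODEL[client_name]
    # 3. User-agent fallback: look for known patterns
    if user_agent:
        for pattern, model in CLIENT_TO_BASE_MODEL.items():
            if pattern.lower() in user_agent.lower():
                return model
    return "unknown"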

Using the Leaderboard

The leaderboard at /leaderboard is public and requires no login. Here's how to get the most out of it:

Filter for your stack

Use the filter dropdowns to narrow rankings to your specific technology stack. The rankings recalculate in real time based on comparisons that match your filters.

Example: A TypeScript React developer might filter to Language: typescript, Framework: react. The #1 model for this combination might differ from the overall #1.

Compare two models directly

Switch to the "Head-to-Head" tab to see the direct matchup record between any two models. The breakdown by judge method and task category reveals where each model has an advantage.

Read the methodology

Click "How rankings work" at the bottom of the page for details on the Elo system, judge methods, classification approach, and cross-model judging.

Programmatic Access

The leaderboard data is available through a public API. You can use this to build custom dashboards, automate model selection, or integrate rankings into your CI/CD pipeline.

# Fetch overall rankings
curl https://www.polydev.ai/api/leaderboard

# Filter by language and task type
curl "https://www.polydev.ai/api/leaderboard?language=python&task_type=debugging_runtime"

# Head-to-head comparison
curl "https://www.polydev.ai/api/leaderboard/head-to-head?model_a=claude-opus-4-6&model_b=gpt-5.3-codex"

All six filter dimensions are supported as query parameters: task_type, language, framework, complexity, domain, and intent.
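
As a sketch of automated model selection, the snippet below fetches filtered rankings and picks the top-rated model for a given stack. It assumes the endpoint returns a JSON list of entries with model and elo fields; check the actual response shape before relying on it.

import requests

def top_model(language: str, framework: str) -> str:
    """Pick the highest-rated model for a stack.

    Assumes the API returns a JSON list of {"model": ..., "elo": ...} entries.
    """
    resp = requests.get(
        "https://www.polydev.ai/api/leaderboard",
        params={"language": language, "framework": framework},
        timeout=10,
    )
    resp.raise_for_status()
    rankings = resp.json()
    return max(rankings, key=lambda entry: entry["elo"])["model"]

print(top_model("typescript", "react"))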

Making Rankings Better

The leaderboard improves as more developers use Polydev. Here are the factors that increase ranking quality:

More diverse prompts

Rankings become more meaningful when they cover a wide variety of languages, frameworks, and task types. Your unique workflow contributes data that no standardized benchmark captures.

More explicit rankings

Active rankings via rank_perspectives carry 1.5x the weight of automatic comparisons. The more agents that rank responses, the more signal the leaderboard has.

More IDEs and models

Rankings from different base models (Claude, GPT, Gemini) acting as judges help cancel out individual model biases. The leaderboard tracks which model is judging, so systematic biases can be identified and adjusted.

Models in the Leaderboard

The leaderboard currently tracks these models, with more being added as providers release new versions:

Model           | Provider  | Access Method
Claude Opus 4.6 | Anthropic | Claude Code CLI + API
GPT 5.3 Codex   | OpenAI    | Codex CLI + API
Gemini 3 Pro    | Google    | Gemini CLI + API
Grok 4.1 Fast   | xAI       | API
GLM 4.7         | Cerebras  | API

Getting Started

To start contributing rankings, all you need is Polydev installed in any supported IDE:

# Install Polydev (works with Claude Code, Cursor, Cline, etc.)
npx polydev-ai@latest

# That's it! Every get_perspectives call now contributes
# to the crowdsourced leaderboard automatically.

See the installation guide for detailed setup instructions for each IDE.

Start contributing today

Every get_perspectives call makes the leaderboard more accurate. Install Polydev and join the crowdsourced ranking.