Guide · February 2026
Every Developer Contributes: Crowdsourced AI Rankings
When you use Polydev to get AI perspectives, you're automatically contributing to the world's first crowdsourced coding model leaderboard. Here's how it works and how to make the most of it.
You're Already Contributing
If you use get_perspectives in any IDE—Claude Code, Cursor, Windsurf, Cline, or Codex—you're already generating ranking data. No extra setup, no additional tools, no opt-in required.
Every call to get_perspectives does two things: (1) returns AI responses to help you right now, and (2) records anonymous comparisons that improve the leaderboard for everyone.
What's recorded: Only the classification of your prompt (e.g., "Python debugging, moderate complexity") and the comparison outcomes (which model produced the better response). Your actual prompts and the model responses are never stored.
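For illustration, a recorded comparison might look something like the sketch below. The field names here are assumptions, not Polydev's actual schema; the point is that only the classification and the outcome appear, never the text itself.

# Hypothetical shape of a recorded comparison -- field names are
# illustrative, not Polydev's actual schema
comparison_record = {
    "classification": {
        "language": "python",
        "task_type": "debugging_runtime",
        "complexity": "moderate",
    },
    "winner": "claude-opus-4-6",
    "loser": "gpt-5.3-codex",
    # Note: no prompt text and no response text is stored
}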
Two Levels of Contribution
Level 1: Passive (Automatic)
When you call get_perspectives, multiple models respond simultaneously. The system automatically compares their outputs using quality heuristics (content length, code blocks, structured formatting) and records which model produced the better response.
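As a rough illustration of how such a heuristic comparison could work, here is a sketch of a scoring function. The weights and the function itself are hypothetical, not Polydev's actual implementation; the usage example follows.

# Hypothetical quality heuristic -- weights are illustrative only
def heuristic_score(response: str) -> float:
    score = 0.0
    score += min(len(response), 4000) / 4000       # content length, capped
    score += 0.5 * (response.count("```") / 2)     # code blocks (2 fences each)
    score += 0.25 * sum(
        line.lstrip().startswith(("#", "-", "*"))  # structured formatting
        for line in response.splitlines()
    )
    return score

# The higher-scoring response "wins" the automatic pairwise comparison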
# This single call generates ranking data automatically
# No additional code needed — just use Polydev normally
polydev.get_perspectives(
    "How should I structure the state management "
    "in this React app?"
)
# Behind the scenes:
# 1. Claude, GPT, Gemini, and Grok all respond
# 2. Prompt classified as: react / architecture / moderate
# 3. C(N,2) pairwise comparisons recorded (C(4,2) = 6 here)
# 4. Elo ratings updated for all models

Level 2: Active (Explicit Ranking)
After receiving perspectives, your coding agent can explicitly rank the responses from best to worst using rank_perspectives. This carries 1.5x the weight of automatic comparisons because it reflects an informed judgment from the model that actually read and evaluated all the responses.
# After reviewing the perspectives, rank them
polydev.rank_perspectives(
    ranked_models=[
        "gemini-3-pro",             # Best response
        "claude-opus-4-6",          # Second best
        "gpt-5.3-codex",            # Third
        "grok-4-1-fast-reasoning",  # Fourth
    ],
    base_model="claude-opus-4-6",  # Self-identify the judging model
    feedback_text=(
        "Gemini provided the most complete "
        "architecture with clear tradeoffs"
    ),
)

Automatic in Claude Code: When using Claude Code with Polydev, the agent automatically calls rank_perspectives after receiving responses. You don't need to configure anything—the ranking happens as part of the normal workflow.
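To make the 1.5x weighting concrete, here is a minimal sketch of a weighted Elo update. The K-factor and the function are assumptions built on the standard Elo formula, not Polydev's published implementation.

# Minimal weighted Elo update -- K-factor and weighting are assumptions
def update_elo(winner: float, loser: float,
               weight: float = 1.0, k: float = 32.0):
    # Expected score of the winner under the standard Elo formula
    expected_win = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))
    delta = k * weight * (1.0 - expected_win)
    return winner + delta, loser - delta

# Automatic comparison: weight=1.0; explicit ranking: weight=1.5
new_a, new_b = update_elo(1510.0, 1490.0, weight=1.5)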
IDE and Base Model Detection
The leaderboard tracks not just which models are being compared, but also which AI agent is doing the judging. This is important because different base models may have different biases when evaluating responses.
Detection happens through three layers, in priority order (a sketch of the combined fallback logic follows the table below):
1. The AI model passes its own identity via the base_model parameter in rank_perspectives. For example, Claude Opus 4.6 sends claude-opus-4-6. This is the most accurate method.
2. During the MCP handshake, IDEs send their identity (e.g., claude-code, cursor, windsurf). The stdio-wrapper injects this as ide and ide_version.
3. If neither of the above is available, the server inspects the HTTP user-agent header for known patterns like claude-code or cursor.
| IDE | MCP Client Name | Inferred Base Model |
|---|---|---|
| Claude Code | claude-code | claude |
| Claude Desktop | Claude Desktop | claude |
| Cursor | cursor | cursor |
| Windsurf | windsurf / codeium | windsurf |
| Cline | cline | cline |
| Codex CLI | codex | codex |
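Putting the three layers together, the fallback logic might look roughly like this sketch. The function name and parameters are assumptions for illustration; the mapping mirrors the table above.

# Hypothetical resolution order -- names are illustrative, not Polydev's code
def resolve_base_model(base_model=None, mcp_client=None, user_agent=""):
    if base_model:                    # Layer 1: model self-identification
        return base_model
    if mcp_client:                    # Layer 2: MCP handshake identity
        inferred = {
            "claude-code": "claude",
            "cursor": "cursor",
            "windsurf": "windsurf",
            "codeium": "windsurf",
            "cline": "cline",
            "codex": "codex",
        }
        return inferred.get(mcp_client, "unknown")
    for pattern in ("claude-code", "cursor"):  # Layer 3: user-agent sniffing
        if pattern in user_agent:
            return pattern
    return "unknown"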
Using the Leaderboard
The leaderboard at /leaderboard is public and requires no login. Here's how to get the most out of it:
Filter for your stack
Use the filter dropdowns to narrow rankings to your specific technology stack. The rankings recalculate in real time based on comparisons that match your filters.
Example: A TypeScript React developer might filter to Language: typescript, Framework: react. The #1 model for this combination might differ from the overall #1.
Compare two models directly
Switch to the "Head-to-Head" tab to see the direct matchup record between any two models. The breakdown by judge method and task category reveals where each model has an advantage.
Read the methodology
Click "How rankings work" at the bottom of the page for details on the Elo system, judge methods, classification approach, and cross-model judging.
Programmatic Access
The leaderboard data is available through a public API. You can use this to build custom dashboards, automated model selection, or integrate rankings into your CI/CD pipeline.
# Fetch overall rankings
curl https://www.polydev.ai/api/leaderboard
# Filter by language and task type
curl "https://www.polydev.ai/api/leaderboard?language=python&task_type=debugging_runtime"
# Head-to-head comparison
curl "https://www.polydev.ai/api/leaderboard/head-to-head?model_a=claude-opus-4-6&model_b=gpt-5.3-codex"All 6 filter dimensions are supported as query parameters: task_type,language,framework,complexity,domain,intent.
Making Rankings Better
The leaderboard improves as more developers use Polydev. Here are the factors that increase ranking quality:
- Rankings become more meaningful when they cover a wide variety of languages, frameworks, and task types. Your unique workflow contributes data that no standardized benchmark captures.
- Active rankings via rank_perspectives carry 1.5x the weight of automatic comparisons. The more agents that rank responses, the more signal the leaderboard has.
- Rankings from different base models (Claude, GPT, Gemini) acting as judges help cancel out individual model biases. The leaderboard tracks which model is judging, so systematic biases can be identified and adjusted.
Models in the Leaderboard
The leaderboard currently tracks these models, with more being added as providers release new versions:
| Model | Provider | Access Method |
|---|---|---|
| Claude Opus 4.6 | Anthropic | Claude Code CLI + API |
| GPT 5.3 Codex | OpenAI | Codex CLI + API |
| Gemini 3 Pro | Google | Gemini CLI + API |
| Grok 4.1 Fast | xAI | API |
| GLM 4.7 | Cerebras | API |
Getting Started
To start contributing rankings, all you need is Polydev installed in any supported IDE:
# Install Polydev (works with Claude Code, Cursor, Cline, etc.)
npx polydev-ai@latest
# That's it! Every get_perspectives call now contributes
# to the crowdsourced leaderboard automatically.

See the installation guide for detailed setup instructions for each IDE.
Start contributing today
Every get_perspectives call makes the leaderboard more accurate. Install Polydev and join the crowdsourced ranking.