How ScryCheck scores Commander decks
Understanding your power level, bracket estimate, and confidence interval
Our goal
The best games of Commander happen when everyone is on a level playing field. That’s what the Rule Zero conversation is for — but it only works when players honestly understand how powerful their decks are.
Brackets are a great starting point, but they only check whether you have specific cards. A deck can stay under the Game Changers limit while packing premium cards in every other slot — and brackets alone won’t catch that.
ScryCheck evaluates every card in your deck across five scoring vectors, identifies combos, detects themes, and shows you exactly how every point is calculated.
The result is an honest power level assessment that gives your pod a common language — so nobody shows up with an unfair edge.
Overview
ScryCheck analyzes Commander decks using a multi-stage pipeline that evaluates card quality, synergy, combo potential, and mana efficiency to produce a power level rating from 1–10.
Every score contribution can be traced back to specific cards and their roles in your deck — nothing is a black box.
How it works
When you submit a deck, it passes through a deterministic multi-stage pipeline. Each stage feeds into the next, building up to a final power level and bracket estimate.
Card parsing
Deck list is parsed, commanders are identified, and card names are normalized against Scryfall data.
Card rating
Each card is evaluated using a layered rating stack: hand-tuned overrides for high-impact cards, LLM-generated category tags for broad coverage, land-type rules, and heuristics derived from oracle text.
Vector scoring
Cards contribute points to five scoring vectors: Speed, Consistency, Interaction, Mana Base, and Threats. Each vector reflects a distinct dimension of deck strength.
Commander synergy
If your commander has a dedicated profile, cards that align with their strategy receive synergy bonuses. Over 100 popular commanders have profiles.
Combo detection
The deck is checked against Commander Spellbook's 80K+ combo variant database. Detected combos are classified by outcome type and card count, then factored into the Threats vector.
Theme & archetype detection
75+ themes (tribal, keyword, and package-based) and 16 archetypes are identified. Your primary archetype influences how vectors are weighted in the final score.
Mana color balance
Card color spread, mana symbol demand, and land production are analyzed to identify fixing gaps and color coverage ratios.
Composition analysis
Mana base quality, card quality distribution, and mana curve efficiency are evaluated. Significant deficiencies in any of these areas cap your maximum power level.
Strategy ceiling
A structural cap is computed from four signals: combo redundancy, commander engine status, commander independence, and infrastructure density. Decks earn headroom above a neutral base — this is what separates a cEDH shell with a theme-first payoff (Bracket 4) from one with a metagame-first payoff (Bracket 5).
Power level calculation
Vector scores, combo weight, card quality, and the strategy ceiling combine into a final power level. A confidence interval reflects how much uncertainty is present in the rating.
Bracket assignment
Hard gates — cards with outsized game impact — set minimum bracket floors independent of power level. Power level can push the bracket higher, but never below the hard gate minimum.
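Here is a minimal Python sketch of how the last few stages could fit together. It combines archetype-weighted vector scores into a power level, then bounds the result with composition caps and the strategy ceiling. Every number in it is an assumption: the 0 to 10 vector scale, the weights, and the cap values are hypothetical. Combo weight is assumed to be already folded into the Threats vector, as the combo detection stage describes, and card quality is not modeled separately. This is an illustration, not ScryCheck's actual formula.

```python
from dataclasses import dataclass, field

# The five scoring vectors named above; the 0-10 scale is a hypothetical choice.
VECTORS = ("speed", "consistency", "interaction", "mana_base", "threats")


@dataclass
class DeckScores:
    vectors: dict            # vector name -> score (hypothetical 0-10 scale)
    archetype_weights: dict  # vector name -> weight from the primary archetype (hypothetical)
    strategy_ceiling: float = 10.0                        # structural cap
    composition_caps: list = field(default_factory=list)  # caps from mana base, quality, curve


def power_level(d: DeckScores) -> float:
    """Weighted mean of the vectors, bounded by every cap, clamped to 1-10."""
    total_weight = sum(d.archetype_weights[v] for v in VECTORS)
    weighted = sum(d.vectors[v] * d.archetype_weights[v] for v in VECTORS) / total_weight
    # Each cap is an upper bound, so the lowest one is the one that binds.
    cap = min([d.strategy_ceiling, 10.0, *d.composition_caps])
    return round(max(1.0, min(weighted, cap)), 1)


# Example: a threats-weighted archetype whose strategy ceiling binds.
deck = DeckScores(
    vectors={"speed": 8, "consistency": 9, "interaction": 7, "mana_base": 9, "threats": 9},
    archetype_weights={"speed": 1, "consistency": 1, "interaction": 1, "mana_base": 1, "threats": 2},
    strategy_ceiling=8.0,
)
print(power_level(deck))  # 8.0: the raw weighted score of 8.5 is capped at 8.0
```

Taking the minimum of all caps matches the description above: a significant deficiency limits the maximum power level rather than subtracting points.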
Power level scale
ScryCheck rates decks on a 1–10 scale with decimal precision. Your rating reflects overall deck strength across all five vectors — not just one axis like combos or mana. The ± margin shows the confidence interval: it widens when there’s meaningful uncertainty (many unrecognized cards, weak archetype signal, boundary proximity) and narrows with strong coverage and a clear deck identity.
- 10: Top-end competitive Commander. Maximal efficiency, compact wins, and robust protection across all axes.
- 9: Near-cEDH construction and pacing; viable at cEDH-adjacent tables but not top-tier meta.
- 8: Fast, consistent, and resilient; can present and protect strong wins in stronger pods.
- 7: High slot efficiency and tight construction; strong execution without the full cEDH profile.
- 6: Well-tuned synergy shell with consistent lines and solid fundamentals.
- 5: Reliable game plan with intentional card choices; moderate speed and interaction.
- 4: Meaningful upgrades and cleaner synergies, though consistency gaps remain.
- 3: Comparable to stock precons. Clear plan, but slower mana and lower card efficiency.
- 2: Core idea exists, but ramp, draw, and interaction still need major refinement.
- 1: Little strategic cohesion; mana and card quality are very inconsistent.
Commander brackets
ScryCheck estimates your deck’s Commander Bracket based on the official WotC guidelines. Brackets reflect both what’s in your deck and how powerful it is overall.
How bracket assignment works:
Certain high-impact cards act as hard gates — they set a minimum bracket floor regardless of overall power level. Power level can push your bracket above that floor, but never below it. A deck with no hard gates is bracketed purely by power level.
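The floor-then-raise rule can be expressed as a `max` over two brackets. In this sketch, the gate floors follow the bracket descriptions on this page. Any Game Changer or 2-card combo rules out Brackets 1 and 2. Many Game Changers, mass land denial, or a deterministic 2-card combo implies at least Bracket 4. The Game Changer count treated as "many" is hypothetical. So is every power threshold except Bracket 5's "9 or higher".

```python
from dataclasses import dataclass


@dataclass
class HardGates:
    game_changers: int = 0
    mass_land_denial: bool = False
    two_card_combos: int = 0                # any 2-card combo
    deterministic_two_card_combos: int = 0  # the subset that wins deterministically


MANY_GAME_CHANGERS = 4  # hypothetical threshold for "many"


def gate_floor(g: HardGates) -> int:
    """Minimum bracket implied by hard gates; 1 means no gate fired."""
    if (g.game_changers >= MANY_GAME_CHANGERS or g.mass_land_denial
            or g.deterministic_two_card_combos > 0):
        return 4
    if g.game_changers > 0 or g.two_card_combos > 0:
        return 3
    return 1


def bracket_from_power(pl: float) -> int:
    # Only the Bracket 5 threshold (9+) is stated on this page; the rest are hypothetical.
    for threshold, bracket in ((9.0, 5), (8.0, 4), (6.0, 3), (4.0, 2)):
        if pl >= threshold:
            return bracket
    return 1


def assign_bracket(pl: float, gates: HardGates) -> int:
    # Power level can raise the bracket above the gate floor, never lower it.
    return max(gate_floor(gates), bracket_from_power(pl))
```

With no gates, `gate_floor` returns 1, so the bracket comes purely from power level.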
cEDH shell vs. cEDH strategy
A deck can run fast mana, free interaction, and tutors — a genuine cEDH shell — while its win condition relies on coin flips, tribal combat, or chaos effects. These decks have high card quality but low strategic reliability. ScryCheck distinguishes them using a strategy ceiling:
- cEDH shell + metagame-first payoff (redundant deterministic wins, commander is a combo piece, wins without the commander) → Bracket 5
- cEDH shell + theme-first payoff (coin flips, tribal, chaos, voltron) → Bracket 4 ceiling
The ceiling is shown on your deck analysis as the Strategy Ceiling indicator.
- Bracket 5: Fully optimized competitive decks with redundant, deterministic win lines. Power level 9 or higher.
- Bracket 4: Many Game Changers, mass land denial, or deterministic 2-card combos. Strong, focused, and often faster than casual tables expect.
- Bracket 3: A few Game Changers, or any 2-card combo. Tuned and efficient. Most power-7 decks live here.
- Bracket 2: Upgraded precons and synergy-focused builds. No Game Changers. Clearly more powerful than a stock precon.
- Bracket 1: Precons and very casual decks. Theme over optimization.
Game Changers
Game Changers are cards identified by WotC as having outsized impact on Commander games. ScryCheck detects all of them and uses their presence to set your bracket floor — the more you have, the higher the minimum.
They fall into four categories: fast mana that dramatically accelerates your development, efficient tutors that find any card with minimal investment, resource engines that generate overwhelming card or mana advantage, and stax and denial pieces that lock opponents out.
The full official Game Changer list is published by WotC and available on the Commander Brackets page.
Win speed estimate
Every analysis surfaces a three-number win speed estimate next to the power header — best, typical, and contested. It collapses card quality, synergy, mana efficiency, and combo potential into one practical question: how fast does this deck actually close out games?
- Best: The fastest realistic win turn with good draws and no opposing disruption. This is the upper bound of what the deck is capable of.
- Typical: The expected win turn in a normal multiplayer game where opponents are playing their own plans. This is the number that maps most naturally to WotC’s bracket turn expectations (B1 9+, B2 8+, B3 6+, B4 4+, B5 any).
- Contested: The estimated turn when opponents are actively interacting and disrupting your plan.
How it’s calculated
The estimate starts from deck-derived signals — velocity, finisher density, combo compactness, average mana value, interaction load, and the calculated power level. Every contributor is shown in the expandable detail under the win-turn card on your analysis page.
For commanders with enough tracked games, ScryCheck additionally anchors the typical estimate against real gameplay data from Playgroup.gg — a public dataset of roughly 933,000 tracked Commander games across about 3,800 commanders. When the engine prediction differs from the empirical average, the typical turn is pulled toward the observed value. The tooltip on the win-turn card shows the exact number of games the anchor is based on, plus the empirical average.
An honest note about the empirical anchor
Playgroup’s data is collected from casual Commander games. The average win turn for a popular commander reflects how a typical casual build actually plays at a real table — including mulligans, real disruption, and pilot variance — but it skews toward casual pacing rather than highly tuned builds.
To prevent this from misleading the estimate for tuned decks, the empirical anchor is weight-reduced for high-power and very low-power builds. A B5-tier Krenko deck shouldn’t be pulled toward the casual-table average for Krenko, and a janky brew shouldn’t either. Mid-power decks — where the empirical anchor is most informative — receive the full blend.
Coverage caveat: about half of the commanders Playgroup tracks meet the minimum trust threshold (40 games and 10 unique pilots). New, fringe, or recently printed commanders fall back to engine-only prediction until the dataset catches up. The tooltip says so plainly when that’s the case.
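Here is a minimal sketch of the anchoring logic described above. The 40-game and 10-pilot trust threshold comes from this page. The blend weight, the mid-power band, and the reduction factor for high- and low-power decks are hypothetical values chosen for illustration.

```python
from typing import Optional

MIN_GAMES = 40          # trust threshold stated on this page
MIN_PILOTS = 10         # trust threshold stated on this page
FULL_WEIGHT = 0.5       # hypothetical blend weight for mid-power decks
MID_POWER = (4.0, 7.0)  # hypothetical band where the anchor is most informative
EXTREME_FACTOR = 0.25   # hypothetical reduction for high- and very low-power decks


def anchor_weight(power_level: float) -> float:
    low, high = MID_POWER
    if low <= power_level <= high:
        return FULL_WEIGHT
    return FULL_WEIGHT * EXTREME_FACTOR


def typical_win_turn(engine_turn: float, empirical_avg: Optional[float],
                     games: int, pilots: int, power_level: float) -> float:
    # Fall back to the engine-only prediction when the commander lacks trusted data.
    if empirical_avg is None or games < MIN_GAMES or pilots < MIN_PILOTS:
        return engine_turn
    w = anchor_weight(power_level)
    # Linear blend: pulls the engine's turn toward the observed average by weight w.
    return (1 - w) * engine_turn + w * empirical_avg
```

Under these assumed numbers, a mid-power deck predicted at turn 6 for a commander whose tables average turn 8 lands on turn 7. The same deck at power level 9.2 moves only an eighth of the way.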
Combo detection
ScryCheck uses the Commander Spellbook database to detect combos — 80,000+ documented variants, from 2-card lines to complex setups. Detected combos contribute to your Threats score based on how compact they are and what they actually do.
Card count
Compact combos are weighted more heavily than long ones. A 2-card combo is harder to disrupt and faster to assemble than a 5-card setup — the scoring reflects that.
Outcome type
Combos are classified by what they do: instant wins, deterministic kills, infinite resource generation, defensive locks, or repeatable value. Game-ending outcomes score higher; lock and value loops still count, but carry less weight because they do not immediately win the game.
Redundancy
Combos that share pieces with other combos in your deck receive a bonus. Overlapping lines are more consistent — drawing one piece gets you closer to multiple wins.
Outcome hints
Each combo row in your results shows an icon indicating what the combo does — mana, turns, damage, locks, wins, and more. Hover the icon for details sourced directly from Commander Spellbook.
To prevent permutation inflation, overlapping combo lines are grouped into families with a cap on their combined contribution to your score.
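The four scoring factors above (card count, outcome type, redundancy, and the family cap) can be sketched as follows. The outcome weights, the 2/n compactness curve, the 10% redundancy bonus, and the family cap are all hypothetical. Family labels are assumed to be precomputed. The sketch only shows how the factors could compose.

```python
from dataclasses import dataclass

# Hypothetical weights: game-ending outcomes count most, locks and value loops less.
OUTCOME_WEIGHT = {
    "instant_win": 1.0,
    "deterministic_kill": 1.0,
    "infinite_resource": 0.8,
    "lock": 0.5,
    "value": 0.3,
}
REDUNDANCY_BONUS = 0.1  # hypothetical bonus per other combo sharing a piece
FAMILY_CAP = 3.0        # hypothetical cap on one family's combined contribution


@dataclass(frozen=True)
class Combo:
    pieces: frozenset
    outcome: str
    family: str  # assumed precomputed grouping of overlapping lines


def combo_score(combo: Combo, deck_combos: list) -> float:
    compactness = 2.0 / len(combo.pieces)  # 2 cards -> 1.0, 4 cards -> 0.5
    base = compactness * OUTCOME_WEIGHT[combo.outcome]
    shared = sum(1 for other in deck_combos
                 if other is not combo and other.pieces & combo.pieces)
    return base * (1 + REDUNDANCY_BONUS * shared)


def combo_threat_points(deck_combos: list) -> float:
    per_family = {}
    for c in deck_combos:
        per_family[c.family] = per_family.get(c.family, 0.0) + combo_score(c, deck_combos)
    # Capping each family keeps permutations of one line from inflating the score.
    return sum(min(total, FAMILY_CAP) for total in per_family.values())


combos = [
    Combo(frozenset({"Thassa's Oracle", "Demonic Consultation"}), "instant_win", "oracle"),
    Combo(frozenset({"Thassa's Oracle", "Tainted Pact"}), "instant_win", "oracle"),
    Combo(frozenset({"Basalt Monolith", "Rings of Brighthearth"}), "infinite_resource", "basalt"),
]
print(round(combo_threat_points(combos), 2))  # 3.0
```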
Input formats
Paste list
One card per line. Quantities are optional. Commander can be in the list or entered separately.
Sol Ring
1 Mana Crypt
4 Island
Demonic Tutor
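Here is a minimal sketch of parsing the paste format above: one card per line, with an optional leading quantity that defaults to 1. Name normalization against Scryfall and commander handling are left out. A card name that begins with a number followed by a space would be misread as a quantity, which is a limitation of this sketch.

```python
import re

# Optional leading quantity, then the card name; surrounding whitespace is ignored.
LINE_PATTERN = re.compile(r"^\s*(?:(\d+)\s+)?(.+?)\s*$")


def parse_deck_list(text: str) -> dict:
    """Map card name -> quantity, merging repeated lines."""
    cards = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = LINE_PATTERN.match(line)
        quantity = int(match.group(1)) if match.group(1) else 1
        name = match.group(2)
        cards[name] = cards.get(name, 0) + quantity
    return cards


print(parse_deck_list("Sol Ring\n1 Mana Crypt\n4 Island\nDemonic Tutor"))
# {'Sol Ring': 1, 'Mana Crypt': 1, 'Island': 4, 'Demonic Tutor': 1}
```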
Import URL
Paste a link directly from:
- Moxfield (moxfield.com/decks/...)
- Archidekt (archidekt.com/decks/...)
How accuracy is measured
Power level is one of the hardest things to measure in Commander because there’s no objective ground truth — the same deck can feel like a 6 or an 8 depending on who you ask. Here’s how ScryCheck approaches measurement honestly.
Reference deck validation
ScryCheck validates against a set of 252 reference decks with known expected power levels and brackets. Every scoring change is measured against this suite before shipping.
As of April 2026.
202 of 252 reference decks are low-confidence estimates (database-mined), not independently verified. The validation numbers are real, but the ground truth itself carries uncertainty.
What’s next
The remaining accuracy gap is structural. The engine currently evaluates cards mostly in isolation and only partially accounts for how they interact with each other and with the commander. The active synergy-modeling workstream is building that layer. When it ships, it is expected to close a meaningful share of the remaining 19% gap in bracket-exact accuracy.
Empirical win-turn correlation
The win speed estimate is independently checked against real gameplay data from Playgroup.gg. On every build we measure the Pearson correlation between the engine’s typical-turn prediction and the observed average win turn across all commanders with 100+ tracked games (currently 1,116 commanders).
Current baseline (April 2026):
- r = 0.30: Pearson correlation, engine vs. empirical (CI floor)
- MAE 1.95 turns: mean absolute error before the empirical anchor blends in
- 55% within ±2 turns, 27% within ±1 turn (engine-only prediction)
The correlation is moderate, and that is expected. ScryCheck-submitted decks for popular commanders skew tuned, while Playgroup’s avgWinTurn averages over casual-table populations. The CI gate exists to catch regressions (engine drift or data drift). It does not claim the engine is 100% accurate at predicting individual game outcomes, which would not be a credible claim about Commander.
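To reproduce the three baseline numbers on your own data, you can compute Pearson r, mean absolute error, and the within-±k shares from paired typical-turn predictions and observed average win turns. Here is a minimal sketch. The function names are illustrative rather than ScryCheck's CI code. It assumes the 100+ game filter has already been applied and that neither sequence is constant.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def win_turn_metrics(predicted, observed):
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    n = len(errors)
    return {
        "pearson_r": pearson_r(predicted, observed),
        "mae": sum(errors) / n,
        "within_1": sum(e <= 1 for e in errors) / n,  # share within ±1 turn
        "within_2": sum(e <= 2 for e in errors) / n,  # share within ±2 turns
    }
```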
LLM cross-validation
To check whether the engine’s accuracy is self-referential, we ran all 252 reference decks through Claude and GPT-4o independently — neither model saw our scores, ground truth values, or thresholds. They evaluated each deck from scratch using only the card list and WotC’s bracket framework.
What the cross-validation found:
- The engine outperformed both LLMs by a wide margin: 93% bracket accuracy vs. 48% (Claude) and 38% (GPT-4o). Rule-based scoring was better calibrated than LLM judgment alone for this task.
- LLMs systematically over-rate decks; they struggle to recognize how casual true Bracket 1–2 decks really are.
- 6 reference decks were identified as miscalibrated. In these cases, the engine and both LLMs agreed that the expected bracket was wrong.
The headline finding holds: an independent evaluation with no access to our ground truth agreed with the engine more than it disagreed, and pointed to real ground truth errors when it didn’t.
Cross-validation conducted in early 2026 against an earlier engine version. The engine has since been recalibrated to a more conservative baseline (see above). A rerun against the current engine is planned.
Why measurement is hard
Self-assessment bias
Players systematically underestimate their decks — especially near the B3/B4 boundary. ScryCheck measures the deck as constructed, not as the player perceives it.
The ground truth problem
Any validation system needs ground truth to measure against. Ours comes from database-mined estimates, community assessments, and expert review — and the quality of the ground truth limits the quality of the accuracy claim.
Card quality vs. table impact
ScryCheck measures what’s in the deck. It can’t account for piloting skill, local meta, or political dynamics.
The bracket boundary problem
Brackets are discrete (1–5) but power level is continuous. Any deck near a boundary will feel “wrong” to someone. The confidence interval and bracket boundary note in your results communicate this uncertainty explicitly.
Continuous improvement
ScryCheck’s scoring engine is validated automatically on every update. The reference deck suite is reviewed quarterly, and user feedback is aggregated to detect systematic biases.
If you think your deck is significantly miscalibrated, use the feedback button after re-analyzing — it captures your expected bracket and routes directly to our calibration review queue.
FAQ
Why is my deck rated lower than expected?
There are two common reasons:
- Card quality vs. strategic reliability. ScryCheck separately measures how powerful your cards are and how reliably your strategy can win. A high-power shell with a theme-first win condition (coin flips, tribal, chaos, voltron) will have its power level capped by the strategy ceiling, which is shown on your analysis page. Adding a second deterministic combo line, for example, raises that ceiling.
- Piloting and meta factors. ScryCheck measures what’s in the deck. It can’t account for piloting skill, political dynamics, or local meta adjustments. The ± margin shows our confidence range.
What is the Strategy Ceiling indicator?
The Strategy Ceiling appears on decks scoring PL 8+ and shows the structural cap on your power level based on how the deck is built to win, not just how powerful the individual cards are. It starts at a neutral base and earns headroom through four signals:
- Combo redundancy: multiple independent, deterministic paths to win
- Commander engine: the commander is part of a compact, deterministic combo
- Commander independence: wins exist that don’t require the commander on the battlefield
- Infrastructure density: a high share of generically competitive cards (fast mana, tutors, interaction)
A ceiling of 10.0 means the deck has no structural ceiling — all signals fired. A ceiling of 8.5 means the deck runs a powerful shell but the strategy itself doesn’t reach Bracket 5 territory.
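Here is a minimal sketch of the ceiling as described. It adds headroom to a neutral base for each signal that fires, tops out at 10.0 when all four fire, and is applied as a cap on power level. The 8.5 base and the per-signal headroom values are hypothetical. They are picked only so that zero signals gives 8.5, all four give 10.0, and a shell alone stays below Bracket 5's line of 9. How each signal is detected is not shown.

```python
# Hypothetical values: zero signals -> 8.5, all four -> 10.0.
NEUTRAL_BASE = 8.5
HEADROOM = {
    "combo_redundancy": 0.5,
    "commander_engine": 0.5,
    "commander_independence": 0.25,
    "infrastructure_density": 0.25,
}


def strategy_ceiling(fired: set) -> float:
    """Neutral base plus headroom for each fired signal, never above 10.0."""
    return min(10.0, NEUTRAL_BASE + sum(HEADROOM[s] for s in fired))


def capped_power(raw_power: float, fired: set) -> float:
    # The ceiling only ever lowers a score; it never raises one.
    return min(raw_power, strategy_ceiling(fired))
```

Under these assumed numbers, a cEDH shell that fires only infrastructure density stops at 8.75, below the Bracket 5 threshold, however strong its cards are.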
What does “Confidence” mean?
Confidence reflects model certainty, not just card recognition. It combines coverage with signals like archetype clarity, how many cards were categorized by baseline heuristics only, and deck completeness.
What does the ± margin mean?
The ± margin is the confidence interval around your power level. It widens when uncertainty signals are present — for example, boundary proximity, weak archetype confidence, or many unrecognized cards — and narrows when coverage and rating quality are strong.
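The two answers above can be sketched together. Confidence is modeled as a weighted blend of the listed signals. The ± margin widens as confidence drops or as the score approaches a bracket boundary. All weights, the boundary positions, and the 0.25-point proximity window are hypothetical.

```python
# Hypothetical weights over the confidence signals named in the FAQ (inputs in [0, 1]).
WEIGHTS = {"coverage": 0.4, "archetype_clarity": 0.25,
           "rated_beyond_heuristics": 0.2, "completeness": 0.15}
BOUNDARIES = (4.0, 6.0, 8.0, 9.0)  # hypothetical bracket boundaries on the PL scale


def confidence(coverage, archetype_clarity, heuristic_only_share, completeness):
    signals = {
        "coverage": coverage,
        "archetype_clarity": archetype_clarity,
        # More heuristic-only cards means less certainty, so invert the share.
        "rated_beyond_heuristics": 1 - heuristic_only_share,
        "completeness": completeness,
    }
    return sum(WEIGHTS[k] * v for k, v in signals.items())


def margin(conf: float, power: float) -> float:
    width = 0.3 + 1.2 * (1 - conf)  # lower confidence -> wider interval
    if min(abs(power - b) for b in BOUNDARIES) < 0.25:
        width += 0.2  # boundary proximity widens it further
    return round(width, 2)
```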
Why is my mana base score low or negative?
The mana base score reflects land quality relative to format expectations — not mana acceleration (that's Speed). A tapland-heavy mana base scores low and can cap your maximum power level rather than producing a negative score directly.
How are decks stored?
When you analyze a deck, results are saved with a unique hash. Decks with identical card lists share the same URL, making it easy to share your analysis. You can re-analyze at any time to get updated scores.
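One way for identical card lists to share a URL is to hash a canonical form of the list. That way, line order and differences in capitalization or spacing don't change the result. Here is a minimal sketch, assuming SHA-256 and a 16-character prefix. ScryCheck's actual hash function and canonicalization rules aren't documented here.

```python
import hashlib


def deck_hash(cards: dict) -> str:
    """Order-independent hash of a {card name: quantity} mapping."""
    normalized = {}
    for name, qty in cards.items():
        key = " ".join(name.lower().split())  # case- and whitespace-insensitive
        normalized[key] = normalized.get(key, 0) + qty
    # Sorting makes the canonical text independent of input order.
    canonical = "\n".join(f"{qty} {name}" for name, qty in sorted(normalized.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```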
Why doesn't my commander show synergy bonuses?
Commander synergy detection covers 100+ hand-curated profiles plus 700+ commanders with synergy signals auto-generated from a corpus of analyzed decks. If your commander isn't covered yet, the deck is still analyzed normally — you just won't see commander-specific bonuses. Coverage grows as more decks are analyzed.
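The coverage described here can be sketched as a two-level lookup. It uses a hand-curated profile if one exists, then an auto-generated one. Otherwise there is no commander bonus, and the rest of the analysis proceeds normally. The profile contents and tag names are hypothetical, as is the rule that curated profiles take precedence.

```python
from typing import Optional, Tuple

# Hypothetical profile data: tag -> bonus per matching card.
CURATED_PROFILES = {"Krenko, Mob Boss": {"goblin": 1.0, "token_maker": 0.5}}
AUTO_PROFILES = {"Example Commander": {"sacrifice_outlet": 0.4}}


def find_profile(commander: str) -> Tuple[Optional[str], dict]:
    # Assumption: a curated profile takes precedence over an auto-generated one.
    if commander in CURATED_PROFILES:
        return "curated", CURATED_PROFILES[commander]
    if commander in AUTO_PROFILES:
        return "auto", AUTO_PROFILES[commander]
    return None, {}  # uncovered commander: no bonus, analysis continues


def synergy_bonus(commander: str, card_tags: set) -> float:
    _, profile = find_profile(commander)
    return sum(profile.get(tag, 0.0) for tag in card_tags)
```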