Why is my Commander deck rated lower or higher than expected?

Power level is subjective. ScryCheck focuses on card quality, combo potential, and mana efficiency. A deck can feel powerful due to piloting skill or local meta, even if the raw card analysis suggests otherwise. The ± margin shows our confidence range.

What does Confidence mean in ScryCheck?

Confidence reflects model certainty, not just card coverage. It combines rating coverage with factors like archetype clarity, baseline-only card ratio, low-vector cap-binding risk, and whether the imported list is complete.

What does the ± margin on my power level mean?

The ± margin is a confidence interval around your power level. It widens when uncertainty is higher (for example, many baseline-only cards, weak archetype signal, multiple low vectors, boundary proximity, or incomplete deck imports) and narrows when coverage and rating quality are strong.

Why is my mana base score negative?

Mana base score reflects land quality and fixing efficiency relative to format expectations, not raw ramp speed. Taplands are low-value rather than negative by themselves, but heavy tapland counts can still cap your overall power ceiling.

How ScryCheck scores Commander decks

Understanding your power level, bracket estimate, and confidence interval

Our goal

The best games of Commander happen when everyone is on a level playing field. That’s what the Rule Zero conversation is for — but it only works when players honestly understand how powerful their decks are.

Brackets are a great starting point, but they only check whether you have specific cards. A deck can stay under the Game Changers limit while packing premium cards in every other slot — and brackets alone won’t catch that.

ScryCheck evaluates every card in your deck across five scoring vectors, identifies combos, detects themes, and shows you exactly how every point is calculated.

The result is an honest power level assessment that gives your pod a common language, so nobody shows up with an unfair edge.

Overview

ScryCheck analyzes Commander decks using a multi-stage pipeline that evaluates card quality, synergy, combo potential, and mana efficiency to produce a power level rating from 1–10.

Every score contribution can be traced back to specific cards and their roles in your deck — nothing is a black box.

5

Scoring vectors

30K+

Categorized cards

80K+

Known combos

75+

Detectable themes

How it works

When you submit a deck, it passes through a deterministic multi-stage pipeline. Each stage feeds into the next, building up to a final power level and bracket estimate.

Card parsing

The deck list is parsed, commanders are identified, and card names are normalized against Scryfall data.

Card rating

Each card is evaluated using a layered rating stack: hand-tuned overrides for high-impact cards, LLM-generated category tags for broad coverage, land-type rules, and heuristics derived from oracle text.

Vector scoring

Cards contribute points to five scoring vectors: Speed, Consistency, Interaction, Mana Base, and Threats. Each vector reflects a distinct dimension of deck strength.

Commander synergy

If your commander has a dedicated profile, cards that align with their strategy receive synergy bonuses. Over 100 popular commanders have profiles.

Combo detection

The deck is checked against Commander Spellbook's 80K+ combo variant database. Detected combos are classified by outcome type and card count, then factored into the Threats vector.

Theme & archetype detection

75+ themes (tribal, keyword, and package-based) and 16 archetypes are identified. Your primary archetype influences how vectors are weighted in the final score.

Mana color balance

Card color spread, mana symbol demand, and land production are analyzed to identify fixing gaps and color coverage ratios.

Composition analysis

Mana base quality, card quality distribution, and mana curve efficiency are evaluated. Significant deficiencies in any of these areas cap your maximum power level.

Strategy ceiling

A structural cap is computed from four signals: combo redundancy, commander engine status, commander independence, and infrastructure density. Decks earn headroom above a neutral base — this is what separates a cEDH shell with a theme-first payoff (Bracket 4) from one with a metagame-first payoff (Bracket 5).

Power level calculation

Vector scores, combo weight, card quality, and the strategy ceiling combine into a final power level. A confidence interval reflects how much uncertainty is present in the rating.

Bracket assignment

Hard gates — cards with outsized game impact — set minimum bracket floors independent of power level. Power level can push the bracket higher, but never below the hard gate minimum.
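The card rating and vector scoring stages above can be pictured as a lookup through layered sources followed by a per-vector sum. The sketch below is illustrative only: the layer order follows the description, but the card entries, point values, and function names are hypothetical and are not ScryCheck's actual data or code.

```python
from collections import defaultdict

VECTORS = ("speed", "consistency", "interaction", "mana_base", "threats")

# Hypothetical rating layers, highest priority first.
OVERRIDES = {"Sol Ring": {"speed": 3.0}}            # hand-tuned overrides
LLM_TAGS = {"Counterspell": {"interaction": 2.0}}   # category tags
LAND_RULES = {"Island": {"mana_base": 0.5}}         # land-type rules


def heuristic_rating(name: str) -> dict[str, float]:
    """Fallback layer: a baseline-only rating (placeholder value)."""
    return {"threats": 0.1}


def rate_card(name: str) -> dict[str, float]:
    """Return the first rating found, walking the layers in priority order."""
    for layer in (OVERRIDES, LLM_TAGS, LAND_RULES):
        if name in layer:
            return layer[name]
    return heuristic_rating(name)


def score_vectors(cards: list[str]) -> dict[str, float]:
    """Sum every card's contribution into the five vectors."""
    totals: dict[str, float] = defaultdict(float)
    for card in cards:
        for vector, points in rate_card(card).items():
            totals[vector] += points
    return {vector: totals[vector] for vector in VECTORS}
```

Because every total is a plain sum of per-card entries, each point in a vector can be traced back to the card and layer that produced it.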

Power level scale

ScryCheck rates decks on a 1–10 scale with decimal precision. Your rating reflects overall deck strength across all five vectors, not just one axis like combos or mana. The ± margin shows the confidence interval: it widens when there’s meaningful uncertainty (many unrecognized cards, weak archetype signal, boundary proximity) and narrows with strong coverage and a clear deck identity.

cEDH

Top-end competitive Commander: maximal efficiency, compact wins, and robust protection across all axes.

Fringe cEDH

Near-cEDH construction and pacing; viable at cEDH-adjacent tables but not top-tier meta.

High power

Fast, consistent, and resilient; can present and protect strong wins in stronger pods.

Optimized

High slot efficiency and tight construction; strong execution without the full cEDH profile.

Tuned casual

Well-tuned synergy shell with consistent lines and solid fundamentals.

Focused casual

Reliable game plan with intentional card choices; moderate speed and interaction.

Upgraded precon

Meaningful upgrades and cleaner synergies, though consistency gaps remain.

Precon level

Comparable to stock precons: clear plan, but slower mana and lower card efficiency.

Early build

Core idea exists, but ramp, draw, and interaction still need major refinement.

Jank / unfocused

Little strategic cohesion; mana and card quality are very inconsistent.

Commander brackets

ScryCheck estimates your deck’s Commander Bracket based on the official WotC guidelines. Brackets reflect both what’s in your deck and how powerful it is overall.

How bracket assignment works:

Certain high-impact cards act as hard gates — they set a minimum bracket floor regardless of overall power level. Power level can push your bracket above that floor, but never below it. A deck with no hard gates is bracketed purely by power level.
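The floor rule described above reduces to taking the larger of two numbers. The following is a minimal sketch, not the engine's code: only the 9.0 cutoff for Bracket 5 appears in this document, and the other cutoffs and both function names are assumptions made for illustration.

```python
def bracket_from_power(power_level: float) -> int:
    """Map a power level to a bracket.

    Only the 9.0 cutoff (Bracket 5) is stated in the text; the remaining
    cutoffs are placeholders.
    """
    cutoffs = [(9.0, 5), (7.5, 4), (5.5, 3), (3.5, 2)]
    for minimum, bracket in cutoffs:
        if power_level >= minimum:
            return bracket
    return 1


def assign_bracket(power_level: float, hard_gate_floor: int = 1) -> int:
    """Power level can raise the bracket above the floor, never below it."""
    return max(hard_gate_floor, bracket_from_power(power_level))
```

For example, `assign_bracket(6.0, hard_gate_floor=4)` returns 4 because the hard gate wins, while `assign_bracket(9.2, hard_gate_floor=3)` returns 5 because power level pushes the bracket above the floor. The strategy ceiling is deliberately left out of this sketch.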

cEDH shell vs. cEDH strategy

A deck can run fast mana, free interaction, and tutors — a genuine cEDH shell — while its win condition relies on coin flips, tribal combat, or chaos effects. These decks have high card quality but low strategic reliability. ScryCheck distinguishes them using a strategy ceiling:

• cEDH shell + metagame-first payoff (redundant deterministic wins, commander is a combo piece, wins without the commander) → Bracket 5
• cEDH shell + theme-first payoff (coin flips, tribal, chaos, voltron) → Bracket 4 ceiling

The ceiling is shown on your deck analysis as the Strategy Ceiling indicator.

Bracket 5 — cEDH

Fully optimized competitive decks with redundant, deterministic win lines. Power level 9 or higher.

Bracket 4 — High power

Many Game Changers, mass land denial, or deterministic 2-card combos. Strong, focused, and often faster than casual tables expect.

Bracket 3 — Enhanced

A few Game Changers, or any 2-card combo. Tuned and efficient. Most power-7 decks live here.

Bracket 2 — Upgraded

Upgraded precons and synergy-focused builds. No Game Changers. Clearly more powerful than a stock precon.

Bracket 1 — Casual

Precons and very casual decks. Theme over optimization.

Game Changers

Game Changers are cards identified by WotC as having outsized impact on Commander games. ScryCheck detects all of them and uses their presence to set your bracket floor — the more you have, the higher the minimum.

They fall into four categories: fast mana that dramatically accelerates your development, efficient tutors that find any card with minimal investment, resource engines that generate overwhelming card or mana advantage, and stax and denial pieces that lock opponents out.

The full official Game Changers list is published by WotC and available on the Commander Brackets page.

Win speed estimate

Every analysis surfaces a three-number win speed estimate next to the power header — best, typical, and contested. It collapses card quality, synergy, mana efficiency, and combo potential into one practical question: how fast does this deck actually close out games?

Best

The fastest realistic win turn with good draws and no opposing disruption — the upper bound for what the deck is capable of.

Typical

The expected win turn in a normal multiplayer game where opponents are playing their own plans. This is the number that maps most naturally to WotC’s bracket turn expectations (B1 9+, B2 8+, B3 6+, B4 4+, B5 any).

Contested

The estimated turn when opponents are actively interacting and disrupting your plan.
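One way to read the bracket turn expectations quoted under Typical is as the earliest turn a game is expected to end in each bracket. Under that reading (an interpretation, not an official rule), the sketch below finds the lowest bracket whose expectation a given typical win turn still satisfies. The function name is hypothetical.

```python
# Earliest expected win turn per bracket, as quoted in the text.
# Bracket 5 has no minimum ("any"), represented here as 0.
MIN_WIN_TURN = {1: 9, 2: 8, 3: 6, 4: 4, 5: 0}


def lowest_bracket_for_turn(typical_turn: float) -> int:
    """Return the lowest bracket whose turn expectation the deck meets."""
    for bracket in sorted(MIN_WIN_TURN):
        if typical_turn >= MIN_WIN_TURN[bracket]:
            return bracket
    return 5
```

A deck with a typical win on turn 7 is too fast for Bracket 2 (8+) but fits Bracket 3 (6+), so the function returns 3.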

How it’s calculated

The estimate starts from deck-derived signals — velocity, finisher density, combo compactness, average mana value, interaction load, and the calculated power level. Every contributor is shown in the expandable detail under the win-turn card on your analysis page.

For commanders with enough tracked games, ScryCheck additionally anchors the typical estimate against real gameplay data from Playgroup.gg — a public dataset of roughly 933,000 tracked Commander games across about 3,800 commanders. When the engine prediction differs from the empirical average, the typical turn is pulled toward the observed value. The tooltip on the win-turn card shows the exact number of games the anchor is based on, plus the empirical average.

An honest note about the empirical anchor

Playgroup’s data is collected from casual Commander games. The average win turn for a popular commander reflects how a typical casual build actually plays at a real table — including mulligans, real disruption, and pilot variance — but it skews toward casual pacing rather than highly tuned builds.

To prevent this from misleading the estimate for tuned decks, the empirical anchor is weight-reduced for high-power and very low-power builds. A B5-tier Krenko deck shouldn’t be pulled toward the casual-table average for Krenko, and a janky brew shouldn’t either. Mid-power decks — where the empirical anchor is most informative — receive the full blend.

Coverage caveat: about half of the commanders Playgroup tracks meet the minimum trust threshold (40 games and 10 unique pilots). New, fringe, or recently printed commanders fall back to engine-only prediction until the dataset catches up. The tooltip says so plainly when that’s the case.
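The anchoring behaviour described above can be sketched as a weighted blend with a trust gate. The 40-game and 10-pilot thresholds come from the text; the blend strength, the mid-power range of 4 to 7, the taper width, and all names are assumptions chosen only to make the idea concrete.

```python
from typing import Optional

MIN_GAMES = 40    # trust threshold from the text
MIN_PILOTS = 10   # trust threshold from the text
MAX_BLEND = 0.5   # assumed: pull applied to mid-power decks


def anchor_weight(power_level: float) -> float:
    """Full weight for mid-power decks, tapering toward both extremes.

    The 4-7 plateau and the 2-point linear taper are assumptions.
    """
    if 4.0 <= power_level <= 7.0:
        return MAX_BLEND
    distance = 4.0 - power_level if power_level < 4.0 else power_level - 7.0
    return MAX_BLEND * max(0.0, 1.0 - distance / 2.0)


def typical_turn(
    engine_turn: float,
    power_level: float,
    empirical_turn: Optional[float] = None,
    games: int = 0,
    pilots: int = 0,
) -> float:
    """Blend the engine prediction toward the empirical average when trusted."""
    if empirical_turn is None or games < MIN_GAMES or pilots < MIN_PILOTS:
        return engine_turn  # engine-only fallback
    weight = anchor_weight(power_level)
    return (1.0 - weight) * engine_turn + weight * empirical_turn
```

With these placeholder numbers, a mid-power deck predicted at turn 6 with an empirical average of turn 10 lands on turn 8, while a power-9.5 deck ignores the anchor entirely.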

Combo detection

ScryCheck uses the Commander Spellbook database to detect combos — 80,000+ documented variants, from 2-card lines to complex setups. Detected combos contribute to your Threats score based on how compact they are and what they actually do.

Card count

Compact combos are weighted more heavily than long ones. A 2-card combo is harder to disrupt and faster to assemble than a 5-card setup — the scoring reflects that.

Outcome type

Combos are classified by what they do: instant wins, deterministic kills, infinite resource generation, defensive locks, or repeatable value. Game-ending outcomes score higher; lock and value loops still count, but carry less weight because they do not immediately win the game.

Redundancy

Combos that share pieces with other combos in your deck receive a bonus. Overlapping lines are more consistent — drawing one piece gets you closer to multiple wins.

Outcome hints

Each combo row in your results shows an icon indicating what the combo does — mana, turns, damage, locks, wins, and more. Hover the icon for details sourced directly from Commander Spellbook.

To prevent permutation inflation, overlapping combo lines are grouped into families with a cap on their combined contribution to your score.
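The card-count weighting and the family cap described above can be sketched as follows. This is an illustration under stated assumptions: the outcome weights, the cap value, the "shares at least one card" grouping rule, and all names are hypothetical, and the redundancy bonus is left out to keep the example short.

```python
from itertools import combinations

# Assumed weights and cap, chosen only to make the example concrete.
OUTCOME_WEIGHT = {"win": 1.0, "resource": 0.6, "lock": 0.4, "value": 0.3}
FAMILY_CAP = 2.0

Combo = tuple[frozenset[str], str]  # (cards, outcome type)


def combo_points(cards: frozenset[str], outcome: str) -> float:
    """Compact combos score higher: a 2-card line gets the full outcome weight."""
    return OUTCOME_WEIGHT[outcome] * 2.0 / len(cards)


def group_families(combos: list[Combo]) -> list[list[Combo]]:
    """Group combos that share at least one card, using a small union-find."""
    parent = list(range(len(combos)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in combinations(range(len(combos)), 2):
        if combos[a][0] & combos[b][0]:
            parent[find(a)] = find(b)

    families: dict[int, list[Combo]] = {}
    for i, combo in enumerate(combos):
        families.setdefault(find(i), []).append(combo)
    return list(families.values())


def threats_from_combos(combos: list[Combo]) -> float:
    """Sum each family's points, capping every family at FAMILY_CAP."""
    return sum(
        min(FAMILY_CAP, sum(combo_points(cards, outcome) for cards, outcome in family))
        for family in group_families(combos)
    )
```

Three 2-card win lines that all share one piece would score 3.0 uncapped, but as a single family they contribute only the cap of 2.0, which is the permutation inflation the grouping is meant to prevent.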

Input formats

Paste list

One card per line. Quantities are optional. Commander can be in the list or entered separately.

Sol Ring
1 Mana Crypt
4 Island
Demonic Tutor
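A list in this format is straightforward to parse. The sketch below is a hypothetical parser, not ScryCheck's own: it treats a leading number (optionally followed by "x") as the quantity, defaults to 1 when none is given, and does not handle commander markers or set codes.

```python
import re

# Optional quantity (e.g. "4" or "4x"), then the card name.
LINE_RE = re.compile(r"^\s*(?:(\d+)\s*x?\s+)?(.+?)\s*$")


def parse_decklist(text: str) -> dict[str, int]:
    """Return a mapping of card name to quantity, skipping blank lines."""
    cards: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        quantity = int(match.group(1)) if match.group(1) else 1
        name = match.group(2)
        cards[name] = cards.get(name, 0) + quantity
    return cards
```

Applied to the four-line example above, this yields one Sol Ring, one Mana Crypt, four Islands, and one Demonic Tutor.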

Import URL

Paste a link directly from:

• Moxfield (moxfield.com/decks/...)
• Archidekt (archidekt.com/decks/...)

How accuracy is measured

Power level is one of the hardest things to measure in Commander because there’s no objective ground truth — the same deck can feel like a 6 or an 8 depending on who you ask. Here’s how ScryCheck approaches measurement honestly.

Reference deck validation

ScryCheck validates against a set of 252 reference decks with known expected power levels and brackets. Every scoring change is measured against this suite before shipping.

80%

Bracket exact match

72%

PL within ±0.5

0.41

Mean power level error

As of April 2026.

202 of 252 reference decks are low-confidence estimates (database-mined), not independently verified. The validation numbers are real, but the ground truth itself carries uncertainty.
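The three headline figures above are simple aggregates over the reference suite. The sketch below shows how such numbers could be computed; it assumes "mean power level error" means mean absolute error, and the tuple layout and function name are hypothetical.

```python
def validation_metrics(
    results: list[tuple[float, float, int, int]],
) -> dict[str, float]:
    """Aggregate accuracy over reference decks.

    Each result is (predicted_pl, expected_pl, predicted_bracket, expected_bracket).
    """
    count = len(results)
    errors = [abs(predicted - expected) for predicted, expected, _, _ in results]
    return {
        "bracket_exact": sum(pb == eb for _, _, pb, eb in results) / count,
        "pl_within_half": sum(error <= 0.5 for error in errors) / count,
        "mean_pl_error": sum(errors) / count,
    }
```

Running the same function before and after a scoring change gives a direct before-and-after comparison on the suite.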

What’s next

The remaining accuracy gap is structural: the engine currently evaluates cards mostly in isolation and only partially accounts for how cards interact with each other and the commander. The active synergy-modeling workstream is building that layer; when it ships, bracket-exact accuracy is expected to improve meaningfully on the roughly 20% of reference decks that do not yet match.

Empirical win-turn correlation

The win speed estimate is independently checked against real gameplay data from Playgroup.gg. On every build we measure the Pearson correlation between the engine’s typical-turn prediction and the observed average win turn across all commanders with 100+ tracked games (currently 1,116 commanders).

Current baseline (April 2026):

• r = 0.30 — Pearson correlation, engine vs. empirical (CI floor)
• MAE 1.95 turns — mean absolute error before the empirical anchor blends in
• 55% within ±2 turns, 27% within ±1 turn (engine-only prediction)

The correlation is moderate by design: ScryCheck-submitted decks for popular commanders skew tuned, while Playgroup’s avgWinTurn averages over casual-table populations. The gate’s job is to catch regressions (engine drift or data drift), not to claim the engine is 100% accurate at predicting individual game outcomes — which would not be a credible claim about Commander.
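The baseline figures above are standard statistics, and the regression gate can be sketched in a few lines. This example assumes "CI floor" means a minimum correlation a build must meet; the function names and the shape of the gate are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def mean_absolute_error(predicted: list[float], observed: list[float]) -> float:
    """Average size of the prediction error, in turns."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def share_within(predicted: list[float], observed: list[float], tolerance: float) -> float:
    """Fraction of predictions within +/- tolerance turns of the observation."""
    hits = sum(abs(p - o) <= tolerance for p, o in zip(predicted, observed))
    return hits / len(predicted)


def passes_gate(predicted: list[float], observed: list[float], r_floor: float = 0.30) -> bool:
    """Fail the build if correlation drops below the floor (assumed gate shape)."""
    return pearson_r(predicted, observed) >= r_floor
```

Here `predicted` would hold the engine's typical-turn estimate per commander and `observed` the empirical average win turn for the same commanders.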

LLM cross-validation

To check whether the engine’s accuracy is self-referential, we ran all 252 reference decks through Claude and GPT-4o independently — neither model saw our scores, ground truth values, or thresholds. They evaluated each deck from scratch using only the card list and WotC’s bracket framework.

What the cross-validation found:

• The engine outperformed both LLMs significantly — 93% bracket accuracy vs. 48% (Claude) and 38% (GPT-4o). Rule-based scoring was better calibrated than LLM judgment alone for this task.
• LLMs systematically over-rate decks — they struggle to recognize how casual true Bracket 1–2 decks really are.
• 6 reference decks were identified as miscalibrated — cases where the engine and both LLMs agreed the expected bracket was wrong.

The headline finding holds: an independent evaluation with no access to our ground truth agreed with the engine more than it disagreed, and pointed to real ground truth errors when it didn’t.

This cross-validation was conducted in early 2026 against an earlier engine version, which is why its 93% bracket accuracy differs from the 80% reported under reference deck validation. The engine has since been recalibrated to a more conservative baseline, and a rerun against the current engine is planned.

Why measurement is hard

Self-assessment bias

Players systematically underestimate their decks — especially near the B3/B4 boundary. ScryCheck measures the deck as constructed, not as the player perceives it.

The ground truth problem

Any validation system needs ground truth to measure against. Ours comes from database-mined estimates, community assessments, and expert review — and the quality of the ground truth limits the quality of the accuracy claim.

Card quality vs. table impact

ScryCheck measures what’s in the deck. It can’t account for piloting skill, local meta, or political dynamics.

The bracket boundary problem

Brackets are discrete (1–5) but power level is continuous. Any deck near a boundary will feel “wrong” to someone. The confidence interval and bracket boundary note in your results communicate this uncertainty explicitly.

Continuous improvement

ScryCheck’s scoring engine is validated automatically on every update. The reference deck suite is reviewed quarterly, and user feedback is aggregated to detect systematic biases.

If you think your deck is significantly miscalibrated, use the feedback button after re-analyzing — it captures your expected bracket and routes directly to our calibration review queue.

FAQ

Why is my deck rated lower than expected?

There are two common reasons:

• Card quality vs. strategic reliability. ScryCheck separately measures how powerful your cards are and how reliably your strategy can win. A high-power shell with a theme-first win condition (coin flips, tribal, chaos, voltron) will have its power level capped by the strategy ceiling — shown on your analysis page. Adding a second deterministic combo line, for example, raises that ceiling.
• Piloting and meta factors. ScryCheck measures what’s in the deck. It can’t account for piloting skill, political dynamics, or local meta adjustments. The ± margin shows our confidence range.

What is the Strategy Ceiling indicator?

The Strategy Ceiling appears on decks scoring PL 8+ and shows the structural cap on your power level based on how the deck is built to win, not just how powerful the individual cards are. It starts at a neutral base and earns headroom through four signals:

• Combo redundancy — multiple independent, deterministic paths to win
• Commander engine — the commander is part of a compact, deterministic combo
• Commander independence — wins exist that don’t require the commander on the battlefield
• Infrastructure density — a high share of generically competitive cards (fast mana, tutors, interaction)

A ceiling of 10.0 means the deck has no structural ceiling — all signals fired. A ceiling of 8.5 means the deck runs a powerful shell but the strategy itself doesn’t reach Bracket 5 territory.
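The "neutral base plus earned headroom" idea can be sketched as below. The base of 8.0 and the equal 0.5 headroom per signal are assumptions picked so that all four signals reach 10.0 and a single signal gives 8.5; the engine's real weights are not stated here, and the names are hypothetical.

```python
from dataclasses import dataclass

NEUTRAL_BASE = 8.0         # assumed; the real base is not stated
HEADROOM_PER_SIGNAL = 0.5  # assumed equal weight per signal


@dataclass
class CeilingSignals:
    combo_redundancy: bool
    commander_engine: bool
    commander_independence: bool
    infrastructure_density: bool


def strategy_ceiling(signals: CeilingSignals) -> float:
    """Start at the neutral base and add headroom for each signal that fired."""
    fired = sum(
        [
            signals.combo_redundancy,
            signals.commander_engine,
            signals.commander_independence,
            signals.infrastructure_density,
        ]
    )
    return min(10.0, NEUTRAL_BASE + HEADROOM_PER_SIGNAL * fired)


def apply_ceiling(raw_power: float, ceiling: float) -> float:
    """The ceiling caps the power level; it never raises it."""
    return min(raw_power, ceiling)
```

Under these placeholder weights, a deck with dense infrastructure but no other signal has a ceiling of 8.5, so a raw score of 9.3 would be reported as 8.5.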

What does “Confidence” mean?

Confidence reflects model certainty, not just card recognition. It combines coverage with signals like archetype clarity, how many cards were categorized by baseline heuristics only, and deck completeness.

What does the ± margin mean?

The ± margin is the confidence interval around your power level. It widens when uncertainty signals are present — for example, boundary proximity, weak archetype confidence, or many unrecognized cards — and narrows when coverage and rating quality are strong.
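To make the widening and narrowing concrete, the sketch below adds a penalty for each uncertainty signal to a small base margin. The signal list follows the answers in this FAQ, but every coefficient, the 1.5 upper bound, and all names are invented for illustration and do not reflect the engine's real formula.

```python
from dataclasses import dataclass


@dataclass
class UncertaintySignals:
    baseline_only_ratio: float   # share of cards rated by baseline heuristics only (0 to 1)
    archetype_confidence: float  # 0 (unclear) to 1 (clear)
    low_vector_count: int        # number of vectors scoring low
    near_bracket_boundary: bool
    deck_complete: bool


def margin(signals: UncertaintySignals, base: float = 0.2) -> float:
    """Return a +/- margin that grows with each uncertainty signal."""
    value = base
    value += 0.6 * signals.baseline_only_ratio
    value += 0.3 * (1.0 - signals.archetype_confidence)
    value += 0.1 * max(0, signals.low_vector_count - 1)  # only multiple low vectors count
    if signals.near_bracket_boundary:
        value += 0.1
    if not signals.deck_complete:
        value += 0.3
    return round(min(value, 1.5), 2)
```

A fully recognized, complete deck with a clear archetype keeps the base margin, and each additional signal widens it.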

Why is my mana base score low or negative?

The mana base score reflects land quality relative to format expectations — not mana acceleration (that's Speed). A tapland-heavy mana base scores low and can cap your maximum power level rather than producing a negative score directly.

How are decks stored?

When you analyze a deck, results are saved with a unique hash. Decks with identical card lists share the same URL, making it easy to share your analysis. You can re-analyze at any time to get updated scores.
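One plausible way for identical card lists to map to the same identifier is to hash a canonical form of the list. The sketch below is an assumption about how that could work, not a description of ScryCheck's storage: the normalization rules, the choice of SHA-256, and the 16-character truncation are all hypothetical.

```python
import hashlib


def deck_hash(cards: dict[str, int]) -> str:
    """Hash a deck so that card order and name casing do not affect the result."""
    normalized = sorted(
        (name.strip().lower(), quantity) for name, quantity in cards.items()
    )
    canonical = "\n".join(f"{quantity} {name}" for name, quantity in normalized)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Two lists with the same cards and quantities produce the same hash regardless of the order in which they were pasted, while changing a single quantity produces a different one.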

Why doesn't my commander show synergy bonuses?

Commander synergy detection covers 100+ hand-curated profiles plus 700+ commanders with synergy signals auto-generated from a corpus of analyzed decks. If your commander isn't covered yet, the deck is still analyzed normally — you just won't see commander-specific bonuses. Coverage grows as more decks are analyzed.
