How ScryCheck scores Commander decks
Understanding your power level, bracket estimate, and confidence interval
Overview
ScryCheck analyzes Commander decks using a multi-stage pipeline that evaluates card quality, synergy, combo potential, and mana efficiency to produce a power level rating from 1–10.
Every score contribution can be traced back to specific cards and their roles in your deck — nothing is a black box.
How it works
When you submit a deck, it passes through a deterministic multi-stage pipeline. Each stage feeds into the next, building up to a final power level and bracket estimate.
Card parsing
The deck list is parsed, commanders are identified, and card names are normalized against Scryfall data.
Card rating
Each card is evaluated using a layered rating stack: hand-tuned overrides for high-impact cards, LLM-generated category tags for broad coverage, land-type rules, and heuristics derived from oracle text.
Vector scoring
Cards contribute points to five scoring vectors: Speed, Consistency, Interaction, Mana Base, and Threats. Each vector reflects a distinct dimension of deck strength.
Commander synergy
If your commander has a dedicated profile, cards that align with their strategy receive synergy bonuses. Over 100 popular commanders have profiles.
Combo detection
The deck is checked against Commander Spellbook's database of 80,000+ combo variants. Detected combos are classified by outcome type and card count, then factored into the Threats vector.
Theme & archetype detection
75+ themes (tribal, keyword, and package-based) and 16 archetypes are identified. Your primary archetype influences how vectors are weighted in the final score.
Mana color balance
Card color spread, mana symbol demand, and land production are analyzed to identify fixing gaps and color coverage ratios.
Composition analysis
Mana base quality, card quality distribution, and mana curve efficiency are evaluated. Significant deficiencies in any of these areas cap your maximum power level.
Power level calculation
Vector scores, combo weight, and card quality combine into a base score. Caps are applied based on deck completeness across all axes. A confidence interval reflects how much uncertainty is present in the rating.
Bracket assignment
Hard gates — cards with outsized game impact — set minimum bracket floors independent of power level. Power level can push the bracket higher, but never below the hard gate minimum.
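As a rough illustration of how the vector scoring and power level stages could fit together, here is a minimal sketch. It is not ScryCheck's actual implementation: the `CardRating` class, the per-vector point values, the weights, and the cap values are all hypothetical, and combo weight and card quality are left out for brevity. Only the five vector names and the idea that deficiencies cap the final score come from the description above.

```python
from dataclasses import dataclass, field

# The five scoring vectors named in the pipeline description.
VECTORS = ("speed", "consistency", "interaction", "mana_base", "threats")


@dataclass
class CardRating:
    """Hypothetical per-card rating: points contributed to each vector."""
    name: str
    points: dict[str, float] = field(default_factory=dict)


def vector_totals(cards):
    """Sum every card's contribution into the five vectors."""
    totals = {v: 0.0 for v in VECTORS}
    for card in cards:
        for vector, pts in card.points.items():
            totals[vector] += pts
    return totals


def power_level(totals, weights, caps):
    """Weighted base score, clamped to 1-10, then limited by the lowest cap.

    `weights` stands in for archetype-dependent vector weighting and
    `caps` for composition-analysis ceilings; both are assumptions.
    """
    base = sum(weights[v] * totals[v] for v in VECTORS)
    base = max(1.0, min(10.0, base))
    ceiling = min(caps, default=10.0)
    return round(min(base, ceiling), 1)
```

With equal weights, a deficiency cap simply lowers the ceiling: the same vector totals yield a lower power level once a cap below the base score is present.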
Power level scale
ScryCheck rates decks on a 1–10 scale with decimal precision. Your rating reflects overall deck strength across all five vectors — not just one axis like combos or mana. The ± margin shows the confidence interval: it widens when there’s meaningful uncertainty (many unrecognized cards, weak archetype signal, boundary proximity) and narrows with strong coverage and a clear deck identity.
10: Top-end competitive Commander: maximal efficiency, compact wins, and robust protection across all axes.
9: Near-cEDH construction and pacing; viable at cEDH-adjacent tables but not top-tier meta.
8: Fast, consistent, and resilient; can present and protect strong wins in stronger pods.
7: High slot efficiency and tight construction; strong execution without the full cEDH profile.
6: Well-tuned synergy shell with consistent lines and solid fundamentals.
5: Reliable game plan with intentional card choices; moderate speed and interaction.
4: Meaningful upgrades and cleaner synergies, though consistency gaps remain.
3: Comparable to stock precons: clear plan, but slower mana and lower card efficiency.
2: Core idea exists, but ramp, draw, and interaction still need major refinement.
1: Little strategic cohesion; mana and card quality are very inconsistent.
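The ± margin described above can be pictured as a base value that grows with each uncertainty signal. The sketch below is only an illustration of that idea: the function name, the coefficients, and the 0.25 boundary window are assumptions, not ScryCheck's real numbers.

```python
def confidence_margin(unrecognized_ratio, archetype_confidence,
                      distance_to_boundary, base=0.3):
    """Return a hypothetical ± margin for a power level.

    unrecognized_ratio:   share of cards with no rating (0.0 to 1.0)
    archetype_confidence: strength of the archetype signal (0.0 to 1.0)
    distance_to_boundary: distance from the nearest bracket boundary
    All coefficients below are illustrative assumptions.
    """
    margin = base
    margin += 1.0 * unrecognized_ratio            # many unrecognized cards
    margin += 0.5 * (1.0 - archetype_confidence)  # weak archetype signal
    if distance_to_boundary < 0.25:               # boundary proximity
        margin += 0.2
    return round(margin, 2)
```

A fully recognized deck with a clear identity and no nearby boundary keeps the base margin, while any of the three signals widens it.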
Commander brackets
ScryCheck estimates your deck’s Commander Bracket based on the official WotC guidelines. Brackets reflect both what’s in your deck and how powerful it is overall.
How bracket assignment works:
Certain high-impact cards act as hard gates — they set a minimum bracket floor regardless of overall power level. Power level can push your bracket above that floor, but never below it. A deck with no hard gates is bracketed purely by power level.
Bracket 5: Fully optimized competitive decks. All axes are maximized. Power level 9 or higher.
Bracket 4: Many Game Changers, mass land denial, or deterministic 2-card combos. Strong, focused, and often faster than casual tables expect.
Bracket 3: A few Game Changers, or any 2-card combo. Tuned and efficient. Most power-7 decks live here.
Bracket 2: Upgraded precons and synergy-focused builds. No Game Changers. Clearly more powerful than a stock precon.
Bracket 1: Precons and very casual decks. Theme over optimization.
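The floor-plus-power-level rule can be sketched in a few lines. In this sketch only the "power level 9 or higher" threshold for the top bracket comes from the text; the other thresholds and the function names are hypothetical placeholders chosen so that a power-7 deck lands in the middle bracket.

```python
def bracket_from_power(power):
    """Map a power level to a bracket using illustrative thresholds.

    Only the 9.0 cutoff is stated in the text; 7.5, 6.0, and 4.0 are
    assumptions made for this example.
    """
    thresholds = [(9.0, 5), (7.5, 4), (6.0, 3), (4.0, 2)]
    for minimum, bracket in thresholds:
        if power >= minimum:
            return bracket
    return 1


def assign_bracket(power, gate_floor=1):
    """Hard gates set a floor; power level can only raise the bracket."""
    return max(gate_floor, bracket_from_power(power))
```

For example, `assign_bracket(5.0, gate_floor=4)` stays at 4 because the hard gate floor wins, while `assign_bracket(8.0, gate_floor=3)` rises to 4 because power level pushes it above the floor.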
Game Changers
Game Changers are cards identified by WotC as having outsized impact on Commander games. ScryCheck detects all of them and uses their presence to set your bracket floor — the more you have, the higher the minimum.
They fall into four categories: fast mana that dramatically accelerates your development, efficient tutors that find any card with minimal investment, resource engines that generate overwhelming card or mana advantage, and stax and denial pieces that lock opponents out.
The full official Game Changer list is published by WotC and available on the Commander Brackets page.
Combo detection
ScryCheck uses the Commander Spellbook database to detect combos — 80,000+ documented variants, from 2-card lines to complex setups. Each detected combo contributes to your Threats score.
Card count
Compact combos are weighted more heavily than long ones. A 2-card combo is harder to disrupt and faster to assemble than a 5-card setup — the scoring reflects that.
Outcome type
Combos are classified by what they do: instant wins, deterministic kills, infinite resource generation, or repeatable value. More immediately game-ending outcomes score higher.
Redundancy
Combos that share pieces with other combos in your deck receive a bonus. Overlapping lines are more consistent — drawing one piece gets you closer to multiple wins.
Outcome hints
Each combo row in your results shows an icon indicating what the combo does — mana, turns, damage, locks, wins, and more. Hover the icon for details sourced directly from Commander Spellbook.
To prevent permutation inflation, overlapping combo lines are grouped into families with a cap on their combined contribution to your score.
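To make the weighting factors above concrete, here is a minimal sketch of how card count, outcome type, redundancy, and the family cap might interact. Every number in it (outcome weights, the 2/n size factor, the 10% redundancy bonus, the 1.5 family cap) and the `Combo` class itself are assumptions for illustration, not values taken from ScryCheck.

```python
from dataclasses import dataclass

# Hypothetical weights: more immediately game-ending outcomes score higher.
OUTCOME_WEIGHT = {
    "instant_win": 1.0,
    "deterministic_kill": 0.9,
    "infinite_resource": 0.6,
    "repeatable_value": 0.3,
}


@dataclass(frozen=True)
class Combo:
    cards: frozenset  # names of the combo pieces
    outcome: str      # a key of OUTCOME_WEIGHT
    family: str       # label grouping overlapping lines


def combo_score(combo, all_combos):
    """Score one combo: compact lines and shared pieces score higher."""
    size_factor = 2.0 / len(combo.cards)  # 2 cards -> 1.0, 5 cards -> 0.4
    score = OUTCOME_WEIGHT[combo.outcome] * size_factor
    shares_piece = any(
        other is not combo and combo.cards & other.cards
        for other in all_combos
    )
    if shares_piece:
        score *= 1.1  # redundancy bonus (assumed 10%)
    return score


def threats_from_combos(combos, family_cap=1.5):
    """Sum combo scores per family, capping each family's total."""
    by_family = {}
    for combo in combos:
        by_family[combo.family] = (
            by_family.get(combo.family, 0.0) + combo_score(combo, combos)
        )
    return sum(min(total, family_cap) for total in by_family.values())
```

The per-family cap is what keeps many near-identical variants of the same line from inflating the Threats contribution.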
Input formats
Paste list
One card per line. Quantities are optional. Commander can be in the list or entered separately.
Sol Ring
1 Mana Crypt
4 Island
Demonic Tutor
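A parser for this paste format could look like the sketch below. It assumes a leading integer followed by whitespace is a quantity and that a missing quantity means one copy; the function name and the returned dictionary shape are hypothetical, and commander handling is left out.

```python
import re

# Optional leading quantity, then the card name.
LINE_RE = re.compile(r"^(?:(\d+)\s+)?(.+)$")


def parse_decklist(text):
    """Parse one-card-per-line text into {card name: quantity}."""
    deck = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = LINE_RE.match(line)
        quantity = int(match.group(1)) if match.group(1) else 1
        name = match.group(2)
        deck[name] = deck.get(name, 0) + quantity
    return deck
```

Repeated lines for the same card are summed, so pasting `Island` four times is treated the same as `4 Island`.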
Import URL
Paste a link directly from:
- Moxfield (moxfield.com/decks/...)
- Archidekt (archidekt.com/decks/...)
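Recognizing which site a pasted link comes from can be sketched with the standard library alone. The function name and return values here are hypothetical, and the sketch only classifies the URL; it does not fetch anything or assume any particular site API.

```python
from urllib.parse import urlparse

# Hosts listed above, mapped to a short source label (labels are assumed).
SUPPORTED_HOSTS = {"moxfield.com": "moxfield", "archidekt.com": "archidekt"}


def detect_source(url):
    """Return the source label for a supported deck URL, else None."""
    # Accept links pasted without a scheme, e.g. "moxfield.com/decks/...".
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in SUPPORTED_HOSTS and parsed.path.startswith("/decks/"):
        return SUPPORTED_HOSTS[host]
    return None
```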
How accuracy is measured
Power level is one of the hardest things to measure in Commander because there’s no objective ground truth — the same deck can feel like a 6 or an 8 depending on who you ask. Here’s how ScryCheck approaches measurement honestly.
Reference deck validation
ScryCheck validates against a set of 252 reference decks with known expected power levels and brackets. Every scoring change is measured against this suite before shipping.
202 of 252 reference decks are low-confidence estimates (database-mined), not independently verified. The validation numbers are real, but the ground truth itself carries uncertainty.
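One way to keep that caveat visible is to report accuracy separately for verified and low-confidence reference decks. The sketch below assumes a simple record per reference deck; the `ReferenceDeck` class, its field names, and the report keys are hypothetical, not ScryCheck's actual test harness.

```python
from dataclasses import dataclass


@dataclass
class ReferenceDeck:
    name: str
    expected_bracket: int   # ground-truth bracket for the deck
    predicted_bracket: int  # bracket the engine produced
    verified: bool          # False for database-mined estimates


def bracket_accuracy(decks):
    """Fraction of decks whose predicted bracket matches the expected one."""
    if not decks:
        return 0.0
    hits = sum(d.expected_bracket == d.predicted_bracket for d in decks)
    return hits / len(decks)


def accuracy_report(decks):
    """Overall accuracy plus a split by ground-truth confidence tier."""
    verified = [d for d in decks if d.verified]
    mined = [d for d in decks if not d.verified]
    return {
        "overall": bracket_accuracy(decks),
        "verified": bracket_accuracy(verified),
        "low_confidence": bracket_accuracy(mined),
    }
```

Splitting the report this way shows whether a headline number is carried mostly by the tier whose ground truth is least certain.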
LLM cross-validation
To check whether the engine’s accuracy is self-referential, we ran all 252 reference decks through Claude and GPT-4o independently — neither model saw our scores, ground truth values, or thresholds. They evaluated each deck from scratch using only the card list and WotC’s bracket framework.
What the cross-validation found:
- The engine outperformed both LLMs significantly: 93% bracket accuracy vs. 48% (Claude) and 38% (GPT-4o). Rule-based scoring is better calibrated than LLM judgment alone for this task.
- LLMs systematically over-rate decks: they struggle to recognize how casual true Bracket 1–2 decks really are.
- 6 reference decks were identified as miscalibrated, cases where the engine and both LLMs agreed the expected bracket was wrong. Fixing these improved measured accuracy from 90% to 93%.
The engine’s accuracy is not circular. An independent evaluation system with no access to our ground truth agreed with the engine more than it disagreed — and where it disagreed, it pointed to real ground truth errors, not engine errors.
Why measurement is hard
Self-assessment bias
Players systematically underestimate their decks, especially near the Bracket 3/Bracket 4 boundary. ScryCheck measures the deck as constructed, not as the player perceives it.
The ground truth problem
Any validation system needs ground truth to measure against. Ours comes from database-mined estimates, community assessments, and expert review — and the quality of the ground truth limits the quality of the accuracy claim.
Card quality vs. table impact
ScryCheck measures what’s in the deck. It can’t account for piloting skill, local meta, or political dynamics.
The bracket boundary problem
Brackets are discrete (1–5) but power level is continuous. Any deck near a boundary will feel “wrong” to someone. The confidence interval and bracket boundary note in your results communicate this uncertainty explicitly.
Continuous improvement
ScryCheck’s scoring engine is validated automatically on every update. The reference deck suite is reviewed quarterly, and user feedback is aggregated to detect systematic biases.
If you think your deck is significantly miscalibrated, use the feedback button after re-analyzing — it captures your expected bracket and routes directly to our calibration review queue.
FAQ
Why is my deck rated lower/higher than expected?
Power level is subjective. ScryCheck focuses on card quality, combo potential, and mana efficiency. A deck can feel powerful due to piloting skill or local meta, even if the raw card analysis suggests otherwise. The ± margin shows our confidence range.
What does “Confidence” mean?
Confidence reflects model certainty, not just card recognition. It combines coverage with signals like archetype clarity, how many cards were categorized by baseline heuristics only, and deck completeness.
What does the ± margin mean?
The ± margin is the confidence interval around your power level. It widens when uncertainty signals are present — for example, boundary proximity, weak archetype confidence, or many unrecognized cards — and narrows when coverage and rating quality are strong.
Why is my mana base score low or negative?
The mana base score reflects land quality relative to format expectations — not mana acceleration (that's Speed). A tapland-heavy mana base scores low and can cap your maximum power level rather than producing a negative score directly.
How are decks stored?
When you analyze a deck, results are saved with a unique hash. Decks with identical card lists share the same URL, making it easy to share your analysis. You can re-analyze at any time to get updated scores.
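For identical card lists to share a URL, the hash has to ignore the order in which cards were entered. Here is a minimal sketch of one way to do that; the normalization rules, the choice of SHA-256, and the 16-character truncation are assumptions for illustration, not a description of ScryCheck's storage.

```python
import hashlib


def deck_hash(cards):
    """Return an order-independent hash for a {card name: quantity} mapping."""
    # Normalize names, then sort so entry order does not affect the result.
    entries = sorted((name.strip().lower(), qty) for name, qty in cards.items())
    canonical = "\n".join(f"{qty} {name}" for name, qty in entries)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```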
Why doesn't my commander show synergy bonuses?
Commander synergy detection covers 100+ popular commanders with dedicated profiles. If your commander isn't profiled, the deck is still analyzed normally — you just won't see commander-specific bonuses.