One-way interview scorecard template
A copy-paste scoring grid for one-way video interviews. Trait-anchored, five to seven rows, with what each rating actually means so two reviewers land in the same place.
A one-way interview scorecard is a fixed grid for rating recorded answers. It lists one trait per row, each tied to a question, with a written anchor for what each score means and a cutoff that maps the total to a decision. The point is consistency: the same answer earns roughly the same score no matter who is watching.
A one-way interview produces a pile of recordings. Without a scorecard, each reviewer watches them and forms a private impression, and those impressions are where bias and inconsistency live. A structured scorecard fixes the standard in writing before anyone hits play.
The grid is below. Copy it, swap in your traits, and use it. The sections around it explain how to customize each part.
When to use this
Use it the moment you have more than one candidate or more than one reviewer. A single person scoring a single candidate can hold a standard in their head. Two reviewers, or twenty candidates over two weeks, cannot. The scorecard is what keeps the last candidate judged by the same bar as the first.
It pairs with the question set. Each scored trait should map to a question you actually asked. If a trait has no question behind it, you are scoring a guess. If a question maps to no trait, cut it.
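If you keep the question set and the trait list in a spreadsheet or a script, the pairing can be checked mechanically. The sketch below is one way to do it, not a prescribed tool: the question IDs, the extra "Culture add" trait, and the `check_mapping` helper are all hypothetical names made up for illustration.

```python
# Hypothetical question IDs and trait-to-question mapping, for illustration only.
questions_asked = {"Q1", "Q2", "Q3", "Q4", "Q5", "Q6"}
trait_to_question = {
    "Communication": "Q1",
    "Role-specific thinking": "Q2",
    "Problem solving": "Q3",
    "Motivation / role fit": "Q4",
    "Self-awareness": "Q5",
    "Culture add": None,  # made-up trait with no question behind it
}


def check_mapping(trait_to_question, questions_asked):
    """Return (traits with no question behind them, questions that map to no trait)."""
    unbacked_traits = sorted(
        trait for trait, question in trait_to_question.items()
        if question not in questions_asked
    )
    mapped = {q for q in trait_to_question.values() if q is not None}
    unscored_questions = sorted(questions_asked - mapped)
    return unbacked_traits, unscored_questions


unbacked, unscored = check_mapping(trait_to_question, questions_asked)
print(unbacked)  # ['Culture add']  -> scoring a guess
print(unscored)  # ['Q6']           -> candidate for cutting
```

Both lists should be empty before the first recording is reviewed.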
The scorecard
One row per trait. Five to seven rows. Each row is rated 1 to 5 against the anchors below, with one line of evidence in the candidate’s own words.
| Trait (and the question it maps to) | Rating (1-5) | Evidence: what they actually said |
|---|---|---|
| Communication: clear, structured, gets to the point (Q1) | | |
| Role-specific thinking: depth on the core task (Q2) | | |
| Problem solving: how they reason through the example (Q3) | | |
| Motivation / role fit: specific reasons, not generic ones (Q4) | | |
| Self-awareness: handles the harder question honestly (Q5) | | |
| [Add a sixth trait if the role needs it] | | |
| [Add a seventh trait if the role needs it] | | |
| Total (out of 25 to 35, depending on the number of rows) | | |
Keep it to the traits that separate “worth a live call” from “not yet.” This is a screen, not the final decision. Seven rows is the ceiling. Past that, reviewers stop scoring and start skimming.
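For teams that track scores in code rather than on a sheet, a filled-in scorecard can be modeled in a few lines. This is a minimal sketch under assumptions: the `Row` class and `total_score` function are hypothetical names, and the ratings and evidence lines are invented sample data, not real candidate answers.

```python
from dataclasses import dataclass


@dataclass
class Row:
    trait: str
    rating: int    # 1 to 5, scored against the written anchors
    evidence: str  # one line of what the candidate actually said


def total_score(rows):
    """Validate a filled-in scorecard and return (total, maximum possible)."""
    if not 5 <= len(rows) <= 7:
        raise ValueError("scorecard should have five to seven rows")
    for row in rows:
        if not 1 <= row.rating <= 5:
            raise ValueError(f"{row.trait}: rating must be 1 to 5")
        if not row.evidence.strip():
            raise ValueError(f"{row.trait}: evidence line is missing")
    return sum(row.rating for row in rows), 5 * len(rows)


# Invented sample data for a five-row card.
rows = [
    Row("Communication", 4, "Opened with the answer, then gave two reasons"),
    Row("Role-specific thinking", 3, "Named the core task but gave no detail"),
    Row("Problem solving", 5, "Walked through the example step by step"),
    Row("Motivation / role fit", 2, "Said only 'I like the company'"),
    Row("Self-awareness", 4, "Described a missed deadline and what changed"),
]
print(total_score(rows))  # (18, 25)
```

Returning the maximum alongside the total keeps a five-row card and a seven-row card from being compared on raw points.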
What each rating means
Do not score an unanchored 1 to 5. A bare number means something different to every reviewer. Write down what each level looks like, on each trait, before anyone watches. Here is a reusable set of anchors you can paste at the top of the sheet and adjust per trait:
5 = Strong. Specific, structured, role-relevant. A clear yes on this trait.
4 = Good. Solid answer with a minor gap. Leaning yes.
3 = Mixed. Some signal, some hand-waving. Genuinely borderline.
2 = Weak. Vague, generic, or off the point. Leaning no.
1 = Concerning. Misunderstood the question, or no real content. Clear no.
For each trait, replace the generic words with the real thing. Under Communication, a 5 might read “opens with the point, three tight sentences, no filler” and a 2 “rambles, never lands the answer.” The more concrete the anchor, the less two reviewers drift apart.
Why the evidence column matters
The third column is not optional. For every score, write one line of what the candidate actually said. Not “good answer” but the specific claim, number, or example they gave.
This does two things. It forces the score to come from the answer instead of a vibe, and it makes the decision defensible. If a candidate asks why they were passed over, or a teammate disagrees, “scored a 2 on role-specific thinking, said only ‘I’m a hard worker’ when asked for a concrete example” is a real reason. A bare “2” is not.
Set the cutoff before you score
Decide what total advances a candidate before you watch a single recording. Picking the bar after you have seen the scores is how you talk yourself into the people you already liked.
A simple rule: advance anyone above your line, reject anyone well below it, and give the borderline band a second reviewer. You can also set a floor, where a 1 on a must-have trait is an automatic no regardless of the total. A great communicator who clearly cannot do the core task should not pass on points.
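The rule above can be written down as a small function, which also forces you to pick the numbers up front. This is a sketch, not a recommended bar: the `decide` function is a hypothetical name, and the thresholds of 19 and 15 on a five-row card are assumptions chosen only to make the example run.

```python
def decide(ratings, advance_at, reject_below, must_have=()):
    """Map one candidate's ratings (trait -> 1..5) to a screening decision."""
    # Floor: a 1 on any must-have trait is an automatic no, whatever the total.
    if any(ratings[trait] == 1 for trait in must_have):
        return "reject"
    total = sum(ratings.values())
    if total >= advance_at:
        return "advance"
    if total < reject_below:
        return "reject"
    # Everything in between is the borderline band.
    return "second reviewer"


# Assumed thresholds for a five-row card (maximum 25); set your own before scoring.
ADVANCE_AT = 19
REJECT_BELOW = 15
MUST_HAVE = ("Role-specific thinking",)

strong_talker = {
    "Communication": 5, "Role-specific thinking": 1, "Problem solving": 5,
    "Motivation / role fit": 5, "Self-awareness": 5,
}
borderline = {
    "Communication": 3, "Role-specific thinking": 4, "Problem solving": 3,
    "Motivation / role fit": 4, "Self-awareness": 3,
}
print(decide(strong_talker, ADVANCE_AT, REJECT_BELOW, MUST_HAVE))  # reject
print(decide(borderline, ADVANCE_AT, REJECT_BELOW, MUST_HAVE))     # second reviewer
```

The first candidate totals 21, which clears the line, and is still rejected by the floor on the must-have trait.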
Use two reviewers for the borderline cases
For candidates near the line, have a second person score independently, before they see the first score. Then compare.
Close scores are a good sign. The anchors are working. A wide gap is information too: it usually means the anchor was vague, or the question was, and that is worth fixing before you reject anyone on a number two people could not agree on. Calibrating on a handful of answers early, where everyone scores the same three candidates and then talks through the gaps, tightens every score that follows.
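Comparing two independent score sheets is easy to automate. The sketch below flags the traits where the reviewers disagree; the `trait_gaps` helper is a hypothetical name, and treating a difference of more than one point as a "wide gap" is an assumption, since where to draw that line is your call. Three traits are shown for brevity.

```python
def trait_gaps(first, second, max_gap=1):
    """Return traits where two independent ratings differ by more than max_gap."""
    return {
        trait: (first[trait], second[trait])
        for trait in first
        if abs(first[trait] - second[trait]) > max_gap
    }


# Invented scores from two reviewers who have not seen each other's sheets.
reviewer_a = {"Communication": 4, "Role-specific thinking": 2, "Problem solving": 3}
reviewer_b = {"Communication": 4, "Role-specific thinking": 4, "Problem solving": 3}

print(trait_gaps(reviewer_a, reviewer_b))  # {'Role-specific thinking': (2, 4)}
```

A flagged trait is a prompt to reread the anchor and the question for that row, not to average the two numbers.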
Customize it, then keep it stable
Swap the traits for what the role demands. A sales screen weights discovery and objection handling. A support screen weights written clarity and empathy. A nursing screen weights judgment and a real example of patient communication. The structure holds, the traits change.
Then leave it alone for the duration of the role. The whole point is that every candidate meets the same bar. Rewriting the rubric halfway through the pipeline quietly resets that bar, and the people you scored first are no longer comparable to the people you score last.
For the reasoning behind each of these choices, and how scoring fits the rest of the process, read how to score async interviews and how to run an asynchronous interview.