How to score and evaluate asynchronous interviews

A real scoring rubric for one-way video interviews: trait-anchored ratings, how to calibrate across reviewers, and why a human still makes the final call.

Updated June 12, 2026 · 9 min read

Scoring an asynchronous interview means rating every candidate’s recorded answers against a written rubric you set before you watch. You pick a few traits the role needs, define what each score looks like, and rate one question at a time across the whole pool. Software can transcribe and organize. A human makes the final call.

That is the short answer. The rest of this page is how to do it well, because the format gives you something a phone screen cannot: the same questions, answered under the same conditions, from every candidate. You throw that away the moment you score on gut feel, because then you have rebuilt the bias the format was supposed to remove. Here is how to score one-way video answers so the result is consistent, fair, and defensible.

The short version

Write the rubric before you watch a single answer. Pick two to four traits the role actually needs, and define in plain language what a weak answer and a strong answer look like for each. Rate every candidate against those anchors on a 1 to 4 scale, one question at a time across the whole pool. Calibrate reviewers on shared samples first. Let software transcribe and organize. Keep a human making the final call. Save the scorecards.

Score against a rubric, not a vibe

A rubric is just the answer to one question, written down before you start: what does a good answer to this look like? Decide that in advance and you are comparing every candidate to a fixed standard. Skip it and you are comparing each candidate to whoever you watched right before them, which is not a standard at all.

The most common failure here is not malice. It is drift. Watch forty answers in a row and your bar moves without you noticing. The fifth candidate and the thirty-fifth get judged by two different reviewers, even though both of them are you. The rubric is what holds the bar still.

This is also the advantage an asynchronous interview is supposed to deliver in the first place. The widely cited finding in hiring research is that structured, scored interviews tend to predict job performance more reliably than unstructured ones, because everyone is measured on the same things. A one-way interview hands you the structure for free. The scoring is the part you still have to build.

Pick the traits before you write the questions

Name the two to four things this stage needs to prove. Not ten. Two to four. For a support role that might be written clarity, empathy, and judgment under pressure. For a sales role, discovery instinct and how someone handles an objection. For a nurse, communication with a worried patient and how they think through a safety call.

Each trait you keep is a trait every reviewer has to hold in their head and rate honestly. Past four, the scoring gets mushy and reviewers start collapsing everything into a single “do I like them” number. Cut anything that does not separate “worth a live conversation” from “not yet.” You are not making the final hire here. You are deciding who earns the call.

Every question in the interview should map to one of these traits. If a question does not, it is not earning its place, and it is one more thing for a candidate to record for no reason. See asynchronous interview questions and examples for prompts you can adapt by role.

Anchor each rating so a 3 means the same thing to everyone

A number on its own is useless. “I gave that a 3” means nothing unless 3 has a definition. Trait-anchored ratings fix that: for each trait, you write a short description of what each score on the scale actually looks like.

Use a 1 to 4 scale. Three or four points is enough to separate weak from strong without pretending you can tell a 6 from a 7. A wider scale feels more precise and is actually less reliable, because reviewers will not apply the middle of a ten-point scale the same way twice.

Here is the shape of an anchored trait, using “judgment under pressure” as the example:

1, weak. Jumps to the first action without weighing options. No sign they considered the downside or who else is affected.
2, below bar. Reaches a reasonable action but skips the reasoning. You cannot tell if they thought it through or got lucky.
3, solid. Names the tradeoff, picks a defensible action, and explains why. The kind of answer you would be glad to hear in the role.
4, strong. All of the above, plus anticipates what could go wrong with their own choice and what they would watch for next.

Write one of these for each trait, two to four in all, and you have a rubric. Every reviewer now scores against the same written standard instead of a private one in their head. When you are ready to put this on paper, the employer scorecard template gives you a clean grid to fill in.
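If you keep the rubric in a script or a shared file rather than on paper, the same structure carries over. Here is a minimal sketch of one way to hold an anchored trait in Python. The names SCALE, Trait, and validate_rubric are hypothetical, not from any particular tool, and the anchor text is shortened from the example above.

```python
from dataclasses import dataclass
from typing import Mapping

SCALE = (1, 2, 3, 4)  # the 1 to 4 scale; every point needs a written anchor


@dataclass(frozen=True)
class Trait:
    name: str
    anchors: Mapping[int, str]  # score -> what an answer at that score looks like

    def __post_init__(self):
        # Refuse a trait that leaves any point on the scale undefined.
        missing = [s for s in SCALE if s not in self.anchors]
        if missing:
            raise ValueError(f"{self.name}: no anchor for score(s) {missing}")


def validate_rubric(traits):
    """Enforce the two-to-four-trait limit and reject duplicate trait names."""
    if not 2 <= len(traits) <= 4:
        raise ValueError("a rubric should rate two to four traits")
    names = [t.name for t in traits]
    if len(set(names)) != len(names):
        raise ValueError("each trait should appear once")
    return list(traits)


# Illustrative trait, with anchors abbreviated from the example above.
judgment = Trait(
    name="judgment under pressure",
    anchors={
        1: "Jumps to the first action without weighing options.",
        2: "Reaches a reasonable action but skips the reasoning.",
        3: "Names the tradeoff, picks a defensible action, explains why.",
        4: "All of 3, plus anticipates what could go wrong with the choice.",
    },
)
```

The point of the check in Trait is the same as the point of the rubric: a score with no written definition cannot be used.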

Score one question at a time, across the whole pool

This is the single most useful habit, and almost nobody does it. Do not watch candidate A’s full interview, score them, then move to candidate B. Instead, watch every candidate’s answer to question one, score all of those, then move to question two.

Two reasons. First, it kills the halo effect: a brilliant answer to question one stops inflating the score you give that same person on question three. Second, it keeps the scale steady, because you are comparing like answers back to back while the standard for “good” is fresh. Scoring by question is faster too. You get into a rhythm on one trait instead of context-switching between four traits per candidate.
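The review order described above is easy to sketch if your answers live in a list or an export. This is a rough illustration, not a prescribed method. The function names are hypothetical, and averaging a candidate's ratings into one number is an assumption made for the example. You may prefer to keep the per-trait scores side by side.

```python
from collections import defaultdict
from statistics import mean


def review_queue(candidates, questions):
    """Yield (question, candidate) pairs, finishing each question
    across the whole pool before moving to the next one."""
    for question in questions:
        for candidate in candidates:
            yield question, candidate


def summarize(scores):
    """scores maps (candidate, question) -> rating on the 1 to 4 scale.
    Returns each candidate's mean rating (an assumed way to combine them)."""
    by_candidate = defaultdict(list)
    for (candidate, _question), rating in scores.items():
        if rating not in (1, 2, 3, 4):
            raise ValueError(f"rating {rating!r} is off the 1 to 4 scale")
        by_candidate[candidate].append(rating)
    return {c: mean(ratings) for c, ratings in by_candidate.items()}


# Usage: the queue visits every Q1 answer before any Q2 answer.
order = list(review_queue(["ana", "ben"], ["Q1", "Q2"]))
```

With two candidates and two questions, the queue runs Q1 for ana, Q1 for ben, then Q2 for ana, Q2 for ben.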

Calibrate reviewers before they score for real

If more than one person scores, their numbers have to mean the same thing, or the rubric is decoration. Calibration is how you get there, and it takes about twenty minutes.

Pick two or three real sample answers that span the range. One clearly strong, one clearly weak, one genuinely borderline.
Have every reviewer score those independently, without talking.
Compare. Where you agree, good. Where you split by more than a point, that is the gold.
Talk through the splits. You will usually find the anchor wording is too loose, or two reviewers are weighting the same answer differently. Tighten the words until you would all land in the same place.
Re-check on the borderline sample. Then start scoring the real pool.

Disagreement at this stage is not a problem to vote away. It is the rubric telling you exactly where it is still vague. Fix it now and every score after is more trustworthy. For borderline candidates in the live pool, a second independent score is worth the few minutes it costs.
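The compare step can be done by eye with three reviewers, but a few lines of code make the splits hard to miss. A minimal sketch, assuming you have collected each reviewer's score per sample; the function name and data layout are illustrative.

```python
def calibration_splits(sample_scores, max_gap=1):
    """sample_scores maps sample -> {reviewer: rating}.
    Return the samples where the highest and lowest ratings differ
    by more than max_gap, along with the size of the gap."""
    splits = {}
    for sample, ratings in sample_scores.items():
        gap = max(ratings.values()) - min(ratings.values())
        if gap > max_gap:
            splits[sample] = gap
    return splits


# Usage with made-up calibration scores from three reviewers.
scores = {
    "clearly strong": {"ana": 4, "ben": 4, "cy": 3},
    "clearly weak": {"ana": 1, "ben": 1, "cy": 2},
    "borderline": {"ana": 2, "ben": 4, "cy": 3},
}
to_discuss = calibration_splits(scores)
```

Here only the borderline sample comes back, with a gap of two points. That is the anchor to talk through and tighten before anyone scores the real pool.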

Where AI helps, and where it should not

Software is genuinely good at the work surrounding the decision. It can transcribe every answer so you read at your own pace instead of watching in real time. It can organize responses by question, search the transcripts, and flag where an answer touches the criteria you set. On a pool of two hundred, that is hours back.
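To make “flag where an answer touches the criteria” concrete, here is a deliberately simple sketch that uses plain keyword matching on a transcript. It is an assumption for illustration, not how any particular product works, and the function name and keyword lists are hypothetical. Note what it returns: sentences for a person to read, not a score.

```python
import re


def flag_evidence(transcript, criteria):
    """criteria maps trait -> list of keywords.
    Return {trait: [sentences that mention a keyword]}.
    Surfaces text for a human reviewer; it assigns no rating."""
    # Split on whitespace that follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", transcript)
    sentences = [s.strip() for s in parts if s.strip()]
    flagged = {}
    for trait, keywords in criteria.items():
        if not keywords:
            continue  # nothing to search for
        pattern = re.compile(
            "|".join(r"\b" + re.escape(k) + r"\b" for k in keywords),
            re.IGNORECASE,
        )
        hits = [s for s in sentences if pattern.search(s)]
        if hits:
            flagged[trait] = hits
    return flagged


# Usage with a made-up transcript and keyword list.
transcript = (
    "I would pause the rollout. The tradeoff is speed versus safety. "
    "Then I would tell the customer."
)
criteria = {"judgment under pressure": ["tradeoff", "risk"], "empathy": ["sorry"]}
evidence = flag_evidence(transcript, criteria)
```

Keyword matching misses paraphrases and catches irrelevant mentions, which is one more reason the flagged sentences are a reading aid and the rating stays with the reviewer.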

What it should not do is make the call. There are two reasons, and they are different.

The first is quality. Judgment about whether someone is right for a specific role on a specific team does not transfer cleanly to a model. Some recruiters have noticed the limit firsthand. One wrote that companies that went all-in on automated screening “are now coming back asking for help finding actual humans who can spot the difference between genuine experience and ChatGPT-generated fluff.” A rubric in a person’s hands catches things a score cannot.

The second is defensibility, and it is the one that bites later. A decision a human made, against a written job-related standard, on a record you kept, is one you can stand behind. There is also real candidate wariness here you should not ignore. On Reddit, a recruiter said of their own employer’s setup, “My company already does AI scored one way video interviews and I hate them… I suspect a higher impact on under represented candidates but accusing that is opening Pandora’s box.” A candidate raised the same worry about HireVue-style scoring of “eye contact, body language,” wondering about bias “especially as it relates to neurodiverse and international candidates.” Use AI to surface evidence. Keep the deciding with a person who is reading the answers that matter.

Keep the record, for fairness and for the lawyers

Scoring everyone against the same written rubric is not just good practice. It is your defense if a hiring decision is ever questioned. The EEOC’s May 2023 technical assistance on AI in hiring made the point plainly: Title VII applies to AI-assisted selection, and a procedure that disproportionately screens out a protected group can create disparate-impact liability unless you can show it is job-related and consistent with business necessity. None of this is legal advice, but the direction is clear.

A consistent, job-related, scored process is the core of that defense. So:

Keep the rubric you scored against, with its anchors.
Keep the completed scorecards, including the borderline ones.
Make sure the traits you rate are plainly tied to the job, not to how polished someone looks on camera.
Note who made the final decision and what it rested on.

None of this is heavy if you build it in from the start. It is heavy only if you try to reconstruct it after the fact. Treat the scorecard as the artifact you keep, not the thing you throw away once the shortlist is set.
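If your scorecards live in software rather than on paper, the record can be as small as this. A hedged sketch only: the field names are hypothetical, chosen to mirror the four items in the list above, and the right retention format is a question for your own counsel and systems.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict


@dataclass
class Scorecard:
    candidate_id: str
    role: str
    rubric_version: str                 # which rubric and anchors were in force
    ratings: Dict[str, Dict[str, int]]  # question -> {trait: score on 1 to 4}
    reviewer: str
    decided_by: str                     # the human who made the final call
    decision: str                       # for example "advance" or "not yet"
    rationale: str                      # what the decision rested on
    decided_on: str                     # ISO date string supplied by the caller


def to_json(card):
    """Serialize a scorecard to stable, human-readable JSON for the file."""
    return json.dumps(asdict(card), indent=2, sort_keys=True)


# Usage with made-up values.
card = Scorecard(
    candidate_id="c-0192",
    role="support specialist",
    rubric_version="support-v2",
    ratings={"Q1": {"judgment under pressure": 3}},
    reviewer="ana",
    decided_by="j.rivera",
    decision="advance",
    rationale="Solid on both rated traits; named the tradeoff on Q1.",
    decided_on="2026-06-12",
)
saved = to_json(card)
```

Storing the rubric version next to the scores is the part that is painful to reconstruct later, because anchors tend to get tightened between hiring rounds.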

Score fast, while it counts

The reason to move the first screen to an asynchronous format is speed. Do not surrender it at the scoring step. Read transcripts at your own pace, score by question across the pool, and move the shortlist into live interviews within a day or two. Completion is already a cost to candidates. One recruiter at a large company reported a 50 percent take rate on their one-way stage, meaning only about half of the invited candidates recorded answers, so the least you can do is review what people sent you promptly. A fast, scored, respectful screen is a competitive advantage. A pile of recorded answers sitting unwatched for two weeks is worse than the phone screen you replaced.

When you have the rubric working, asynchronous video interview best practices covers the rest of the operating model, from question design to the candidate experience around it.

Frequently asked questions

What is the best way to score an asynchronous interview?

Write the rubric before you watch anyone. Pick two to four traits the role actually needs, define what a 1 and a 4 look like for each in plain language, and rate every answer against those anchors. Score one question at a time across all candidates, not one candidate at a time, so the scale stays steady.

How do you keep scoring fair across multiple reviewers?

Calibrate. Have every reviewer independently score the same two or three sample answers, then compare. Where you disagree by more than a point, talk it out and tighten the wording of the anchor until you would land in the same place. Disagreement is information, not a tie to break.

Should AI score the interview for you?

AI is good at the parts around the decision: transcribing answers, organizing them, flagging where a response matches your criteria. It should not be the thing that decides who advances. A person reading against the rubric makes the call, both because judgment does not transfer well to a model and because a human-made decision is the defensible one.

How many points should an interview rating scale have?

Three or four. A 1 to 4 scale is enough to separate a weak answer from a strong one without false precision. Avoid a 1 to 10 scale: the difference between a 6 and a 7 is noise, and reviewers will not apply it consistently.

How do you make async interview scoring legally defensible?

Score everyone against the same written, job-related rubric, keep the completed scorecards, and have a human make and record the decision. A consistent process tied to the job is the core of the business-necessity defense if a hiring decision is ever challenged.