Most scorecards are written for humans and used by nobody. They sit in a Notion doc, the interviewer ignores them, and the feedback gets written without reference to them. The scorecard is a ceremonial artifact.
For an AI to actually use a scorecard, three things have to be true.
The competencies have to be observable. "Strong communication" is not observable. "Can explain a technical concept to a non-technical stakeholder in under two minutes" is observable. The first is a vibe. The second is a behavior.
The rubric levels have to be distinguishable. A scale of 1 to 5 with no behavioral anchors tends to collapse to a 3 in practice. With nothing to separate a 2 from a 4, interviewers default to the middle and every candidate ends up average. A scale with concrete behavioral descriptions at each level forces real judgment.
The mapping between questions and competencies has to be explicit. Interviewers should know which competency each question is meant to assess. Otherwise they ask their favorite questions and tag the answers afterward, which is meaningless.
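To make the three conditions concrete, here is a minimal sketch of a scorecard as data, with a check for the parts a machine can verify. The names (Competency, Question, validate_scorecard) and the 1 to 5 scale are assumptions for illustration, not any platform's actual format. The check can only confirm that a behavior and an anchor are written down for each level; whether they are truly observable and distinguishable is still a human call.

```python
from dataclasses import dataclass, field


@dataclass
class Competency:
    name: str
    # Observable behavior, not a trait: something an interviewer can watch happen.
    behavior: str
    # One concrete behavioral anchor per rubric level, keyed by level number.
    anchors: dict[int, str] = field(default_factory=dict)


@dataclass
class Question:
    text: str
    competency: str  # name of the competency this question is meant to assess


def validate_scorecard(competencies, questions, levels=(1, 2, 3, 4, 5)):
    """Return a list of problems; an empty list means the structural checks passed."""
    problems = []
    known = {c.name for c in competencies}
    for c in competencies:
        if not c.behavior.strip():
            problems.append(f"{c.name}: no observable behavior described")
        missing = [lvl for lvl in levels if not c.anchors.get(lvl, "").strip()]
        if missing:
            problems.append(f"{c.name}: no behavioral anchor for levels {missing}")
    for q in questions:
        if q.competency not in known:
            problems.append(
                f"question {q.text!r} maps to unknown competency {q.competency!r}"
            )
    return problems


# Hypothetical example: a vague competency with a single anchor, plus a
# "favorite question" that maps to nothing on the scorecard.
vague = Competency("communication", "", {3: "Communicates adequately"})
favorite = Question("Tell me about a time you showed grit.", "grit")
for problem in validate_scorecard([vague], [favorite]):
    print(problem)
```

A scorecard that passes these checks is not automatically a good one, but one that fails them cannot be used by an AI or by an interviewer.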
When these three are in place, AI can do the heavy lifting. It can tag interview evidence against rubric items automatically. It can flag rubric items that no question touched. It can show the hiring manager exactly where the evidence supports the final score.
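Two of these tasks reduce to simple bookkeeping once the mapping exists. The sketch below assumes the tagging step has already happened (by a model or a person) and produced pairs of competency name and quote; the function names and the sample data are hypothetical. It shows how untouched competencies can be flagged and how evidence can be grouped so a score can be traced back to what the candidate actually said.

```python
def untouched_competencies(competencies, question_map):
    """Competencies that no question is mapped to, in scorecard order.

    question_map maps each question's text to the competency it assesses.
    """
    covered = set(question_map.values())
    return [name for name in competencies if name not in covered]


def evidence_by_competency(competencies, tagged_evidence):
    """Group (competency, quote) pairs under each scorecard competency.

    Evidence tagged with a name that is not on the scorecard is ignored.
    """
    grouped = {name: [] for name in competencies}
    for competency, quote in tagged_evidence:
        if competency in grouped:
            grouped[competency].append(quote)
    return grouped


# Hypothetical scorecard and interview data.
competencies = ["explaining", "prioritization", "debugging"]
question_map = {
    "Explain caching to a sales lead.": "explaining",
    "Walk me through a bug you tracked down.": "debugging",
}
tagged_evidence = [
    ("explaining", "Compared a cache to keeping common items at the front desk."),
    ("debugging", "Bisected the deploy history to find the regression."),
    ("debugging", "Added logging before changing any code."),
]

print(untouched_competencies(competencies, question_map))
for name, quotes in evidence_by_competency(competencies, tagged_evidence).items():
    print(name, len(quotes))
```

An empty evidence list for a competency is itself a signal: either no question touched it, or the interview never produced anything a reviewer could score.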
Platforms like Mazle generate scorecards from intake calls in exactly this format. The scorecard, the question set, and the rubric levels are produced together so the mapping is built in.
A scorecard the AI cannot use is a scorecard the humans will not use either. The two failure modes have the same root cause. Make the thing observable, distinguishable, and mapped.