Day 332
Week 48 Day 3: How to Score Candidates Objectively
The scorecard only works if you score honestly. Most leaders score generously -- they avoid low scores because it feels harsh, they inflate scores for candidates they liked, and they find reasons to justify the score they want to give rather than giving the score the evidence supports. Objective scoring requires discipline.
Before the interview: define a 1-5 scale for each criterion with specific behavioral anchors. A 5 is not 'great answer' -- it is 'candidate provided a detailed example demonstrating the behavior in a high-stakes context with measurable results and clear evidence of their specific contribution.' A 3 is not 'okay' -- it is 'candidate provided an example that partially demonstrates the behavior but lacked specificity on their personal contribution or the outcome.' After the interview: score each criterion within 15 minutes, before discussing with other interviewers.
Here is the complete scoring discipline. Before the interview panel begins, align on three things.

First -- the criteria: every interviewer knows exactly which behaviors, skills, and culture-add factors they are evaluating. Different interviewers can evaluate different criteria, but every criterion must be assigned to at least one interviewer.

Second -- the questions: standardize the questions for each criterion so that every candidate is evaluated on the same prompts. You can ask follow-up questions to probe deeper, but the primary question for each criterion should be identical across candidates.

Third -- the scoring rubric: define what a 1, 2, 3, 4, and 5 looks like for each criterion.

1 = no evidence of the behavior or skill: the candidate could not provide an example, or the example did not demonstrate the target behavior.
2 = weak evidence: the example was vague, described a team outcome without clarifying the candidate's individual role, or came from a significantly less complex context than the one they would face in this role.
3 = adequate evidence: a clear example that demonstrated the behavior in a relevant context, but the outcome was modest or the candidate's personal contribution was unclear.
4 = strong evidence: a detailed example with clear context, specific actions the candidate took, measurable results, and evidence of learning or iteration.
5 = exceptional evidence: an example that demonstrated the behavior at a level above the role's requirements, with impact beyond the candidate's immediate scope, clear evidence of strategic thinking, and outcomes that significantly exceeded expectations.

During the interview: take notes on specific statements the candidate makes that are relevant to each criterion. After the interview: score each criterion within 15 minutes based on your notes.
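The pre-panel setup above can be sketched as data: the rubric anchors as a lookup table and the criteria-to-interviewer assignment as a mapping, with a check that no criterion goes uncovered. This is a minimal illustration, not a prescribed tool; the criterion and interviewer names are hypothetical.

```python
# Behavioral anchors for the 1-5 scale, mirroring the rubric above.
RUBRIC = {
    1: "No evidence: no example, or the example did not demonstrate the behavior.",
    2: "Weak evidence: vague example, team outcome without individual role, or a far simpler context.",
    3: "Adequate evidence: clear, relevant example, but modest outcome or unclear personal contribution.",
    4: "Strong evidence: detailed example with context, specific actions, measurable results, learning.",
    5: "Exceptional evidence: above-role demonstration, impact beyond scope, strategic thinking.",
}

# Hypothetical criteria-to-interviewer assignment. Different interviewers
# may cover different criteria, but none may be left unassigned.
ASSIGNMENTS = {
    "ownership": ["alice", "bob"],
    "technical_depth": ["carol"],
    "culture_add": ["alice"],
}

def unassigned_criteria(assignments):
    """Return criteria that no interviewer is covering."""
    return [c for c, interviewers in assignments.items() if not interviewers]
```

A pre-panel check is then one call: `unassigned_criteria(ASSIGNMENTS)` should return an empty list before the panel begins.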
Do not discuss scores with other interviewers until everyone has submitted their independent scores. This independence is critical -- once you hear another interviewer's score, your own score shifts toward theirs (anchoring effect). After independent scoring is complete, meet as a panel to discuss. Start with the criteria where scores diverge most, because that is where the most useful calibration will occur. The interviewer who scored high and the interviewer who scored low discuss what they observed and why they scored differently. Often, the divergence reveals that one interviewer heard something the other missed, or that one interviewer weighted a factor that the other considered irrelevant.
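The discussion-ordering step above (start where scores diverge most) can be sketched as a small function: given each interviewer's independently submitted scores, rank criteria by score spread, widest first. The interviewer and criterion names are hypothetical placeholders.

```python
def divergence_order(scores):
    """scores: {criterion: {interviewer: score 1-5}}.
    Returns criteria sorted by score spread (max - min), widest spread first,
    so the panel discusses the most divergent criteria before the rest."""
    spread = {c: max(s.values()) - min(s.values()) for c, s in scores.items()}
    return sorted(spread, key=spread.get, reverse=True)

# Example: scores submitted independently, before any panel discussion.
submitted = {
    "ownership":       {"alice": 4, "bob": 2},   # spread 2 -- discuss first
    "technical_depth": {"alice": 3, "bob": 3},   # spread 0
    "culture_add":     {"alice": 5, "bob": 4},   # spread 1
}
# divergence_order(submitted) -> ["ownership", "culture_add", "technical_depth"]
```

Keeping the submission and aggregation steps separate in this way is the point: nothing in `divergence_order` is computed until every interviewer's scores are already in, which is what preserves independence against anchoring.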
The independent scoring protocol addresses what social psychology researchers call 'groupthink' (Janis, 1972) and 'social conformity' (Asch, 1951) -- the well-documented tendency for individuals to adjust their judgments toward the group consensus, particularly when the first opinion expressed comes from a high-status individual (such as the hiring manager).

Research by Gigone and Hastie (1993) on 'the common knowledge effect' found that group discussions disproportionately weighted information already held by multiple members while underweighting unique information held by individual members. Panel discussions without independent scoring therefore tend to converge on the consensus view while losing the unique observations that made panel interviews valuable in the first place.

The 15-minute scoring window implements what cognitive psychologists call 'immediate retrospective reporting' (Ericsson and Simon, 1993) -- their research found that evaluative accuracy decreases by approximately 25% for every hour of delay between observation and rating, because the evaluator's memory becomes increasingly reconstructive (shaped by post-observation information and reflection) rather than veridical (based on the actual observation).

The divergent-score discussion protocol implements what Surowiecki (2004) calls the 'wisdom of crowds' conditions -- specifically, the requirement for diversity of opinion and independent judgment followed by aggregation. When these conditions are met, the group's collective evaluation is more accurate than any individual evaluator's assessment. When they are violated (as when interviewers discuss scores before independent submission), the group evaluation degrades toward the most influential voice rather than the most accurate assessment.