Excellent framing on construct validity! The LinkedIn sudoku comparision really drives it home. The point about instruction-following as a confounder is particulary sharp, feels like one of those blindspots that's hiding in plain sight across so many evals.
Excellent framing on construct validity! The LinkedIn sudoku comparision really drives it home. The point about instruction-following as a confounder is particulary sharp, feels like one of those blindspots that's hiding in plain sight across so many evals.