Using a model of analysts’ judgments to augment an item calibration process
Educational and Psychological Measurement, 75(5), 826–849.
When conducting item reviews, analysts evaluate an array of statistical and graphical information to assess the fit of a field test (FT) item to an item response theory model. The process can be tedious, particularly when the number of human reviews (HRs) to be completed is large. Furthermore, such a process leads to decisions that are susceptible to human error. A key finding from behavioral decision-making research is that a parametric model of human decision making often outperforms the decision makers themselves. We exploit this finding by seeking a model that mimics how analysts integrate FT item-level statistics and graphical performance plots, and that predicts the analyst's assignment of the item's status. The procedure suggests a set of rules that achieves a desired level of classification accuracy, separating situations in which the evidence supports firm decisions from those that would likely benefit from HRs. Implementation of the decision rules accounts for an estimated 65% reduction in calibrations requiring HRs.
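The triage idea the abstract describes — a parametric model of analysts' accept/reject judgments, with a confidence band that routes ambiguous items to human review — might be sketched roughly as follows. The item statistics, logistic weights, and thresholds here are purely illustrative assumptions, not the fitted model or rules from the paper:

```python
import math

# Hypothetical field-test item statistics (names and values are made up
# for illustration; the paper's actual inputs include graphical fit plots).
items = [
    {"id": "A", "infit": 1.02, "pbis": 0.45},
    {"id": "B", "infit": 1.35, "pbis": 0.12},
    {"id": "C", "infit": 1.10, "pbis": 0.30},
]

def p_accept(item, w_infit=-4.0, w_pbis=6.0, bias=3.0):
    """Logistic model of the analyst's accept judgment (illustrative weights)."""
    z = bias + w_infit * item["infit"] + w_pbis * item["pbis"]
    return 1.0 / (1.0 + math.exp(-z))

def triage(item, lo=0.2, hi=0.8):
    """Make a firm decision when the model is confident;
    otherwise route the item to human review."""
    p = p_accept(item)
    if p >= hi:
        return "accept"
    if p <= lo:
        return "reject"
    return "human review"

for item in items:
    print(item["id"], triage(item))
```

Only the middle band of model probabilities is sent to analysts, which is how a rule set like this can sharply reduce the number of calibrations requiring HRs while holding classification accuracy at a chosen level.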
This article was published outside of NWEA. The full text can be found at the link above.
Topics: Measurement & scaling