Comparing different response time threshold setting methods to detect low effort on a large-scale assessment
Large-scale Assessments in Education 9, 8 https://doi.org/10.1186/s40536-021-00100-w
Low examinee effort is a major threat to valid uses of many test scores. Fortunately, several methods have been developed to detect noneffortful item responses, most of which use response times. To accurately identify noneffortful responses, one must set response time thresholds that separate those responses from effortful ones. While other studies have compared the efficacy of different threshold-setting methods, they typically do so using simulated or small-scale data. Studies that do use large-scale data often analyze data not drawn from a computer-adaptive test (CAT), include only a handful of items, or do not comprehensively examine different threshold-setting methods. In this study, we use reading test scores from 728,923 3rd–8th-grade students in 2,056 schools across the United States who took a CAT consisting of nearly 12,000 items to compare threshold-setting methods. In so doing, we help provide guidance to developers and administrators of large-scale assessments on the tradeoffs involved in using a given method to identify noneffortful responses.
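To make the idea of a response time threshold concrete, here is a minimal sketch of one commonly discussed threshold-setting rule, a normative threshold that sets each item's cutoff at 10% of its mean response time, capped at 10 seconds. The abstract does not say which methods the study compared, so this is an illustration only, not the authors' procedure. The function names, the 10% and 10-second parameters, and the toy response times are all hypothetical choices for the example. A response faster than its item's threshold is flagged as noneffortful (a rapid guess), and the proportion of unflagged responses gives a simple per-examinee effort index.

```python
from statistics import mean

def normative_threshold(item_times, pct=0.10, cap=10.0):
    """Hypothetical rule: threshold = pct of the item's mean response time, capped at `cap` seconds."""
    return min(pct * mean(item_times), cap)

def flag_noneffortful(responses, thresholds):
    """responses: list of (item_id, seconds) pairs.
    Returns True for each response faster than its item's threshold (treated as noneffortful)."""
    return [seconds < thresholds[item] for item, seconds in responses]

def response_time_effort(flags):
    """Proportion of an examinee's responses classified as effortful."""
    return 1 - sum(flags) / len(flags)

# Toy data (made up for illustration): observed response times in seconds per item.
item_times = {"A": [30, 40, 50, 2], "B": [120, 150, 200, 1]}
thresholds = {item: normative_threshold(times) for item, times in item_times.items()}

# One examinee's responses: (item_id, response time in seconds).
responses = [("A", 2.0), ("B", 45.0), ("B", 1.0)]
flags = flag_noneffortful(responses, thresholds)
rte = response_time_effort(flags)

print(thresholds, flags, round(rte, 3))
```

Here item B's mean time is long enough that the 10-second cap binds. The cap and the percentage are exactly the kind of tuning choices whose tradeoffs (missing true rapid guesses versus flagging fast but effortful answers) threshold-setting comparisons examine.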
This article was published outside of NWEA. The full text can be found at the link above.
Topics: School & test engagement