Six insights regarding test-taking disengagement
Educational Research and Evaluation, 26(5-6), 328-338
By: Steven Wise
There has been increasing concern about the presence of disengaged test taking in international assessment programs and its implications for the validity of inferences made regarding a country’s level of educational attainment. This issue has received growing research interest over the past 20 years, with notable advances in both the measurement of disengagement and our understanding of its distortive impact on individual and aggregated scores. In this paper, the author discusses six important insights yielded by this research and their implications for assessment programs.
This study investigated test-taking engagement on a large-scale state summative assessment. Overall, results of this study indicate that disengagement has a material impact on individual state summative test scores, though its impact on score aggregations may be relatively minor.
The more frequent collection of response time data is leading to an increased need for an understanding of how such data can be included in measurement models. Models for response time have been advanced, but relatively limited large-scale empirical investigations have been conducted. We take advantage of a large data set from the adaptive NWEA MAP Growth Reading Assessment to shed light on emergent features of response time behavior.
This paper describes a method for identifying partial engagement and provides validation evidence to support its use and interpretation. When test events indicate the presence of partial engagement, effort-moderated scores should be interpreted cautiously.
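Effort-moderated scoring, as described in the engagement literature, excludes responses classified as rapid guesses before computing a score. A minimal sketch of that idea, assuming per-item rapid-guessing thresholds are already available (function names and threshold values here are illustrative, not the paper's implementation):

```python
def effort_moderated_score(responses, times, thresholds):
    """Proportion correct over effortful responses only.

    responses  -- list of 0/1 item scores
    times      -- list of response times in seconds
    thresholds -- list of per-item rapid-guessing thresholds in seconds
    """
    # Keep only responses whose time meets or exceeds the item threshold.
    solution = [r for r, t, th in zip(responses, times, thresholds) if t >= th]
    if not solution:
        return None  # no effortful behavior observed; score is uninterpretable
    return sum(solution) / len(solution)

# The 3.1 s response falls below its 10 s threshold and is excluded,
# so the score is computed over the remaining three responses: 2/3.
score = effort_moderated_score([1, 1, 0, 1],
                               [42.0, 3.1, 28.5, 35.0],
                               [10.0, 10.0, 10.0, 10.0])
```

Partial engagement, in this framing, would correspond to a test event where enough responses are excluded that the remaining effortful score warrants cautious interpretation.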
To avoid the subjectivity of having a single person evaluate a construct of interest, multiple raters are often used. While a range of models to address measurement issues that arise when using multiple raters has been presented, few are available to estimate growth in the presence of multiple raters. This study provides a model that removes all but the shared perceptions of raters at a given timepoint, then layers a latent growth curve model across timepoints. Results indicate that the model shows promise for researchers who want to estimate growth from longitudinal multi-rater data.
This study is the first to apply semi-supervised learning to cognitive diagnosis modeling (CDM). In addition, we used a validation test to select appropriate parameters for the ANNs, rather than typical statistical criteria such as AIC and BIC.
By: Kang Xue, Laine Bradshaw
Topics: Measurement & scaling
Comparing different response time threshold setting methods to detect low effort on a large-scale assessment
This study uses reading test scores from 728,923 students in grades 3–8 across 2,056 US schools to compare threshold-setting methods for detecting noneffortful item responses, providing guidance on the tradeoffs involved in using a given method to identify such responses.
Topics: School & test engagement
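One commonly studied threshold-setting approach in this literature is the normative threshold (e.g., NT10), which flags a response as noneffortful when its time falls below a fixed percentage of the item's typical response time, subject to a cap. A minimal sketch, with the 10% percentage and 10-second cap as illustrative assumptions rather than the study's actual settings:

```python
from statistics import median

def nt10_threshold(item_times, pct=0.10, cap=10.0):
    """Per-item rapid-guessing threshold: pct of the median response time,
    capped so very long items do not produce implausibly high thresholds."""
    return min(pct * median(item_times), cap)

def flag_noneffortful(item_times):
    """Flag each response to an item as noneffortful (True) or effortful."""
    th = nt10_threshold(item_times)
    return [t < th for t in item_times]

# Median response time is 41 s, so the threshold is 4.1 s:
# only the 2.0 s response is flagged as noneffortful.
flags = flag_noneffortful([45.0, 2.0, 38.0, 50.0, 41.0])
```

Other methods compared in this line of work differ mainly in how the threshold is chosen (e.g., a fixed constant for all items versus item-specific values like the normative approach above), which is where the tradeoffs in classification accuracy arise.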