Measuring the impact of test disengagement on estimates of educational effectiveness
Learn more about our examination of student disengagement and how it may bias estimates of effectiveness based on observed test results.
There is growing concern about disengaged test taking in international assessment programs and its implications for the validity of inferences about a country’s level of educational attainment. In this paper, the author discusses six important insights from 20 years of research on test-taking disengagement and their implications for assessment programs.
By: Steven Wise
This study investigated test-taking engagement on a large-scale state summative assessment. Overall, the results indicate that disengagement has a material impact on individual students’ state summative test scores, though its impact on aggregated scores may be relatively minor.
As response time data are collected more routinely, there is a growing need to understand how such data can be incorporated into measurement models. Models for response time have been proposed, but relatively few large-scale empirical investigations have been conducted. We take advantage of a large data set from the adaptive NWEA MAP Growth Reading Assessment to shed light on emergent features of response time behavior.
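Response time data are often used to flag rapid guessing, one common signal of disengagement. The sketch below is a minimal illustration under stated assumptions: a response counts as a rapid guess when its time falls below a per-item threshold, and both the 3-second default and the `thresholds` mapping are hypothetical choices, not values from this research. It computes the proportion of responses that show solution behavior, an index commonly called response time effort (RTE).

```python
from dataclasses import dataclass

# Hypothetical default threshold in seconds; operational programs typically
# derive item-specific thresholds rather than using one fixed value.
DEFAULT_THRESHOLD_SECONDS = 3.0


@dataclass
class Response:
    item_id: str
    seconds: float  # time the student spent on the item
    correct: bool


def is_rapid_guess(response, thresholds):
    """True if the response time is below the item's threshold (assumed rule)."""
    limit = thresholds.get(response.item_id, DEFAULT_THRESHOLD_SECONDS)
    return response.seconds < limit


def response_time_effort(responses, thresholds):
    """Proportion of responses in a test event NOT flagged as rapid guesses."""
    if not responses:
        raise ValueError("need at least one response")
    engaged = sum(not is_rapid_guess(r, thresholds) for r in responses)
    return engaged / len(responses)


if __name__ == "__main__":
    event = [
        Response("item1", 1.2, False),
        Response("item2", 14.0, True),
        Response("item3", 22.5, True),
        Response("item4", 2.1, True),
    ]
    # item2 gets a hypothetical stricter threshold of 5 seconds.
    print(response_time_effort(event, {"item2": 5.0}))  # 0.5
```

Items 1 and 4 fall under their thresholds, so two of four responses count as engaged. Note that a fast correct answer (item 4) is still flagged: the rule looks only at time, which is one reason threshold choice matters.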
This paper describes a method for identifying partial engagement and provides validation evidence to support its use and interpretation. When test events indicate the presence of partial engagement, effort-moderated scores should be interpreted cautiously.
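Effort-moderated scoring, in its simplest form, scores a test event using only the responses not flagged as rapid guesses. The sketch below assumes a single hypothetical 3-second threshold and uses proportion correct in place of the IRT-based estimation an operational program would use, so it illustrates the idea rather than reproducing the method in this paper.

```python
# Hypothetical rapid-guess threshold in seconds (an assumption for illustration).
THRESHOLD_SECONDS = 3.0


def standard_proportion_correct(responses):
    """Proportion correct over all responses; responses are (seconds, correct) pairs."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(correct for _, correct in responses) / len(responses)


def effort_moderated_proportion_correct(responses, threshold=THRESHOLD_SECONDS):
    """Proportion correct over responses at or above the threshold.

    Returns None when every response is flagged, because no engaged
    responses remain to score.
    """
    engaged = [correct for seconds, correct in responses if seconds >= threshold]
    if not engaged:
        return None
    return sum(engaged) / len(engaged)


if __name__ == "__main__":
    event = [(1.0, False), (12.0, True), (15.0, True), (2.0, False), (9.0, False)]
    print(standard_proportion_correct(event))          # 0.4
    print(effort_moderated_proportion_correct(event))  # 0.6666666666666666
```

In this made-up event, two of five answers are correct overall, but two of the three engaged answers are correct, so the rapid guesses pull the standard score down. When many responses are flagged, the effort-moderated score rests on only a few items, which is one intuition for why such scores call for caution.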
To avoid the subjectivity of having a single person evaluate a construct of interest, multiple raters are often used. Although a range of models have been proposed to address the measurement issues that arise with multiple raters, few can estimate growth in the presence of multiple raters. This study presents a model that removes all but the raters’ shared perceptions at each time point and then fits a latent growth curve model across time points. Results indicate that the model shows promise for researchers who want to estimate growth from longitudinal multi-rater data.
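One way to formalize "shared perceptions plus growth" is a second-order (curve-of-factors) latent growth model. The specification below is a generic sketch under stated assumptions, not necessarily the model in this study: $r$ indexes raters, $t = 1, \dots, T$ time points, and $i$ students; $\eta_{ti}$ is the perception shared by raters at time $t$, and rater-specific disagreement is absorbed into the residual $\varepsilon_{rti}$. For identification, one rater's loading is typically fixed to 1 and intercept to 0.

```latex
\begin{aligned}
y_{rti}    &= \nu_r + \lambda_r\,\eta_{ti} + \varepsilon_{rti}, \\
\eta_{ti}  &= \beta_{0i} + \beta_{1i}\,(t - 1) + \zeta_{ti}, \\
\beta_{0i} &= \gamma_0 + u_{0i}, \qquad \beta_{1i} = \gamma_1 + u_{1i}.
\end{aligned}
```

Coding time as $t - 1$ makes $\gamma_0$ the average shared rating at the first time point and $\gamma_1$ the average change per time point. Holding $\nu_r$ and $\lambda_r$ equal across time points (measurement invariance) is a common assumption so that change in $\eta_{ti}$ reads as growth rather than as a shift in how raters use the scale.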
To avoid the subjectivity of having a single person evaluate a construct of interest (e.g., a student’s self-efficacy in school), multiple raters are often used. This study provides a model for estimating growth in the presence of multiple raters.
This study is the first to apply semi-supervised learning to cognitive diagnosis models (CDMs). We also used a validation set, rather than typical statistical criteria such as AIC and BIC, to choose appropriate parameters for the artificial neural networks (ANNs).
By: Kang Xue, Laine Bradshaw
Topics: Measurement & scaling
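The contrast in the Xue and Bradshaw abstract, choosing network parameters by validation performance instead of AIC or BIC, can be sketched generically. The helper names below (`select_by_validation`, `aic`, `bic`) and the accuracy values are hypothetical illustrations, not the authors' code or results; the sketch assumes each candidate setting (for example, a hidden-layer size) can be scored on held-out data.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_by_validation(candidates, validation_score):
    """Return (best_setting, best_score) by the highest held-out score.

    candidates: iterable of hyperparameter settings (e.g., hidden-layer sizes).
    validation_score: callable mapping a setting to its score on validation data.
    Ties go to the earliest candidate.
    """
    scored = [(validation_score(c), c) for c in candidates]
    if not scored:
        raise ValueError("need at least one candidate")
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


if __name__ == "__main__":
    # Hypothetical validation accuracies for three hidden-layer sizes.
    val_accuracy = {4: 0.81, 8: 0.86, 16: 0.84}
    print(select_by_validation(val_accuracy, val_accuracy.get))  # (8, 0.86)
```

AIC and BIC trade in-sample log-likelihood against a parameter count, which is less straightforward to interpret for neural networks with many weights; scoring on held-out data measures out-of-sample performance directly, at the cost of setting aside part of the data.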