Evaluating content alignment in computerized adaptive testing
Educational Measurement: Issues and Practice, 34(4), 41–18.
The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process.
This article was published outside of NWEA. The full text can be found at the link above.
Topics: Measurement & scaling
There has been increasing concern about the presence of disengaged test taking in international assessment programs and its implications for the validity of inferences made regarding a country's level of educational attainment. In this paper, the author discusses six important insights yielded by 20 years of research on this topic, along with their implications for assessment programs.
By: Steven Wise
This study investigated test-taking engagement on a large-scale state summative assessment. Overall, results of this study indicate that disengagement has a material impact on individual state summative test scores, though its impact on score aggregations may be relatively minor.
This paper describes a method for identifying partial engagement and provides validation evidence to support its use and interpretation. When test events indicate the presence of partial engagement, effort-moderated scores should be interpreted cautiously.
To avoid the subjectivity of having a single person evaluate a construct of interest, multiple raters are often used. While a range of models has been presented to address measurement issues that arise when using multiple raters, few are available to estimate growth in the presence of multiple raters. This study provides a model that removes all but the shared perceptions of raters at a given timepoint and then adds a latent growth curve model across timepoints. Results indicate that the model shows promise for researchers who want to estimate growth based on longitudinal multi-rater data.
To avoid the subjectivity of having a single person evaluate a construct of interest (e.g., a student’s self-efficacy in school), multiple raters are often used. This study provides a model for estimating growth in the presence of multiple raters.
This study is the first to apply the ideas of semi-supervised learning to cognitive diagnosis modeling (CDM). In addition, we used a validation test to choose appropriate parameters for the ANNs instead of using typical statistical criteria, such as AIC and BIC.
By: Kang Xue, Laine Bradshaw
Topics: Measurement & scaling
Comparability of MAP Growth tests administered through different technology and psychometric infrastructure: A simulation study
This report presents the results of a mode comparability study conducted through simulations to evaluate how scores from MAP Growth administered on the constraint-based engine (CBE) compare to those administered on the current MAP Growth engine known as COLO.
Products: MAP Growth