Data Comparability, Norms, and Adjusting for New State Standards

Earlier this year, I wrote a blog post on the importance of data comparability in today’s assessment environment. With the passage of the Every Student Succeeds Act (ESSA), many states are likely to change their summative assessments. This shift follows several years of changing standards and tests. All this activity highlights the role of data comparability in making sense of student achievement over time.

The changing assessments make it difficult to answer these questions with any degree of confidence:

  • Are my students progressing toward their learning goals?
  • How are my students performing compared to how they were performing at this time last year?
  • How are my students performing compared to their peers?
  • Are my students showing optimal growth?
  • How does my students’ growth compare to the growth of their peers?

At the most basic level, answering these questions requires the ability to compare a student’s scores over time. That is, the scores must be comparable both vertically and longitudinally: across the scale and over time. Answering some of these questions also involves comparing a student’s performance to that of other students who share certain characteristics, such as grade level. To make those comparisons possible, assessment providers create norms that summarize the performance of a representative sample of students.
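To make the idea of a norm concrete, here is a minimal Python sketch of a percentile rank: the share of a norm group scoring below a given score, with ties counted as half. The norm sample is invented for illustration and the function name `percentile_rank` is hypothetical; this is not NWEA’s actual norming procedure, only the basic comparison a norm enables.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percent of the norm group scoring below `score`, counting ties as half."""
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)           # scores strictly below
    ties = bisect_right(ordered, score) - below   # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical grade-level norm sample (invented numbers, not real NWEA norms).
norm_group = [190, 195, 198, 200, 202, 205, 207, 210, 214, 220]

print(percentile_rank(205, norm_group))  # 55.0
```

Counting ties as half is one common convention; a real norm table would be built from a much larger, representative sample.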

Fortunately, MAP® and MAP for Primary Grades® (MPG®) provide the comparability across the scale and over time that allows for meaningful estimates of student progress. NWEA norms also allow meaningful comparisons across students through status and growth percentiles. Educators can use our norms to compare the performance of individual students or a class to that of the national sample. This comparative analysis provides one kind of data point that helps educators understand student performance in a larger context. A full discussion of our norms and use cases is here.
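Growth percentiles follow the same logic, except that the comparison is between a student’s observed gain and the gain typical of similar students. A minimal sketch, assuming the norm table supplies a mean expected gain and a standard deviation of gains, and that gains are roughly normally distributed: standardize the observed gain, then convert it to a percentile with the normal CDF. The function name and all numbers are hypothetical; real growth norms may condition on further factors, such as starting score and instructional time, which this sketch ignores.

```python
from statistics import NormalDist


def growth_percentile(start: float, end: float,
                      expected_gain: float, gain_sd: float) -> float:
    """Percentile of an observed gain, ASSUMING gains in the norm group are
    normally distributed with mean `expected_gain` and SD `gain_sd`."""
    if gain_sd <= 0:
        raise ValueError("gain_sd must be positive")
    observed_gain = end - start
    z = (observed_gain - expected_gain) / gain_sd  # standardized gain
    return 100.0 * NormalDist().cdf(z)


# Hypothetical: similar students typically gain 8 points (SD 4); this student gained 10.
print(round(growth_percentile(200, 210, expected_gain=8, gain_sd=4), 1))  # 69.1
```

A student who gains exactly the expected amount lands at the 50th percentile, regardless of where the student started.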

We also carefully construct MAP norms to be independent of any specific state test. Most tests need new norms whenever the test is redesigned or realigned, because their norms are tied to student responses on that particular test. MAP, however, is scored using Item Response Theory (IRT), and its items are calibrated to a stable scale, so MAP doesn’t require new norms when it is aligned to new standards. As a result, educators who use MAP always have an important context for interpreting and evaluating data.
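The sketch below illustrates what scoring with IRT on a calibrated scale means, using the Rasch model (the “R” in RIT). Each item has a fixed difficulty `b`; the probability of a correct answer depends only on the gap between ability `theta` and `b`, and the maximum-likelihood estimate of `theta` is the value at which the expected number correct equals the observed number correct. The difficulties and responses are made up, the helper names are hypothetical, and the `10 * theta + 200` conversion to a RIT-like number is an assumption about the scale’s linear transformation, not a quoted specification. Operational MAP scoring, with adaptive item selection, is more involved.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch model: probability that ability `theta` answers an item of difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses: list[int], difficulties: list[float],
                   max_iter: int = 100, tol: float = 1e-10) -> float:
    """Maximum-likelihood ability estimate by Newton-Raphson.

    Item difficulties are treated as fixed (already calibrated to the scale),
    so only theta is estimated."""
    n_correct = sum(responses)
    if n_correct in (0, len(responses)):
        raise ValueError("MLE is undefined for all-correct or all-incorrect patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = n_correct - sum(probs)                # observed minus expected correct
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


def to_rit(theta: float) -> float:
    """ASSUMED linear logit-to-RIT conversion (10 points per logit, centered at 200)."""
    return 10.0 * theta + 200.0


difficulties = [-1.0, 0.0, 1.0]  # hypothetical calibrated item difficulties (logits)
responses = [1, 1, 0]            # 1 = correct, 0 = incorrect
theta = estimate_theta(responses, difficulties)
print(round(theta, 3), round(to_rit(theta), 1))
```

Because the difficulties never change once calibrated, a new alignment only changes which items are eligible, not what a given `theta` means.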

MAP tests are based on pools of items that span RIT (for Rasch Unit) ranges and goal areas, and they are aligned to standards in the sense that they cover only content in the standards. The only effect a new set of standards has on the items is to redefine the scope and contents of the item pool. To the degree that new standards add or remove content relative to previous standards, the items in the pool “aligned” to the new standards will differ. Because every item is calibrated to the same RIT scale, however, any two MAP pools are psychometrically equivalent and yield the same results, just as two yardsticks give the same measurement whether one is plastic and the other wood. This equivalence means we don’t need to create new norms. Educators can compare student scores even when standards (and thus MAP tests) change, and when a student moves and takes MAP in a different state, growth trends persist.
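The yardstick analogy can be checked numerically. In this sketch, assuming Rasch-calibrated items, two hypothetical pools with quite different difficulty ranges are given to the same student ability. The expected percent correct differs between the pools, but inverting each pool’s test characteristic curve (expected score as a function of ability) recovers the same ability, because both pools’ difficulties sit on one scale. Using expected rather than simulated responses keeps the example deterministic; it deliberately ignores measurement error and adaptive testing, and all names and numbers are illustrative.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta: float, difficulties: list[float]) -> float:
    """Test characteristic curve: expected number correct at ability theta."""
    return sum(rasch_p(theta, b) for b in difficulties)


def theta_from_score(score: float, difficulties: list[float],
                     lo: float = -10.0, hi: float = 10.0) -> float:
    """Invert the (monotonically increasing) test characteristic curve by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_score(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


true_theta = 0.8
pool_a = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical pool under old standards
pool_b = [-0.5, 0.3, 0.9, 1.5]        # hypothetical pool realigned to new standards

for name, pool in (("A", pool_a), ("B", pool_b)):
    score = expected_score(true_theta, pool)
    percent_correct = score / len(pool)
    print(name, round(percent_correct, 3), round(theta_from_score(score, pool), 6))
```

Pool A yields roughly 63% expected correct and pool B roughly 55%, yet both recover an ability of 0.8: the raw scores depend on the pool, while the scale score does not.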

Because our RIT scale has been stable for decades, we can provide comparable growth and status data across 30 years and all 50 states. This level of comparability and stability lets educators use NWEA assessments as a bridge between prior and current standards. Even as the switch from old to new state assessments makes their combined data unusable for longitudinal analysis, MAP and MPG remain a consistent measuring device. States and the assessment consortia creating state tests are not attempting to make the old and new tests comparable, and doing so would be futile. Accordingly, educators and the public must rely on a third party to provide the unbroken data stream needed to determine whether, and how, the implementation of higher standards, revised curricula, and new assessments is changing student performance.

Overall, data comparability lets teachers, administrators, parents, and students make important connections, recognize growth patterns and trends, develop achievable growth projections, and compare groups of students. Keeping assessment data comparable vertically, horizontally, and longitudinally is one of the challenges that come with a data-rich culture and with shifting assessments and state standards.