Interim assessments offer many advantages, including the ability to gather and compare data collected over time, both within a single year and across multiple years. An interim assessment that provides accurate longitudinal data benefits students, teachers, and administrators in different ways, but chief among them is the ability to make meaningful comparisons.
Comparable data allow for a longitudinal perspective on student learning. This helps a teacher and student set reasonable growth targets and puts a student's current achievement status in the context of growth. Are students growing? Which areas are seeing the most growth, and in which has growth plateaued? Answers to these questions tell a teacher where to focus instructional energy and class time, and some of them can only be answered with data that reach back in time, that is, longitudinal data.
- Longitudinal data give school building administrators the ability to identify learning trends across groups of students, who can be flexibly grouped by subject, grade, or classroom teacher depending on what is being analyzed. These trends may indicate that certain curricula, programs, or pedagogical approaches are more successful than others.
- District-level administrators can see growth trends district-wide, giving them relevant data for decisions about allocating instructional resources, staffing, technology, and professional development. Longitudinal data also allow for informed growth projections and predictive functions, which help district-level administrators know whether their students are on track to meet progress goals and, if not, give them time to do something about it.
When data are compared between groups and over time, the stability of those data becomes a virtue. Let’s first examine the three kinds of data comparability: horizontal, vertical, and longitudinal.
Before data from different systems can be combined, compared, or aggregated, the data elements in all systems must be the same. They must:
- represent the same entity or attribute with the same definition
- be collected with a consistent method
When this doesn’t happen, comparability suffers. For example, under No Child Left Behind, each state had to define proficiency on its own state summative assessment scale. Educators have not been able to compare the performance of students across states because the scales, the cut points, and the assessments all differ from state to state. In this case the data element "proficient" is useless for comparison beyond state borders.
Data elements that claim to represent the same entity or attribute should:
- have the same definition
- be calculated in the same way in different parts of vertical data systems
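These requirements can be made concrete with a small sketch. The check below is a hypothetical illustration, not an actual data-system API: it treats a data element as a named attribute plus its definition and collection method, and only allows two elements to be combined when all three match.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    """A hypothetical description of one data element in a data system."""
    name: str
    definition: str
    collection_method: str

def comparable(a: DataElement, b: DataElement) -> bool:
    """Elements may be combined or aggregated only if they represent the
    same attribute, with the same definition, collected the same way."""
    return (a.name == b.name
            and a.definition == b.definition
            and a.collection_method == b.collection_method)

# Illustrative only: two states both report "proficient," but define it
# on different scales, so the elements are not comparable across systems.
state_a = DataElement("proficient", "score >= 240 on State A scale", "state summative test")
state_b = DataElement("proficient", "score >= 38 on State B scale", "state summative test")
print(comparable(state_a, state_b))  # False: same label, different definitions
```

A check like this is what the "proficient" example above fails: the label matches, but the definition behind it does not.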
MAP represents one vertical system. The RIT scale is uniform across its range (a RIT is a RIT is a RIT), and student scores are calculated in the same way across the scale. This comparability allows comparison of students who take different courses in different grades, or even in the same grade. Vertical comparability is an important condition for tracking student growth across years.
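A quick sketch shows why a uniform vertical scale matters for growth: because every point on the scale means the same thing, subtracting one term's score from the next yields meaningful growth. The RIT values below are hypothetical, for illustration only, not actual MAP data.

```python
# Hypothetical scores for one student across fall terms on a uniform
# vertical scale; differences between scores are interpretable as growth.
scores = {"grade 4 fall": 201, "grade 5 fall": 208, "grade 6 fall": 213}

terms = list(scores)
for prev, curr in zip(terms, terms[1:]):
    growth = scores[curr] - scores[prev]
    print(f"{prev} -> {curr}: {growth:+d} RIT")
```

If the scale were not uniform (if a point near 200 meant something different from a point near 230), these subtractions would not be comparable from year to year.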
The meanings of data elements can drift over time, or they can be intentionally redefined. If the data are to be compared or aggregated over time, though, it's important to know when changes or drift have occurred. In April 1995, the College Board re-centered the scores on the SAT because student performance had shifted. Establishing 500 as the mean score, the midpoint on the 200-800 scale, made it easier for schools to interpret the scores. When the re-centering occurred, the College Board notified school districts and colleges throughout the nation that students' SAT scores from after the re-centering could not be compared to the same numeric scores achieved before it.
After the initial change, the College Board created conversion formulas to help schools adjust the old scores. By using the formulas, schools can compare old scores with the re-centered scores.
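The idea behind such a conversion can be sketched in a few lines. The mapping below is hypothetical and illustrative only; the actual College Board conversions are published separately and differ by test section. The point is simply that old scores must pass through the conversion before being compared with re-centered ones.

```python
# Hypothetical conversion table from pre-1995 scores to the
# re-centered scale (illustrative values, not actual College Board data).
conversion = {420: 470, 430: 480, 440: 490, 450: 500}

def to_recentered(old_score: int) -> int:
    """Convert an old-scale score before comparing it with new scores."""
    return conversion[old_score]

old_score, new_score = 430, 480
# Comparing the raw numbers would be misleading; comparing after
# conversion is valid.
print(to_recentered(old_score) == new_score)
```

This is longitudinal comparability restored by stewardship: the scale changed, but a documented conversion keeps the old data usable.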
The NWEA RIT scale has proven stable over time, and we periodically conduct studies to check for scale drift.
Stay tuned for an upcoming post where we'll share how changes in state standards strain vertical and longitudinal comparability within an assessment system, and how NWEA Norms helps keep data in context. Data don't take care of themselves, but with careful stewardship they can be a lever for improved student learning outcomes.