Earlier we wrote a blog post that put the three assessment types into context. It outlined the differences between interim, summative and formative assessments and when each is best used. In short, formative assessment guides learning; summative assessment certifies learning; and interim assessment guides and tracks learning.
If you’ve followed NWEA at all, you’ll have noticed that we often discuss multiple measures. Multiple measures span all the instruments and sources used to gather information for the purpose of making an informed educational decision. The decision might be low stakes, such as whether more instructional time is needed on the current topic. Other decisions might be higher stakes, such as inclusion in special programs or courses. These sources cover a wide spectrum: they may be teacher-made and classroom-specific, or they may be psychometrically validated assessments with explicit implementation and data-collection protocols. Multiple measures should certainly be used when making high-stakes decisions that affect someone’s life, such as college admissions or teacher tenure.
Here’s a sampling of possible measures that can be used to make informed decisions:
- teacher observation
- principal observation
- formative assessment strategies
- research papers
- student projects
- course grades
- teacher-made tests
- end of unit tests
- interim assessments
- skills diagnostics
- universal screeners
- peer assessments
- progress monitoring tools
- state accountability tests
- aptitude tests
- behavioral measures (such as attendance)
- grade point average
- class rank
No single system uses all of the available measures at all times. Doing so would be overwhelming to implement, and it wouldn’t effectively answer different educational questions such as “Are all second graders on track to be grade-level readers by the end of third grade?” and “Which second graders understand phonemes?” By using the list above to create customized sets of measures (inputs), however, teachers and administrators can get the answers they seek.
While there are many good reasons to use multiple measures, there’s also a fundamental reason to do so: all measures have some degree of error or bias. It might be acceptable to measure something like height or weight using one instrument, say a yardstick or bathroom scale, because these measurements rarely lead to any real consequences. Generally, if the measurement isn’t exact it will be close enough. It provides a reasonable estimate of the individual’s height or weight. When making inferences about a construct as complex as the acquisition of knowledge, however, there’s no one instrument that will give a “close enough” picture.
When using multiple measures, it’s important to keep the principle of triangulation in mind. Triangulation is simply the process of using at least three data points when making educational decisions. Any single interim assessment score is subject to environmental or motivational influences that can affect its accuracy, so it should be checked against other evidence before anyone acts on it.
School personnel have a wealth of data available about their students. A teacher might look at a score from an interim assessment, see something puzzling, and decide to triangulate it with data from the attendance system and from a diagnostic test. This will help confirm an inference about the student—“Ah, a string of absences when we covered this likely caused the problem, as the diagnostic data doesn’t identify any issues.” The teacher might then decide that re-teaching would be a viable strategy to close the gap.
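For readers who work with student data in code, the teacher's reasoning above can be sketched roughly as follows. This is only an illustration, not an NWEA tool or a recommended decision rule: the `StudentEvidence` fields, the `triangulate` function, and the three-absence threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StudentEvidence:
    """Three data points about one student (hypothetical field names)."""
    interim_below_expectation: bool   # did the interim score look puzzling?
    absences_during_unit: int         # from the attendance system
    diagnostic_flags_skill_gap: bool  # from a diagnostic test


def triangulate(evidence: StudentEvidence, absence_threshold: int = 3) -> str:
    """Return a tentative reading of the evidence, not a final decision.

    The absence threshold is an assumed value for illustration only.
    """
    if not evidence.interim_below_expectation:
        return "no concern flagged by the interim score"
    if evidence.diagnostic_flags_skill_gap:
        return "possible skill gap: review the diagnostic detail"
    if evidence.absences_during_unit >= absence_threshold:
        return "likely missed instruction: consider re-teaching"
    return "inconclusive: gather another measure"


# The scenario from the paragraph above: a puzzling interim score,
# a string of absences, and a diagnostic that shows no issues.
student = StudentEvidence(
    interim_below_expectation=True,
    absences_during_unit=4,
    diagnostic_flags_skill_gap=False,
)
print(triangulate(student))
```

The point of the sketch is that no single input decides the outcome. The interim score only raises the question, and the other two measures shape the answer. In practice a teacher's judgment would weigh far more context than three fields can capture.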
Assessment scores, along with the reports and resources that accompany them, can be extremely useful tools for informing instruction and making administrative decisions, but they should not be used to the exclusion of other data sources. Triangulation helps ensure that the most informed and appropriate decisions are made on behalf of each student.