Assessment designers strive to create assessments that show a high degree of fidelity to the following five traits:
1. Content Validity
2. Reliability
3. Fairness
4. Student Engagement and Motivation
5. Consequential Relevance
The first post in our series focused on the importance of content validity, or ensuring that an assessment measures what it is intended to measure for its intended purpose. In this post, we’ll discuss a second trait of high-quality assessments: reliability.
Reliability refers to the consistency of the assessment results. It is the degree to which student results are the same when they take the same test on different occasions, when different scorers score the same task, and when different but equivalent tests are taken at the same time or at different times. Reliability is about making sure that different test forms in a single administration are equivalent; that retests of a given test are equivalent to the original test; and that test difficulty remains constant year to year, administration to administration.
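For readers who like to see the numbers, here is a minimal sketch of two of the checks described above: how closely scores from two sittings of the same test track each other, and how often two scorers agree on the same task. The function names and all of the scores are hypothetical, made up for illustration; operational testing programs use more sophisticated statistics than these.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Values near 1.0 mean students who scored high the first time
    also scored high the second time (consistent results).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


def exact_agreement(scorer_a, scorer_b):
    """Share of tasks on which two scorers gave exactly the same score."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("need two equal-length, non-empty lists of scores")
    matches = sum(1 for a, b in zip(scorer_a, scorer_b) if a == b)
    return matches / len(scorer_a)


# Hypothetical data: the same five students on two occasions.
first_sitting = [72, 85, 90, 64, 78]
second_sitting = [70, 88, 91, 60, 80]
test_retest = pearson_r(first_sitting, second_sitting)

# Hypothetical data: two scorers rating the same six essays on a 1-4 rubric.
rubric_a = [3, 4, 2, 4, 1, 3]
rubric_b = [3, 4, 2, 3, 1, 3]
agreement = exact_agreement(rubric_a, rubric_b)
```

A high `test_retest` value and a high `agreement` value both point toward consistent results, though neither number alone establishes that an assessment is reliable.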
Reliability of measurement is important. Case in point: yesterday, my 9-month-old son had a fever. I needed to be able to trust the measurement given by my thermometer in order to track whether his fever was getting worse or better. Similarly, when we use assessment data to help us make strategic instructional decisions and track progress over time, we need to have a high degree of confidence in the consistency, or reliability, of that measurement.
Whether it's a high-stakes assessment measuring end-of-course achievement or an assessment that measures growth, reliability is critical for any assessment that will be used to make decisions about the educational paths and opportunities of students. Reliability is a trait achieved through statistical analysis in a process called equating. Equating is one of the many behind-the-scenes functions performed by psychometricians, folks trained in the statistical measurement of knowledge.
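To give a feel for what equating involves, here is a minimal sketch of linear equating, one of the simplest textbook approaches. It assumes the two forms were taken by comparable groups of students, and it places a score from a new form onto the scale of a reference form by matching the means and standard deviations of the two score distributions. The function name and the data are hypothetical, and this post does not claim that any particular testing program equates its forms this way.

```python
from statistics import mean, pstdev


def linear_equate(score, new_form_scores, reference_form_scores):
    """Place a new-form score on the reference form's scale.

    Assumes both forms were taken by comparable groups of students.
    Uses: equated = mean_ref + (sd_ref / sd_new) * (score - mean_new)
    """
    sd_new = pstdev(new_form_scores)
    sd_ref = pstdev(reference_form_scores)
    if sd_new == 0:
        raise ValueError("new-form scores must vary to estimate a slope")
    slope = sd_ref / sd_new
    return mean(reference_form_scores) + slope * (score - mean(new_form_scores))


# Hypothetical data: the new form turned out to be slightly harder,
# so students averaged two points lower on it than on the reference form.
new_form = [10, 12, 14, 16, 18]
reference_form = [12, 14, 16, 18, 20]

# An average score on the harder form maps to the reference form's average.
equated = linear_equate(14, new_form, reference_form)
```

In this made-up example, a student who earned 14 on the harder form would be credited with the equivalent of 16 on the reference form, which is the sense in which equating keeps test difficulty constant from one administration to the next.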
In general, informal, classroom-based, teacher-created assessments do not engage with the concept of reliability directly, as these types of assessments do not require advanced statistical analysis; however, they do engage with it informally. When a student has to take a make-up test, for example, the test should be approximately as difficult as the original test. There are many such informal assessment examples where reliability is a desired trait. In fact, it is hard to conceive of a situation where reliability would not be a desired trait. The main difference is how it is tracked. For informal assessments, professional judgment is often called upon; for large-scale assessments, reliability is tracked and demonstrated statistically.
In our third post on characteristics of quality educational assessments, we will explore the need for fairness. In the meantime, please feel free to share your thoughts on what qualities a good educational assessment should have by dropping a comment below.