Making Sense of Standard Error of Measurement

If you want to track student progress over time, it’s critical to use an assessment that provides accurate estimates of student achievement, that is, an assessment with a high level of precision. When we talk about measures of precision, we are referring to something known as the Standard Error of Measurement (SEM).

Before we define SEM, it’s important to remember that all test scores are estimates of a student’s true score. That is, irrespective of the test being used, all observed scores include some measurement error, so we can never really know a student’s actual achievement level (his or her true score). But we can estimate the range in which we think a student’s true score likely falls; in general the smaller the range, the greater the precision of the assessment.

SEM, put in simple terms, is a measure of precision of the assessment—the smaller the SEM, the more precise the measurement capacity of the instrument. Consequently, smaller standard errors translate to more sensitive measurements of student progress.

On MAP assessments, student RIT scores are always reported with an associated SEM, with the SEM often presented as a range of scores around a student’s observed RIT score. On some reports, it looks something like this:

Student Score Range: 185-188-191

So what information does this range of scores provide? First, the middle number tells us that a RIT score of 188 is the best estimate of this student’s current achievement level. It also tells us that the SEM associated with this student’s score is approximately 3 RIT—this is why the range around the student’s RIT score extends from 185 (188 – 3) to 191 (188 + 3). A SEM of 3 RIT points is consistent with typical SEMs on the MAP tests (which tend to be approximately 3 RIT for all students).

The observed score and its associated SEM can be used to construct a “confidence interval” to any desired degree of certainty. For example, a range of ± 1 SEM around the observed score (which, in the case above, was a range from 185 to 191) is the range within which there is a 68% chance that a student’s true score lies, with 188 representing the most likely estimate of this student’s score. Intuitively, if we specified a larger range around the observed score—for example, ± 2 SEM, or approximately ± 6 RIT—we would be much more confident that the range encompassed the student’s true score, as this range corresponds to a 95% confidence interval.
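The arithmetic behind these ranges can be sketched in a few lines of Python. This is only an illustration, not part of any MAP report or NWEA tool: the function name `sem_interval` is made up for this example, and the coverage figure assumes measurement error is normally distributed around the observed score.

```python
from statistics import NormalDist


def sem_interval(observed, sem, n_sems=1):
    """Return (low, high, coverage) for a range of +/- n_sems SEMs.

    Hypothetical helper for illustration. Coverage is the chance that the
    range contains the true score, ASSUMING normally distributed error.
    """
    low = observed - n_sems * sem
    high = observed + n_sems * sem
    # Share of a standard normal distribution within +/- n_sems deviations.
    std_normal = NormalDist()
    coverage = std_normal.cdf(n_sems) - std_normal.cdf(-n_sems)
    return low, high, coverage


# The example from the article: observed RIT of 188 with an SEM of 3.
for n in (1, 2):
    low, high, coverage = sem_interval(188, 3, n)
    print(f"+/- {n} SEM: {low} to {high} (about {coverage:.0%} coverage)")
```

With one SEM the range is 185 to 191, and with two SEMs it widens to 182 to 194. The computed coverage for two SEMs comes out slightly above 95%, which is why ± 2 SEM is usually described as an approximate 95% interval.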

So, to this point we’ve learned that smaller SEMs mean greater precision in the estimation of student achievement, and, conversely, that the larger the SEM, the harder it is to detect real changes in student achievement.

Why is this fact important to educators?

If we want to measure the improvement of students over time, it’s important that the assessment used be designed with this intent in mind. And to do this, the assessment must measure all kids with similar precision, whether they are on, above, or below grade level. Recall, a larger SEM means less precision and less capacity to accurately measure change over time, so if SEMs are larger for high- and low-performing students, this means those scores are going to be far less informative, especially when compared to those students who are on grade level. Educators should consider the magnitude of SEMs for students across the achievement distribution to ensure that the information they are using to make educational decisions is highly accurate for all students, regardless of their achievement level.
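To see why a larger SEM makes change harder to measure, it can help to look at the uncertainty of a difference between two scores. The sketch below is a rough illustration under stated assumptions, not a procedure taken from this article or from any testing vendor: it assumes the errors on the two test occasions are independent, so their SEMs combine as the square root of the sum of squares, and it uses a cutoff of 2 standard errors purely as a rule of thumb. The function name `describe_change` and the example scores are hypothetical.

```python
import math


def describe_change(score_before, sem_before, score_after, sem_after, cutoff=2.0):
    """Return (change, se_of_change, exceeds_cutoff) for two test scores.

    Hypothetical helper. ASSUMES independent measurement errors on the two
    occasions; the cutoff of 2 standard errors is an illustrative convention.
    """
    change = score_after - score_before
    # Independent errors add in quadrature: sqrt(sem_before^2 + sem_after^2).
    se_of_change = math.hypot(sem_before, sem_after)
    exceeds_cutoff = abs(change) > cutoff * se_of_change
    return change, se_of_change, exceeds_cutoff


# The same 10-point gain measured with a small SEM and with a large SEM.
for sem in (3, 12):
    change, se, clear = describe_change(188, sem, 198, sem)
    print(f"SEM {sem}: change {change}, standard error {se:.1f}, exceeds cutoff: {clear}")
```

With an SEM of 3 on both occasions, a 10-point gain stands out from the noise. With an SEM of 12, the same gain is well inside the range that measurement error alone could produce, so it says little about whether the student actually improved.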

[Figure: Grade 5 Reading SEM]

An example of how SEMs increase in magnitude for students above or below grade level is shown in the figure, with the size of the SEMs on an older version of the Florida 5th grade reading test plotted on the vertical axis relative to student scale scores on the horizontal axis. What is apparent from this figure is that test scores for low- and high-achieving students show a tremendous amount of imprecision. In this example, the SEMs for students on or near grade level (scale scores of approximately 300) are between 10 and 15 points, but they increase significantly the further students are from grade level. This pattern is fairly common on fixed-form assessments, with the end result being that it is very difficult to measure changes in performance for students at the low and high ends of the achievement distribution. Put simply, this high amount of imprecision will limit the ability of educators to say with any certainty what the achievement level for these students actually is and how their performance has changed over time.

Of course, the standard error of measurement isn’t the only factor that impacts the accuracy of the test. Accuracy is also impacted by the quality of testing conditions and the energy and motivation that students bring to a test. In fact, an unexpectedly low test score is more likely to be caused by poor conditions or low student motivation than to be explained by a problem with the testing instrument. To ensure an accurate estimate of student achievement, it’s important to use a sound assessment, administer assessments under conditions conducive to high test performance, and have students ready and motivated to perform.

For access to this article and other articles that describe additional vital assessment components, download our free eBook, Assessments with Integrity: How Assessment Can Inform Powerful Instruction.