Making sense of standard error of measurement

If you want to track student progress over time, it’s critical to use an assessment that provides you with accurate estimates of student achievement—that is, an assessment with a high level of precision. When we refer to measures of precision, we are talking about something known as the standard error of measurement (SEM).

Before we define SEM, it’s important to remember that all assessment scores are estimates. That is, irrespective of the test being used, all observed scores include some measurement error, so we can never really know a student’s actual achievement level (their true score). But we can estimate the range in which we think a student’s true score likely falls; in general, the smaller the range, the greater the precision of the assessment.

SEM, put in simple terms, is a measure of the precision of an assessment. The smaller the SEM, the more precise the measurement capacity of the instrument. Consequently, smaller standard errors translate to more sensitive measurements of student progress.

Standard error of measurement and MAP Growth

On MAP® Growth™, student RIT scores are always reported with an associated SEM, with the SEM often presented as a range of scores around a student’s observed RIT score. On some reports, it looks something like this: Student Score Range: 185-188-191.

So what information does this range of scores provide? The middle number tells us that a RIT score of 188 is the best estimate of this student’s current achievement level. It also tells us that the SEM associated with this student’s score is approximately three RIT; this is why the range around the student’s RIT score extends from 185 (188 – 3) to 191 (188 + 3). A SEM of three RIT points is consistent with typical SEMs on MAP Growth, which tend to be approximately three RIT points for all students.

The observed score and its associated SEM can be used to construct a “confidence interval” to any desired degree of certainty. For example, a range of ± 1 SEM around the observed score (which, in the case above, was a range from 185 to 191) is the range within which there is a 68% chance that a student’s true score lies, with 188 representing the most likely estimate of this student’s score. Intuitively, if we specify a larger range around the observed score—for example, ± 2 SEM, or approximately ± 6 RIT—we would be much more confident that the range encompassed the student’s true score, as this range corresponds to a 95% confidence interval.
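The arithmetic behind these confidence intervals is simple enough to sketch in a few lines of code. The function name below is illustrative, not part of any MAP Growth tooling; it just applies the ± k × SEM rule to the example values from the text (observed RIT of 188, SEM of 3).

```python
def score_range(observed, sem, num_sems=1):
    """Return the (low, high) range of ± num_sems * SEM around an observed score.

    num_sems=1 corresponds roughly to a 68% confidence interval,
    num_sems=2 to roughly 95%, assuming normally distributed error.
    """
    margin = num_sems * sem
    return (observed - margin, observed + margin)

# ± 1 SEM around a RIT score of 188 with SEM of 3: the 185–191 range from the text
print(score_range(188, 3, num_sems=1))  # (185, 191)

# ± 2 SEM widens the range but raises confidence to about 95%
print(score_range(188, 3, num_sems=2))  # (182, 194)
```

The trade-off is visible in the output: doubling the number of SEMs doubles the width of the interval, which is why a more precise test (smaller SEM) lets you make equally confident statements about a much narrower range of scores.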

So, to this point, we’ve learned that smaller SEMs are related to greater precision in the estimation of student achievement, and, conversely, that the larger the SEM, the less sensitive our ability to detect changes in student achievement.

Why is this fact important to educators?

If we want to measure the improvement of students over time, it’s important that the assessment be designed with this intent in mind. And to do this, the assessment must measure all kids with similar precision, whether they are on, above, or below grade level. Recall that a larger SEM means less precision and less capacity to accurately measure change over time, so if SEMs are larger for high- and low-performing students, those scores are going to be far less informative than the scores of students who are on grade level. Educators should consider the magnitude of SEMs for students across the achievement distribution to ensure that the information they are using to make educational decisions is highly accurate for all students, regardless of their achievement level.

Grade 5 Reading SEM

An example of how SEMs increase in magnitude for students above or below grade level is shown in the figure to the right, with the size of the SEMs on an older version of the Florida 5th grade reading test plotted on the vertical axis relative to student scale scores on the horizontal axis. What is apparent from this figure is that test scores for low- and high-achieving students show a tremendous amount of imprecision.

In this example, the SEMs for students on or near grade level (scale scores of approximately 300) are 10–15 points, but they increase substantially the further students score from grade level. This pattern is fairly common on fixed-form assessments, with the end result being that it is very difficult to measure changes in performance for students at the low and high ends of the achievement distribution. Put simply, this degree of imprecision limits the ability of educators to say with any certainty what the achievement level of these students actually is and how their performance has changed over time.

Of course, SEM isn’t the only factor that impacts the accuracy of a test. Accuracy is also impacted by the quality of testing conditions and the energy and motivation that students bring to a test. In fact, an unexpectedly low test score is more likely to be caused by poor conditions or low student motivation than a problem with the testing instrument. To ensure an accurate estimate of student achievement, it’s important to use a sound assessment, administer assessments under conditions conducive to high test performance, and have students ready and motivated to perform.
