As a researcher at NWEA, I often answer questions about how to measure and interpret growth using the Measures of Academic Progress® (MAP®) assessment. Measuring student growth with MAP is a lot like measuring growth in height with a tape measure. A student gets measured once, then after an appropriate amount of time, gets measured again. If the student’s measured height (or MAP score) has changed over that time period, then that change is interpreted as growth.
What adds a layer of complexity to measuring growth on MAP, however, is that MAP scores are reported along with standard errors of measurement (SEMs), which convey the margins of error for the measured scores. For example, if I produced a RIT score of 214 (SEM = 3.2) in the fall, my “true score” might have been a bit lower than 214 or a bit higher. The odds are that it fell somewhere in the range of 214±3.2, but I can’t know exactly where. Imagine, further, that I produced a winter RIT score of 212 with an SEM of 3.1. Again, the “true score” can’t be known exactly, but it probably falls within the range of 212±3.1.
So, then, did I grow? The easy answer is that I lost 2 RIT points, because the difference between my measured scores in winter and fall was -2 points. But since there are margins of error on the observed scores, there must be a margin of error for my observed loss, too.
Just as NWEA reports show the SEM associated with each measured score, they also provide the standard error associated with measured growth. The growth standard error depends on the two individual score SEMs: the smaller those are, the smaller the growth standard error. In the example above, my growth standard error would be computed as √(3.2² + 3.1²) = √19.85, or about 4.46. This means that my “true growth” most likely falls within -2 ± 4.46, that is, between -6.46 and 2.46 RIT points.
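The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not NWEA's reporting code; the function names are hypothetical. The root-sum-of-squares formula treats the measurement errors on the two test events as independent.

```python
import math

def growth_standard_error(sem_first: float, sem_second: float) -> float:
    """Combine two score SEMs into a growth standard error (root sum of squares)."""
    return math.sqrt(sem_first ** 2 + sem_second ** 2)

def one_se_band(observed_change: float, growth_se: float) -> tuple[float, float]:
    """Return the interval observed_change +/- one growth standard error."""
    return observed_change - growth_se, observed_change + growth_se

fall_rit, fall_sem = 214, 3.2
winter_rit, winter_sem = 212, 3.1

change = winter_rit - fall_rit  # observed change: -2 RIT points
se = growth_standard_error(fall_sem, winter_sem)
low, high = one_se_band(change, se)
print(f"change = {change}, growth SE = {se:.2f}")
print(f"1-SE band: {low:.2f} to {high:.2f}")
```

Running it reproduces the figures in the text: a growth standard error of about 4.46 and a band from -6.46 to 2.46.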
This can be represented visually with a bell curve, as in the figure below, in which the red shaded area represents the range of ±1 standard error around the observed change (the vertical line at -2):
The bell curve displays the possible range of outcomes for my “true growth”. The red shaded area constitutes about 68% of the total area under the bell curve, which is another way of saying that there is about a 68% probability that my “true growth” falls within that range. But what is the probability that my “true growth” was actually positive? Visually, that might look like this:
This red shaded area (about 33% of the total area under the curve) represents the probability that my “true growth” was greater than zero, or positive. In other words, there is about a 33% chance that my “true score” actually increased, despite the fact that my observed scores dropped by two points. There is also about a 67% chance that my “true growth” was negative.
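Both shaded areas can be computed directly from a normal distribution. The sketch below assumes, as the bell curves do, that “true growth” is normally distributed around the observed change with a standard deviation equal to the growth standard error; it uses Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

observed_change = -2.0
growth_se = 4.46

# Assumption: treat the unknown "true growth" as normal, centered on the observed change.
true_growth = NormalDist(mu=observed_change, sigma=growth_se)

# Area within +/- 1 growth standard error of the observed change.
within_one_se = (true_growth.cdf(observed_change + growth_se)
                 - true_growth.cdf(observed_change - growth_se))
# Area above zero (true growth positive) and below zero (true growth negative).
p_positive = 1 - true_growth.cdf(0)
p_negative = true_growth.cdf(0)

print(f"P(within 1 SE)     = {within_one_se:.2f}")
print(f"P(true growth > 0) = {p_positive:.2f}")
print(f"P(true growth < 0) = {p_negative:.2f}")
```

The three printed values round to 0.68, 0.33, and 0.67, matching the percentages quoted above.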
This information is useful in several ways. First, it highlights the fact that MAP growth contains an element of measurement error, which can be expressed as probabilities. In the example above, we saw that my “true” growth had about a 33% chance of being positive (that is, that I gained ground) and about a 67% chance of being negative (that is, that my true score dropped). Statements like these are possible for any assessment, of course, but with MAP the magnitude of error for most students is relatively small compared with fixed-form tests or shorter adaptive tests. And the smaller the error, the more precisely we can measure small amounts of real student growth.
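To see why smaller errors help, the sketch below holds the observed change fixed at -2 points and varies the SEM on both test events. The SEM values are hypothetical, chosen only for illustration, and are not published NWEA figures. The same normal-distribution assumption as in the bell curves applies.

```python
import math
from statistics import NormalDist

def prob_true_decline(observed_change: float, sem_first: float, sem_second: float) -> float:
    """Probability that true growth is below zero, given the observed change and two score SEMs."""
    growth_se = math.sqrt(sem_first ** 2 + sem_second ** 2)
    return NormalDist(mu=observed_change, sigma=growth_se).cdf(0)

# Same observed 2-point drop; hypothetical SEMs, from large to small.
for sem in (6.0, 4.5, 3.2, 2.0):
    p = prob_true_decline(-2.0, sem, sem)
    print(f"SEM {sem:.1f} on both tests -> P(true decline) = {p:.2f}")
```

As the SEM shrinks, the probability moves further from a coin flip, so the same small observed drop says more about real change.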
A second benefit is that it helps us correctly understand and interpret negative growth. In my example above, the observed change (-2 points) was small relative to the growth standard error (about 4.46 points). As a result, there was considerable uncertainty about the observed loss: about a 33% chance that it was not real and that my true score actually rose. In general, when observed changes are very large relative to their standard errors, we can be much more confident that they are real and not simply artifacts of measurement error. When changes are small relative to their standard errors, we are much less confident that they are real.
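The relationship between the size of a change and our confidence in it can be sketched by keeping the growth standard error from the example (4.46) and trying progressively larger observed drops. The drop values are hypothetical, and the normal-distribution assumption is the same as before.

```python
from statistics import NormalDist

GROWTH_SE = 4.46  # growth standard error from the fall/winter example

def prob_real_decline(observed_change: float, growth_se: float = GROWTH_SE) -> float:
    """Probability that true growth is negative, given an observed change and its standard error."""
    return NormalDist(mu=observed_change, sigma=growth_se).cdf(0)

# Hypothetical observed drops, expressed also as a ratio to the growth standard error.
for change in (-2, -5, -9, -14):
    ratio = change / GROWTH_SE
    print(f"change {change:>4}: change/SE = {ratio:5.2f}, "
          f"P(real decline) = {prob_real_decline(change):.3f}")
```

Once a drop is two or three times the growth standard error, the probability that it reflects a real decline approaches certainty.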
All of the examples so far rely on a basic assumption: that the testing conditions and environment remain consistent over time. If we wanted to measure how much a student’s height increased during a school year, we wouldn’t measure the student barefoot in the fall and then again in 3-inch platform shoes in the spring. If we did, we could not be confident that the observed change in height was due solely to physical growth. The same principle applies when measuring growth in achievement.
This is a particularly important factor when interpreting unusually large increases or decreases in a student’s MAP scores over time. In many cases, when a student shows an unexpected drop between two test events, the change can be attributed to factors other than learning. Did the student spend too little time on the test? Was the student actively engaged, or responding randomly to the items? The MAP system is designed to invalidate any test with a duration of 6 minutes or less, but durations of 10 to 15 minutes may be suspect as well. To finish a 45- to 50-item MAP test in that time, a student would almost certainly need to rush. This could lead to a higher SEM and a lower RIT score.
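A quick pace calculation shows why short durations are a warning sign. The sketch below is a hypothetical local screening helper, not how the MAP system itself invalidates tests. It applies the 6-minute invalidation rule mentioned above and uses 15 minutes as a review threshold, an assumption drawn from the durations this post calls suspect.

```python
def seconds_per_item(duration_minutes: float, items: int) -> float:
    """Average time spent per item on a test event."""
    return duration_minutes * 60 / items

def duration_flag(duration_minutes: float) -> str:
    """Hypothetical classification of a test duration using the thresholds in this post."""
    if duration_minutes <= 6:
        return "invalidated"  # MAP invalidates tests of 6 minutes or less
    if duration_minutes < 15:
        return "review"       # assumption: short enough to question engagement
    return "ok"

for minutes in (5, 12, 40):
    pace = seconds_per_item(minutes, items=50)
    print(f"{minutes} min, 50 items -> {pace:.1f} s/item, flag: {duration_flag(minutes)}")
```

A 50-item test finished in 12 minutes works out to under 15 seconds per item, which leaves little time to read and think through each question.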
So what can we do?
- Make sure your test proctors are logging start and end times, when possible.
- Proctors should circulate around the room, making notes on student progress. For example, if a student has completed 20 questions after only 5 minutes, that student may not be fully engaged. Remember that proctors can pause and restart a student’s test if they feel the student is not engaged.
- Retesting is also an option. If a student’s score has dropped significantly (>10 RIT points, for instance), particularly if the test duration also dropped considerably, a retest might be warranted.
- Make use of the NWEA reports that show test durations. The Comprehensive Data File (CDF) shows the test duration for every MAP test. Any time you see a test duration under 15 minutes, you should ask whether the student was fully engaged.
- Schedule your MAP testing window thoughtfully. If MAP testing takes place too close to state testing, or too close to winter break, spring break, or another holiday, student engagement may suffer.
- Be cautious about promising “fun or free time after everyone has finished the test.” MAP is an untimed test, but students may rush through it if they know a treat is waiting for them once everyone is finished.
- Maintain consistency across testing seasons. If the students receive some special celebratory award or activity after testing in one season, the same policies should be applied in all seasons.
- Call your NWEA Partner Representative for more information and suggestions. We are here to partner with you and to make your life a little easier when we can. Help is always only a phone call or email away.
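The retest and duration suggestions above can be combined into a simple screen over pairs of test events. This is a minimal sketch: the `ScoreEvent` record shape is hypothetical and does not reflect actual CDF column names, and reading “dropped considerably” as a drop of more than half is an assumption.

```python
from dataclasses import dataclass

@dataclass
class ScoreEvent:
    # Hypothetical record shape; real NWEA exports use their own column names.
    student_id: str
    rit: int
    duration_minutes: float

RIT_DROP_THRESHOLD = 10       # "dropped significantly (>10 RIT points, for instance)"
SHORT_DURATION_MINUTES = 15   # durations under 15 minutes deserve a second look
DURATION_DROP_FRACTION = 0.5  # assumption: "dropped considerably" read as a drop of more than half

def retest_reasons(prev: ScoreEvent, curr: ScoreEvent) -> list[str]:
    """List reasons a later test event might warrant a closer look or a retest."""
    reasons = []
    rit_drop = prev.rit - curr.rit
    if rit_drop > RIT_DROP_THRESHOLD:
        reasons.append(f"RIT dropped {rit_drop} points")
    if curr.duration_minutes < SHORT_DURATION_MINUTES:
        reasons.append(f"short duration ({curr.duration_minutes:g} min)")
    if curr.duration_minutes < prev.duration_minutes * DURATION_DROP_FRACTION:
        reasons.append("duration fell by more than half")
    return reasons

fall = ScoreEvent("S001", rit=221, duration_minutes=48)
winter = ScoreEvent("S001", rit=208, duration_minutes=14)
print(retest_reasons(fall, winter))
```

An empty list means none of the flags fired; a non-empty list is a prompt for a conversation with the student and proctor, not an automatic retest decision.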