Student Effort and Test Score Accuracy

The question partners ask me most frequently concerns the accuracy of test scores, specifically whether a student gave appropriate effort on the test. This has always interested users of our data, who want to know whether the RIT score a student receives accurately reflects his or her actual test performance. Now that more schools are using NWEA assessments for accountability purposes, however, the accuracy of test scores, and whether students are giving enough effort to produce accurate scores, is at the forefront of many educators’ minds.

Student effort on a test is difficult to measure. We don’t yet have a metric that says, “Yes, this student gave appropriate effort during the test,” or, “No, that student wasn’t paying attention to the test items at all!” (though this is an area of particularly high interest to our research team). But we do have three metrics, readily available to educators, that can at least indicate whether a student was appropriately engaged on the test: time on test, the percentage of items answered correctly, and the standard error of measurement of the test event.

Consider the test events of two 5th grade students. Student A completed his test in 20 minutes, answered 25% of the items correctly, and had a standard error of measurement of 4.0. Student B completed his test in 40 minutes, answered 52% of the items correctly, and had a standard error of measurement of 3.0. Based on this information, which student was more likely to have given appropriate effort, and therefore, whose score more accurately reflects his actual test performance? The answer may seem obvious: Student A does not appear to have given his best effort. At the very least, those three pieces of information would motivate me to explore Student A’s test performance a little further.

On the math and reading MAP tests, most students take approximately 40 to 50 minutes to complete the test. That does not mean a student who finishes in 20 minutes failed to give appropriate effort; some students simply work faster than others and can complete the test accurately in much less time. However, if a student finishes much faster than expected, it may be worth examining other data, such as the percentage of items the student answered correctly, to see whether they provide some indication of the student’s effort.

Because the test is adaptive, it keeps presenting items near the student’s current estimated achievement level, items he or she has roughly an even chance of answering correctly. As a result, most students answer approximately 50% of the items correctly. The percentage typically ranges from about 43% to 57%, but in general, 50% is what we expect to see. So a student who finishes in 20 minutes and answers only 25% of the items correctly may not have given appropriate effort; this might be a student who didn’t read the items at all or simply guessed on most of them.
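To see why an adaptive test pulls percent correct toward 50%, here is a minimal simulation sketch. It assumes a simplified Rasch (one-parameter logistic) response model and a hypothetical "staircase" rule that always serves an item whose difficulty equals the current ability estimate. This is not NWEA’s actual item-selection or scoring algorithm, and the function names, step sizes, 40-item test length, and the 25% guessing rate (four answer choices) are all illustrative assumptions.

```python
import math
import random
from statistics import mean


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) probability that a student answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def simulate_percent_correct(ability: float, rng: random.Random,
                             n_items: int = 40, guessing: bool = False) -> float:
    """Simulate one adaptive test event and return the fraction answered correctly.

    Hypothetical staircase rule (not MAP's real algorithm): each item's
    difficulty equals the current ability estimate, which moves up after a
    correct answer and down after an incorrect one, in shrinking steps.
    """
    estimate, step, n_correct = 0.0, 1.0, 0
    for _ in range(n_items):
        difficulty = estimate  # serve an item targeted at the current estimate
        if guessing:
            # Disengaged student: random guess among four choices (assumed).
            answered_correctly = rng.random() < 0.25
        else:
            answered_correctly = rng.random() < p_correct(ability, difficulty)
        n_correct += int(answered_correctly)
        estimate += step if answered_correctly else -step
        step = max(step * 0.85, 0.1)  # shrink the step, but keep adapting
    return n_correct / n_items


if __name__ == "__main__":
    rng = random.Random(2024)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    engaged = mean(simulate_percent_correct(a, rng) for a in abilities)
    guessers = mean(simulate_percent_correct(a, rng, guessing=True) for a in abilities)
    print(f"engaged students: {engaged:.0%} correct on average")
    print(f"random guessers:  {guessers:.0%} correct on average")
```

Under these assumptions, engaged students should average close to 50% correct regardless of ability, because items keep tracking their level, while students guessing at random should average close to 25%, much like Student A.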

The final piece of data to examine when gauging student effort is the standard error of measurement of the test event. If you are not familiar with standard error, that’s okay; briefly, it tells us how precise the score from a test event is. Small standard errors are good because they indicate greater precision; larger standard errors, not so much. On the math and reading MAP tests, we typically observe standard errors ranging from approximately 2.9 to 3.3 (plus or minus 0.1 or so). A high standard error on its own, say 4.0, doesn’t necessarily tell us anything about the effort a student gave. But if the student also took only 20 minutes and answered only 25% of the items correctly, the triangulation of these data likely suggests that the student was not giving his best effort. At the very least, the combination could prompt a deeper investigation into the student’s test performance, to see whether the data match what the teacher or proctor observed during the testing session.
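The triangulation described above can be sketched as a simple screening rule. This is a minimal, hypothetical sketch: the class and function names are illustrative, the percent-correct and standard error cutoffs are loosely drawn from the typical ranges in this post, and the 25-minute "fast test" cutoff is an assumed value, not an NWEA guideline. It flags a test event for a closer look only when all three indicators agree, and it is a prompt for follow-up, not evidence of low effort.

```python
from dataclasses import dataclass


@dataclass
class StudentTestEvent:
    minutes: float          # time on test
    percent_correct: float  # e.g. 25 for 25%
    sem: float              # standard error of measurement of the test event


# Hypothetical thresholds, loosely based on the typical ranges in this post.
FAST_TEST_MINUTES = 25.0     # assumed cutoff; most students take 40-50 minutes
LOW_PERCENT_CORRECT = 43.0   # low end of the typical 43%-57% range
HIGH_SEM = 3.4               # typical SEM is about 2.9-3.3, +/- 0.1


def effort_flags(event: StudentTestEvent) -> list[str]:
    """Return which of the three indicators look unusual for this test event."""
    flags = []
    if event.minutes < FAST_TEST_MINUTES:
        flags.append("fast test")
    if event.percent_correct < LOW_PERCENT_CORRECT:
        flags.append("low percent correct")
    if event.sem > HIGH_SEM:
        flags.append("high standard error")
    return flags


def worth_a_closer_look(event: StudentTestEvent) -> bool:
    """Suggest follow-up only when all three indicators point the same way."""
    return len(effort_flags(event)) == 3


if __name__ == "__main__":
    student_a = StudentTestEvent(minutes=20, percent_correct=25, sem=4.0)
    student_b = StudentTestEvent(minutes=40, percent_correct=52, sem=3.0)
    print(effort_flags(student_a))         # ['fast test', 'low percent correct', 'high standard error']
    print(worth_a_closer_look(student_a))  # True
    print(effort_flags(student_b))         # []
    print(worth_a_closer_look(student_b))  # False
```

Requiring all three flags mirrors the point that a high standard error alone says little about effort; a school could instead review any event with two or more flags, at the cost of more false alarms.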

It’s important to note that none of these metrics, alone or in combination, provides definitive evidence of the effort a student gave. However, these data can help identify test events in which a student appears not to have given full effort, and they can be very useful when interpreting student test data.