Research-based Accuracy

Scientifically based NWEA MAP tests provide accurate, reliable, and valid information about student growth.

When you make decisions based on assessment data, you must be confident that the data are accurate, reliable, and valid. NWEA conducts ongoing research so that you can rely on the results you receive from our assessments.

Our research team keeps the results you receive highly accurate by building and maintaining two things: a stable measurement scale and a sound test design.

Stable Scale

NWEA assessments use a measurement scale that has proven to be exceptionally stable and valid over time. Our scale is based on the same modern test theory that informs the SAT, the Graduate Record Examinations, and the Law School Admission Test. The benefit of this theory is that it places student achievement levels and item difficulties on the same scale. The scale is divided into equal units, like centimeters on a ruler. We call these units RITs; RIT is short for Rasch Unit, after Danish statistician Georg Rasch, who developed the underlying model.
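To make the idea of a shared scale concrete, here is a minimal Python sketch of the Rasch model in its standard one-parameter logistic form, in which the chance of a correct answer depends only on the difference between a student's ability and an item's difficulty. The function names are hypothetical, and the logit-to-RIT conversion (10 × logit + 200) is an assumption based on a commonly cited transformation rather than something confirmed here.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Chance of a correct answer under the Rasch (one-parameter logistic) model.

    Ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def logit_to_rit(logit: float) -> float:
    """Hypothetical conversion: assumes RIT = 10 * logit + 200."""
    return 10.0 * logit + 200.0


# When ability equals difficulty, the student has an even chance of success.
print(rasch_probability(1.0, 1.0))  # 0.5
# A student one logit above the item's difficulty is more likely to succeed.
print(round(rasch_probability(2.0, 1.0), 3))  # 0.731
print(logit_to_rit(1.0))  # 210.0
```

Because student and item share one scale, the same number can describe both "how hard is this item" and "how able is this student," which is what lets a score be read directly against item difficulty.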

An Equal-Interval Scale

You can liken the RIT scale to a meterstick, which is made up of equal units of measurement: centimeters. A meterstick is sometimes used to measure a child's physical growth. It is a reliable and accurate indicator of growth because its units do not change, so you can confidently compare a child's height from one year to the next. And because every centimeter is the same size, you can reliably make comparisons and draw conclusions about the growth of one child or a group of children.

Like using a ruler to measure a child's growth in height, we use the RIT scale to measure a student's academic growth over time.

We place all of our test items on the RIT scale according to their difficulty. Each item receives a numeric value, its RIT value, and higher values indicate more difficult items. As a student takes a MAP test, he or she is presented with items of varying RIT values, or levels of difficulty. Once the MAP system has determined the difficulty level at which the student is able to perform and has collected enough data to report the student's abilities, the test ends and the student receives an overall RIT score. In Survey with Goals tests, the student also receives RIT range scores for each goal strand.
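The adaptive loop described above can be illustrated with a simple staircase search. This is an assumption for illustration, not the MAP item-selection algorithm, which is not described here: after a correct answer the next item is harder, after an incorrect answer it is easier, and the step size halves each time the direction reverses. The function name, the starting value of 200, the step sizes, and the deterministic simulated student are all hypothetical.

```python
from typing import Callable, Optional


def run_adaptive_test(answers_correctly: Callable[[int], bool],
                      start: int = 200,
                      initial_step: int = 16,
                      max_items: int = 50) -> int:
    """Staircase search for the difficulty level at which a student performs.

    answers_correctly(difficulty) presents an item of that RIT difficulty and
    returns True if the student answers it correctly.
    """
    estimate = start
    step = initial_step
    previous: Optional[bool] = None
    for _ in range(max_items):  # cap test length in case no reversal ever occurs
        correct = answers_correctly(estimate)
        if previous is not None and correct != previous:
            step //= 2  # direction reversed: narrow the search
            if step == 0:
                break  # search has narrowed to a single point: end the test
        estimate += step if correct else -step
        previous = correct
    return estimate


# Simulated student who answers every item at or below 213 correctly.
print(run_adaptive_test(lambda difficulty: difficulty <= 213))  # 214
```

The result lands within one point of the simulated level because the search stops between the last item answered correctly (213) and the first answered incorrectly (214). Real students guess and make careless errors, so a production system would need a probabilistic estimator; this sketch ignores that.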

Teachers use this RIT score to plan instruction around their students' strengths and weaknesses relative to state curriculum standards. Well-targeted instruction leads to improved performance and growth.

The characteristics of the RIT scale provide several benefits to educators:

Grade-independent
Because the tests are adaptive and the items presented depend on student performance, not age or grade, identical scores mean the same thing across grades. For example, a third grader and a fourth grader who each score 210 are performing at the same level. This allows growth to be measured independent of grade.
Equal-interval
The RIT scale is theoretically unbounded, but most student scores fall between 140 and 300. Like meters or pounds, the scale is equal-interval: the distance between 170 and 182 is the same as the distance between 240 and 252. This allows educators to apply ordinary arithmetic to scores, for example to calculate the mean score of a class or grade, or to subtract one score from another to measure growth.
Stability
More than twenty years after the RIT scale was first implemented, its scores still mean the same thing. As a result, educators can confidently measure growth over many years.
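Because the intervals are equal, ordinary arithmetic on RIT scores is meaningful. The short Python sketch below uses invented fall and spring scores for a hypothetical class of five students to compute a class mean and each student's growth as a simple difference.

```python
from statistics import mean

# Invented scores for illustration only (not NWEA data).
fall = [198, 205, 210, 214, 221]
spring = [204, 209, 219, 220, 226]

# Equal intervals: a 12-point gap is the same size anywhere on the scale.
assert 182 - 170 == 252 - 240

# Growth for each student is the spring score minus the fall score.
growth = [after - before for before, after in zip(fall, spring)]

print(mean(fall))    # 209.6
print(growth)        # [6, 4, 9, 6, 5]
print(mean(growth))  # 6
```

Averaging growth this way only makes sense on an equal-interval scale; on a scale with uneven steps, a 6-point gain near the bottom would not equal a 6-point gain near the top.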

Test Design Process

Item Banks

The quality of the test items also contributes to the reliability and validity of a test. Measures of Academic Progress (MAP) tests draw from our bank of more than 15,000 items in Mathematics, Reading, Language Usage, and Science.

Each year we add hundreds of new items to the item bank. Most of these items are developed by teachers who receive thorough training in our item-writing processes.

Each potential item must pass a rigorous bias and content review, followed by field-testing with at least 300 students. Only items that pass the review, the field test, and the subsequent strict statistical screening are calibrated for difficulty and assigned a value on the RIT scale. These items then join the continually expanding item bank.
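Calibration is the step that places a field-tested item on the scale. NWEA's actual calibration procedure is not described here, so the following Python sketch is only an illustration under simplifying assumptions: the field-test students' abilities are treated as already known (in logits), and the item's Rasch difficulty is estimated by maximum likelihood using Newton's method. The function name and the simulated data are hypothetical; real calibrations typically have to estimate abilities as well.

```python
import math
import random
from typing import Sequence


def estimate_difficulty(abilities: Sequence[float],
                        responses: Sequence[int],
                        iterations: int = 25) -> float:
    """Maximum-likelihood Rasch difficulty for one item, assuming known abilities.

    responses[i] is 1 if student i answered correctly, else 0. Requires a mix
    of correct and incorrect answers; otherwise the estimate diverges.
    """
    b = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for theta in abilities]
        # The log-likelihood's derivative with respect to b is sum(p - x),
        # and the (negated) second derivative is sum(p * (1 - p)).
        gradient = sum(p - x for p, x in zip(probs, responses))
        information = sum(p * (1.0 - p) for p in probs)
        b += gradient / information  # Newton step toward the maximum
    return b


# Simulate 300 field-test students answering an item of difficulty 0.5.
rng = random.Random(42)
true_difficulty = 0.5
abilities = [rng.gauss(0.0, 1.0) for _ in range(300)]
responses = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(t - true_difficulty))) else 0
             for t in abilities]
print(round(estimate_difficulty(abilities, responses), 2))
```

With a few hundred responses the estimate typically falls close to the true difficulty, which suggests why a minimum field-test sample matters: fewer responses mean a noisier placement on the scale.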

Ongoing Evaluation

In 2004, NWEA published Reliability and Validity Estimates, a critical evaluation of our assessments. In this document, we asked two questions:

Is the test reliable? Reliability is the consistency of the measures obtained from the test. For example, if we used a ruler to measure a piece of paper, we would expect to find the same dimensions each time we measured it.
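One common way to quantify this kind of consistency is a test-retest correlation: give the same students the test twice and correlate the two sets of scores. The Python sketch below computes a Pearson correlation from first principles. The scores and the function name are invented for illustration, and whether Reliability and Validity Estimates reports this exact statistic is not stated here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented scores: the same five students tested twice a short time apart.
first = [195, 202, 210, 218, 225]
second = [197, 201, 212, 216, 226]
print(round(pearson_r(first, second), 3))
```

A value near 1 means students keep roughly the same standing from one administration to the next, the test equivalent of the ruler giving the same measurement twice.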

Is the test valid? Validity is the degree to which the test actually measures what it claims to measure. For example, does the test measure what students are expected to be taught in the classroom?

Read how we measure the reliability and validity of NWEA assessments, and see the results of these investigations.

Publications

NWEA regularly conducts research to validate and improve the quality of our assessment instruments. The following publications contain key information about the reliability and validity of our assessments.

Date        Title
4/1/2005    Comparison of MAP and ALT scores
4/1/2004    ISAT DIF Study 2003
3/1/2004    Reliability and Validity Estimates
4/1/1999    A comparison of test scores from the Iowa Test of Basic Skills and the NWEA/Meridian Checkpoint Assessment Level Tests



© Copyright NWEA 2004-2008