Thanks to high-quality assessments, teachers, students and families are able to see and celebrate heroic growth. Consider a fifth-grade student who starts the year reading at a second-grade level. In June, his teacher rightfully celebrates his achievements because he is now reading like a fourth grader—still not proficient but growing remarkably. We believe that is cause for celebration, and without properly assessing growth, that bright window might just remain shuttered.
Measuring growth accurately and equitably isn’t easy, but nobody should be willing to accept a measure that is just “good enough.” In order to create a growth assessment that works, we recommend following these seven steps:
1. Align test questions to content standards
Standards lay out a clear, consistent understanding of what students are expected to learn, and the sequence in which they are expected to learn it. High-quality curriculum plans are based around them.
The standards define the concepts we want students to master, whether in reading, English language arts, mathematics, science, or other areas, from beginning foundations all the way to advanced expressions.
The first requirement for measuring student achievement and growth fairly and accurately is that the questions on a given test reflect the content of the standards.
2. Use a vertical scale of measurement
After questions are aligned to standards, you need a scale that identifies the difficulty of the items. Here, an analogy may be useful.
Before 1929, nearly every company manufacturing machined equipment also manufactured and marked the tools to service their equipment—and they all used different scales. This meant that a wrench marked ½” from P&C Manufacturing in Oregon might not fit a bolt marked ½” manufactured by the Armstrong Brothers in Chicago. Matching them up and comparing them all is a fun hobby for collectors but would have been a real impediment to anyone trying to get some work done.
Unsurprisingly, researchers have found that using a combination of different assessment scales is an unstable method for measuring academic growth over time. The solution is to use a single “vertical scale” that spans grade levels. With one long measuring stick in use, no translation is required, so growth measurements remain accurate and reliable.
3. Match question difficulty level to student ability
After creating a vertical scale, you must match item difficulty to student ability. Assessments that restrict questions to grade-level standards alone play an important role in providing information that school systems and states need. Summative assessments given for state accountability are explicitly built to do this.
However, when assessments are restricted in this way, we cannot precisely identify the actual achievement level of students who are performing above or below grade level, and these students make up many, if not most, of our students. How can we achieve real equity in the classroom if educators cannot chart a path forward for low-performing students or continue to encourage growth in high performers?
4. Use a deep pool of questions to increase validity
The more questions an assessment presents to the student, the greater precision we can expect. When there are many items falling at, above, and below the student’s level, educators gain an increasingly detailed view of the student’s achievement. This granularity is what elevates a proper assessment above a simple “pass or fail” event.
This requires not only many questions at each difficulty level along the scale, but also that appropriate questions be presented to each student. Computer adaptive testing (CAT) makes this process manageable and scalable, meaning that it can be repeated with reliable results with larger or smaller groups of students.
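As a rough illustration of how an adaptive test presents appropriate questions, the sketch below picks the unused item whose difficulty is closest to the current ability estimate, then nudges the estimate up or down after each answer. Everything here is an assumption made for illustration: the function names, the item pool, and the simple step-halving update are hypothetical, and operational adaptive tests use more sophisticated statistical estimation.

```python
def next_item(ability, pool, used):
    """Return the index of the unused item whose difficulty is closest to ability."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return min(candidates, key=lambda i: abs(pool[i] - ability))


def run_adaptive_test(pool, answer_fn, n_items=5, start=0.0, step=1.0):
    """Administer n_items adaptively.

    pool      -- item difficulties on one vertical scale (hypothetical units)
    answer_fn -- callable taking a difficulty, returning True if answered correctly
    The estimate moves up after a correct answer and down after a wrong one,
    with the step halved each time (an illustrative heuristic, not a real
    scoring method).
    """
    ability, used = start, set()
    for _ in range(n_items):
        i = next_item(ability, pool, used)
        used.add(i)
        ability += step if answer_fn(pool[i]) else -step
        step = max(step / 2, 0.125)  # never shrink the step below 0.125
    return ability, sorted(used)


# Hypothetical pool spanning below, at, and above a mid-scale starting point.
pool = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
# Hypothetical student who answers correctly whenever difficulty is at most 1.2.
estimate, items_seen = run_adaptive_test(pool, lambda d: d <= 1.2)
```

Note how the thin pool forces the later questions far from the student's level: once the items near the estimate are used up, the test has to fall back on items that are much too hard or too easy. That is the practical argument for a deep pool at every difficulty level.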
5. Ensure fairness through empirical bias and sensitivity reviews
You now have a deep pool of items aligned to content standards, arranged along a vertical scale, and spanning the full range of students you want to measure. However, students come to school from myriad backgrounds—cultural, socio-economic, ethnic, religious, etc.
In addition, not all students may have had the opportunity to learn the material to be tested, or the material may be presented in a way that privileges a certain background. These factors all contribute to the potential for bias in an assessment.
Practices such as Differential Item Functioning (DIF) analysis and bias and sensitivity reviews can help reduce bias in the instruments. The American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) publish standards on this for test makers, to support fairness and to provide a consistent approach across developers.
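One common way to screen an item for DIF is the Mantel-Haenszel procedure, which compares how two groups of students with similar overall scores perform on the same item. The sketch below is a minimal version under stated assumptions: the function names and the counts are hypothetical, the -2.35 multiplier is the conventional conversion to the ETS delta scale, and a real review would add significance testing and expert judgment.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across ability strata.

    strata -- list of (ref_correct, ref_wrong, focal_correct, focal_wrong)
              counts, one tuple per group of students with similar total scores.
    A value near 1.0 suggests the item behaves alike for both groups.
    Raises ZeroDivisionError if no stratum has both a wrong reference-group
    answer and a correct focal-group answer.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    return num / den


def mh_d_dif(alpha):
    """Convert the odds ratio to the delta scale; negative values mean the
    item favors the reference group."""
    return -2.35 * math.log(alpha)


# Hypothetical counts: equal odds in both strata, so no DIF is flagged.
balanced = [(40, 10, 20, 5), (30, 30, 10, 10)]
# Hypothetical counts: matched students in the focal group do worse.
skewed = [(40, 10, 30, 20)]

alpha_balanced = mantel_haenszel_odds_ratio(balanced)
alpha_skewed = mantel_haenszel_odds_ratio(skewed)
```

An item flagged this way is not automatically biased; the statistic only tells reviewers where to look.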
6. Define the purpose of the assessment to determine the accuracy required
Test fatigue and demands on classroom time are widely cited factors in “opt out” discussions. That is why it is crucial to clearly define the purpose of the assessment and the role of the data educators gather. Simply put, an assessment with more questions, spread over a wider range of grade levels, gives a more precise picture of student achievement, but it also takes more time. How precise do educators need to be, and when?
Balancing the need for data against the time required to do the assessment can be tricky. The ability to determine how precise a measure is needed, and to tailor the assessment to provide that precision while minimizing demands on valuable classroom time, is one of the key benefits of some computer adaptive tests.
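To make the trade-off between precision and test length concrete, the sketch below assumes a Rasch (one-parameter logistic) model, which is our assumption for illustration and not something any particular assessment is claimed to use. Under that model, each question contributes information p(1 - p), where p is the chance of a correct answer, and the standard error of the score is one over the square root of the total information. The function names are hypothetical.

```python
import math


def item_information(ability, difficulty):
    """Information one item contributes under the (assumed) Rasch model."""
    p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return p * (1.0 - p)


def standard_error(ability, difficulties):
    """Standard error of the ability estimate for a given set of items."""
    info = sum(item_information(ability, d) for d in difficulties)
    return 1.0 / math.sqrt(info)


def items_needed(ability, difficulty, target_se):
    """How many items of one difficulty are needed to reach target_se."""
    per_item = item_information(ability, difficulty)
    return math.ceil(1.0 / (target_se ** 2 * per_item))


# Questions matched to the student's level (difficulty equals ability).
matched = items_needed(0.0, 0.0, target_se=0.5)
# Questions two scale units too hard for the same student.
mismatched = items_needed(0.0, 2.0, target_se=0.5)
```

In this sketch, well-matched questions reach the target precision with 16 items, while poorly matched questions need more than twice as many. That is why matching difficulty to ability saves classroom time, and why the required precision should be decided before the test length is.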
7. Provide context for growth
Once achievement and growth are accurately measured, a world of instructional opportunity opens—as long as there is accompanying information that provides a context. That’s where the standards come into play and the assessment tool becomes the basis for contextualized comparisons. Two important comparisons we can draw from a vertically scaled score and normative data are “growth compared to peers” and “growth trajectories.”
A teacher certainly benefits from knowing how a student’s score compares with those of the other students in the classroom. A principal benefits from knowing their school’s position within a district, and a district supervisor finds it useful to place a school’s performance in the state and national context. This need is met by establishing growth compared to peers.
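A simple way to express growth compared to peers is as a percentile rank: the share of comparable students whose growth on the same vertical scale was smaller. The sketch below is illustrative only. The scores are made up, the function names are hypothetical, and published norms are built from far larger samples with more careful matching of peers.

```python
def growth(fall_score, spring_score):
    """Growth is a plain difference because both scores share one vertical scale."""
    return spring_score - fall_score


def growth_percentile(student_growth, peer_growths):
    """Percent of peers with smaller growth, counting ties as half."""
    below = sum(1 for g in peer_growths if g < student_growth)
    ties = sum(1 for g in peer_growths if g == student_growth)
    return 100.0 * (below + 0.5 * ties) / len(peer_growths)


# Hypothetical fall and spring scale scores for one student.
student = growth(fall_score=180, spring_score=190)
# Hypothetical growth values for peers who started the year at a similar level.
peers = [4, 6, 8, 10, 12]
percentile = growth_percentile(student, peers)
```

The same calculation works at any level of comparison: swap the classroom peer group for a district, state, or national norm group.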
By providing each student with the right instruction at the right time during the school year, growth data can help teachers instill a kind of personal navigation system that transforms all students into lifelong learners. Thoughtful use of accurate and fair assessment data leads directly to the equity and growth that are the future of education in America.