Labeling Kids: Four Examples of Testing Practices that Inadvertently Impact How Students See Themselves

It’s not just what we test, it’s how we talk about the results.

Much attention is given to what we test and how we do it, especially with Common Core-aligned assessments on the horizon. Unfortunately, less notice is given to how we talk about the results with students, parents, and teachers. My research, and that of many others, shows that ignoring how students internalize assessment results could be a big problem.

Studies find that assessment scores can take on quite different meanings to students depending on how they are messaged. On one hand, results that aren’t shared thoughtfully can be demoralizing or even stigmatizing. On the other, the right message can be positive and transformative.

Despite research showing that labeling students can be counterproductive, a number of educational policies and practices do just that. These practices often identify students with the intention of better supporting them. For example, accountability practices break down performance by student subgroup to ensure that students with disabilities receive the support they need. While such practices are vital, they also necessarily involve labeling students.

Below, I provide four examples of practices meant to benefit kids (and that generally do, one could argue), along with evidence these same practices may also be negatively influencing how teachers see their students, and how kids see themselves.

Example #1: Accountability systems. Despite abundant research showing that labeling students can be harmful, accountability systems consistently label kids. For example, in Massachusetts, students received different proficiency designations based on the state test, one of which was “failing/warning.” John Papay and his colleagues (2011) showed that receiving this label resulted in lower college aspirations, even when comparing these “warning” students to others who scored only a point or two higher on the same exam.

Example #2: Subgroup identification. In part due to federal accountability requirements, many districts examine academic achievement by subgroup. For example, test results are often reported separately for students with disabilities and English language learners (ELLs). While the intent of this identification is to ensure these students receive the resources and supports they need to be successful—and that is a very important goal—we often ignore that these same designations could be stigmatizing. Initial findings from my research show that ELLs tend to have lower college aspirations compared to former ELLs with virtually the same language and academic proficiency. This finding suggests there is something inherent about the ELL label and the status it carries that impacts how students think of themselves.

Example #3: High-stakes testing situations. This example indirectly combines Examples 1 and 2. Whether or not educators and policymakers explicitly label students based on test scores, the way test performance is talked about, especially in high-stress situations, can influence how students perform, as well as how they perceive themselves. In groundbreaking work, Steele and Aronson (1995) defined the concept of stereotype threat. The authors administered a difficult test to African American and Caucasian students under two conditions. In the first, examinees were told the test was a good indicator of their intellectual ability. In the second, students were told the test was simply a problem-solving exercise with no broader implications. African American students fared much worse than their Caucasian peers under the first condition, but performed equally well under the second. According to the authors, a primary cause of the differing performance was that African American students were aware of stereotypes about low performance among Black students, which made them doubt their own abilities when confronted with a test described as being able to capture that ability.

Example #4: Early warning systems. Early warning systems are increasingly used by districts and states. These systems harness available data to help identify which students are not on track to finish high school, oftentimes well in advance of 12th grade. While the objective of intervening early should be pursued, little attention is given to whether identifying students as “not on track” could generate negative reactions among teachers and students. In my own research (Soland, 2013), there is some evidence that teachers incorporate certain biases into their opinions of which students are on track, and that receiving a data-generated label could reinforce those biases for the worse.

Presenting these examples together is not meant as an indictment of current educational practice, or even necessarily of the specific policies I cite. The examples are meant to make a simple point: the specific labels we give to kids, and the way we talk about these labels, may be as important as the actual assessment results underlying them.

Fortunately, emerging educational studies, especially in social psychology, present options for combating the effect of a label. These studies are vital given that labels will always exist in some form. After all, we can’t afford to ignore the performance of marginalized groups like ELLs, which in turn requires that we call that set of students something.

As an example of such research, Carol Dweck’s studies on growth mindset show how helping kids see assessments as an opportunity to learn and grow, rather than as a measure of some innate and fixed ability, can drastically improve performance. In the coming weeks, I will write a follow-up post describing areas of research educators can draw on when thinking about how to communicate test results to kids.

Dweck, C. (2006). Mindset: The new psychology of success. Random House.

Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41(10), 1040.

Papay, J. P., Murnane, R. J., & Willett, J. B. (2011). How performance information affects human-capital investment decisions: The impact of test-score labels on educational outcomes (No. w17120). National Bureau of Economic Research.

Soland, J. (2013). Predicting high school graduation and college enrollment: Comparing early warning indicator data and teacher intuition. Journal of Education for Students Placed at Risk (JESPAR), 18(3-4), 233-262.

Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797.

