A new study from Stanford University, also covered in the Hechinger Report, is beginning to make its way through education circles, and it deserves more than a headline reaction. Researchers fed 600 middle school essays into four different AI models and asked those models to provide writing feedback. Then they resubmitted the same essays with different student identifiers (Black, white, Hispanic, Asian, male, female, motivated, unmotivated, learning disabled) and watched what happened.

Unsurprisingly, the feedback shifted. Essays attributed to Black students received more praise and encouragement, often emphasizing personal narrative and leadership. Essays attributed to Hispanic students or English learners triggered more grammar corrections and language policing. When the student was identified as white, the feedback focused on argument structure, evidence, and clarity: the kind of substantive critique that actually develops writers.

The researchers named what they found: positive feedback bias and feedback withholding bias. More praise. Less criticism. Different expectations. Same essay.

This is a serious finding, especially as AI tools are being deployed in classrooms at scale, often without adequate preparation or critical examination. But I want to go somewhere the headline doesn’t.
The bias is the symptom. The pedagogy is the problem.
Before Bias, There Was Design
Here is the part of the study that should give educators pause. When researchers ran the essays through the AI models without providing any demographic information (no race, no gender, no motivation level), the default feedback was already problematic. The lead researcher described it as discouraging and overly focused on corrections. Read that again, please.
Before race entered the picture.
Before gender.
Before any identifying information at all.
The baseline pedagogical stance embedded in these systems was already harmful to students. That is not simply a bias issue. That is a design issue, and it is one we created the moment we began outsourcing pedagogical judgment to systems that were never designed to make pedagogical decisions. Yet keep in mind how many educators and students, inadequately prepared for these systems, continue to rely on them in the name of saving time or just getting the work done.
AI Literacy Has Been Misdefined
A considerable amount of energy has been spent in education on building what we call AI literacy, a term far too often left undefined or defined too narrowly, in an attempt to help educators and students understand how to use these tools. Too often, that literacy has been defined at the functional level:
- How to write a prompt
- How to refine an output
- How to use the tool efficiently
This study is a direct challenge to that definition, one of many challenges that I, along with a few others, have been raising for years now. If an educator deploys an AI feedback tool without understanding:
- What pedagogical assumptions are embedded within it
- What it rewards
- What it ignores
- Whose writing is recognized as “strong”
- Whose is treated as deficient
…then my contention is that the educator is not AI literate in any meaningful sense. At best, they may be AI fluent, and there is a difference. Functional fluency tells you, for the most part, how to use a tool. Critical Digital Literacy, which includes but is not limited to AI, asks what the tool is doing, to whom, and in whose interest. This distinction between functional fluency and critical literacy is one my co-author Dee Lanier and I have been examining closely in our work at the intersection of AI, equity, and education. It is a distinction we are returning to with even greater urgency as we develop the next evolution of that work. Studies like this one are exactly why.
The Personalization Paradox
One of the most compelling promises of AI in education is personalization, the idea that tools can adapt to individual learners in ways a single teacher managing thirty students cannot, or at least cannot without facing significant challenges, starting with time, support, and resources. This study reveals the shadow side of that promise.

Personalization, without critical oversight, does not necessarily raise expectations. It can lower them. It can mean that the students who most need rigorous, substantive feedback, the feedback that builds capacity and develops voice, are precisely the students most likely to receive encouragement instead. Encouragement is not nothing. But encouragement is also not instruction.

Here is the truth that too many don’t want to consider: when the gap between what we say to students and what we expect of them maps onto identity, we are not personalizing learning. We are encoding inequity into the feedback loop.
Humans in Control Is Not Enough
The lead researcher offered a clear takeaway: we shouldn’t leave the pedagogy to the large language model. I wholeheartedly agree. But “humans in control” is not a sufficient answer. Control without understanding is not control. Reviewing AI output before sending it to students is a starting point, not a solution. What is required goes much deeper and demands much more analytical thought:
- Educators who understand what these systems are doing
- Leaders who ask better questions before adoption
- Systems that prioritize design over convenience
- Systems that prioritize efficacy over speed
Questions to ask:
- What pedagogical framework is embedded in this tool? (I have a whole other post I could write on this one, given a recent experience talking to a few vendors)
- Whose writing does it recognize as strong?
- What happens to students whose voice falls outside its training data?
- Who evaluated this tool before it reached students?
These are not technical questions. They are leadership questions. They are access and opportunity questions. Unfortunately, right now, in too many spaces, they are not being asked.
What Is Actually Required
If this study tells us anything, it’s this: AI literacy cannot, and should not, stop at the level of use. It must extend to the level of understanding.
Understanding:
- How these systems are built
- What they are trained on
- What patterns they reproduce
- Whose interests they ultimately serve
That is a higher bar than most current professional learning is setting. It requires moving beyond:
“Here are five easy ways to use AI in your classroom.”
…to questions of:
- Power
- Design
- Consequence
- Intended or desired impact
It requires educators who are not just users of AI but critical participants in its implementation. It requires leaders who understand that adoption is not implementation, and that the gap between the two, like the gap between expectations and actual results, is where harm lives.
The students in this study wrote the same essays.
They deserved the same quality of feedback.
They did not receive it.
We have spent years asking what AI can do for education.
It is past time we started asking what it is doing to it.