Imagine this: You’ve been invited to observe classrooms at two schools. In one, students are building and coding a robot. In the other, they’re prepping for the state test with practice packets. Weeks later, you’re handed a spreadsheet. The scores? Nearly identical.
The data story seems clear: the schools are equally effective. The lived reality? Not even close.
A new study from MIT economists Glenn Ellison and Parag Pathak reframes this disconnect. They built a model of curriculum design as a time-allocation problem. Schools are constantly making choices about what to teach, in what depth, and to whom. If a school invests heavily in advanced skills that don’t appear on standardized tests, the official “effect” might look like zero. But that’s because we’re measuring the wrong thing.
Three ideas from the research stuck with me:
- Curriculum is matching, not mass production.
Schools work best when what they teach aligns with the students they serve. In the model, selective schools design a curriculum that matches the readiness and trajectory of their students. A mismatch, whether the curriculum is too basic or too advanced, means lost learning time.
- Cutoff scores can hide the real story.
At the admissions cutoff for selective schools, test-score “jumps” might vanish. But the study shows you can still find telltale signs: a change in the slope of achievement, differences on harder vs. easier tests, or question-by-question gaps that reveal what’s being emphasized.
- What you measure shapes what you value.
If our primary yardstick is a state exam focused on basic skills, we will overvalue schools that “teach to the test” and undervalue those that stretch students beyond it. The authors found, for example, that Boston Latin School didn’t move the needle on the MCAS, but it did raise SAT English scores, AP participation, and advanced-skill mastery.
What this means for us as educators and leaders:
We don’t have to frame this as a false choice between “teaching to the test” and “teaching for life.” If your students are struggling in reading, data-informed instruction (frequent checks, targeted practice, responsive reteaching) isn’t just helpful; it’s humane. Closing those foundational gaps, which some would call opportunity gaps, is the work that makes all other work possible.
But for students already fluent in the basics, drilling for the next state test is not the ceiling of ambition; it’s the floor. The Ellison–Pathak model gives us permission, even a mandate, to build curricula that stretch, provoke, and equip them for intellectual terrain the tests will never map. In an age of artificial intelligence, one can argue this matters more than ever.
And here’s where the irony bites: the better we get at aligning tests to what we value, the more they’ll miss what matters. We will never fully measure curiosity, resilience, or the pleasure of grappling with something beautifully difficult. Which means the highest-scoring school in the district could, in some sense, be the one that least resembles a school at all.
Policy Takeaways
1. Match the curriculum to the student, not the test.
Use data to close foundational gaps for struggling students, but give advanced learners space to tackle skills and content that won’t appear on the state exam. Individualize and differentiate.
2. Measure what matters and admit what you’re not measuring.
If a program is meant to cultivate advanced thinking, creativity, or niche subject mastery, don’t expect state test data to tell the whole story. Track parallel indicators like AP participation, SAT subject scores, or authentic project work.
3. Build multiple pathways to mastery.
For some students, a test-aligned curriculum will be the most direct route to growth; for others, enrichment and off-syllabus challenges will do more to accelerate learning. Design systems flexible enough to support both, sometimes even within the same classroom.
School leadership, like all good teaching, is just as much an art as it is a science. It’s not about rejecting tests, nor is it about elevating them to holy writ. Instead, it’s about knowing when they are a compass and when they are a cage. The Ellison–Pathak study is a reminder that our most meaningful outcomes may be the ones that don’t fit neatly into a spreadsheet. And perhaps the final irony is this: the schools most worth measuring will be the ones that keep teaching things that can’t be measured at all.
Jason McKenna is V.P. of Global Educational Strategy for VEX Robotics and author of “What STEM Can Do for Your Classroom: Improving Student Problem Solving, Collaboration, and Engagement, Grades K-6.” His work specializes in curriculum development and global educational strategy, and he engages stakeholders across education, from parents and educators to policymakers, to help prepare students for a knowledge-based 21st-century economy. For more of his insights, subscribe to his newsletter.