I was still in college the first time someone cried in a parent-teacher conference with me. I had found a summer job at a free enrichment program for public school students. One of our students had just taken her first-ever standardized test, a practice version of the entrance examination for an elite magnet high school. She had scored in something like the fourteenth percentile.

“I don’t understand,” her mother told me. “She does all her work in school. She does her homework. She does extra. I stay on top of her grades from the beginning. Always, she is getting As. Always, I think she is doing well.”

Even then, at the beginning of my teaching career, I could see how this had happened. A quiet, diligent, well-behaved girl who turned in all her assignments—of course her grades were great. But she couldn’t read grade-level texts. Neither could many of her classmates at their majority-minority, wrong-side-of-the-tracks public school.

Our summer program offered open enrollment and free enrichment; it tended to attract motivated students with motivated parents. The kids largely earned decent grades. Still, we took for granted that most would need remediation, extra support in basic skills they should have mastered long before middle school. Our strongest students would have qualified as just barely at grade level relative to national norms. What we called striving for excellence was really a pitched battle to break even.

Without standardized testing—and lacking any other basis for comparison in their own educational experience—the students’ families had no way of knowing what I had assumed was obvious: that eighth graders on the other side of town were well past working on multisyllabic words or improper fractions. They had no way of knowing that their hard-working, solid-GPA kids were already far behind.

Six months later, when President George W. Bush proposed the No Child Left Behind Act, which made standardized testing mandatory beginning in the third grade, I imagined this mom as the beneficiary. Someone should have told her years ago that her daughter wasn’t reading well. She should have known what her daughter’s teachers understood: that her daughter did well relative to her classmates but lagged behind the more privileged kids with whom she’d never go to school. She should have known that the gap would only widen over the years—that school wasn’t fixing the problem but allowing it to fester.

I don’t know what she would have done about it. But she had a right to know.

And then there is the challenge of obtaining the theoretical benefits in spite of practical obstacles. The problem is that measuring how well a younger student reads is close to impossible, at least through any standardized approach. If you really want to know, you’ve got to sit down and listen to the child read and then ask him or her to explain the story. The process takes about twenty minutes per child, plus the sensitivity that comes of experience. Some children read fluently but make no coherent meaning from the words. Some stumble over pronunciation but can understand and analyze in depth. Some know more than they can readily express. And sometimes the process makes a child too nervous to think clearly, and we have to try again on another day or in a less evaluative setting.

That in-person, in-depth assessment yields a range of information about a child’s reading. By contrast, the tool we have been using in Washington, D.C.—the Comprehensive Assessment System, or DC CAS—feels more like checking if a window is open by throwing a stone at it, as the Shel Silverstein poem suggests.

Beginning in the third grade, students sit for four straight days of testing, with a math and a reading section on each day. Each reading section includes three or four unrelated passages, with multiple-choice questions after each one, plus one or two essay responses. The test is untimed, which means that conscientious students can and do work for many hours each day; it is a little like taking the SAT of my own adolescence for four days in a row. Test security is paramount. Kids who finish early can’t leave; they can’t read a book, either, lest they accidentally learn something that is on the test. Preparing kids to survive the testing process is a completely separate endeavor from teaching them to read or write or think.

We give our first one-day practice test in the first quarter of the school year. Even for strong readers, it is overwhelming: the intensity of focus, the pressure, the silence, the loneliness. You can’t ask your friend about a word you don’t know; you can’t even tell your teacher if you find a story you like.


Some kids give up and put their heads down. Some spend an hour on the very first passage, then flip the page, realize they are not done, and bubble in the rest of the answers at random. Every year, kids cry. Watching that first round of testing, you might wonder how anybody passes.

But we are a public school and we rely on public funding, so there is no way around it. Pass they must. We practice strategies for staying focused, for encouraging yourself, for being brave even when you have to figure out what to do all by yourself. We work on managing boredom. We praise and celebrate the virtue of stamina. We take more practice tests.

By the time the real test rolls around, we have been talking about it for so long it feels a little like the Super Bowl. We are fierce. We are more than ready. Kids show up to school an hour early. They bring their lucky pencils, we give them magic mints for extra strength, and they write each other little notes: “You can do it!!! Use your strategies!!!”

The children take this test at a zenith of dedication and hopefulness I have not seen equaled in any other context. Toward the end of the week, some begin to tire; but there is also a growing sense of pride among them, in what they have accomplished, and it carries them through. “I could never have done this before,” one girl told our homeroom during the break between sections. “I never thought I could do something this hard. At my old school, I just bubbled in anything. But I really learned how to do things. I’m smarter than I used to be.”

As a teacher, I’ll take that kind of meaning over meaninglessness. Given that the test is mandatory, I’d rather it feel like a grand mountain we climbed together. But there is something heartbreakingly unfair about asking our kids to invest the best of their academic passion in a poorly designed standardized test rather than in a science fair project or a school play or an actual mountain to climb.

And, among the parents I hoped would benefit from the feedback provided by testing, the message has been hopelessly muddled. Our scores are solid—but they measure stamina, strategy, school culture, and a generous curve more than actual reading proficiency. Plenty of kids who pass are still unprepared to compete on the SAT or ACT a couple of years down the road.

Lately, when we talk about testing, we whisper with apocalyptic trepidation about the coming shift to the Common Core and new national assessments that align to it. These exams are less repetitive and grueling than the DC CAS, but so much harder. They require even young students to synthesize multiple sources, write analytical essays, perform a “research simulation,” and solve multi-part problems that feel more like logic puzzles.

It is less practical to “prep” kids for this kind of test. They have to actually be prepared—to be confident reading and writing at or above grade level—before they can begin to tackle the task itself. Compared with state tests such as the DC CAS, early versions of these Common Core–aligned tests have often revealed bigger gaps in achievement between disadvantaged kids and their peers. But the measurement is not the problem.

Testing doesn’t produce the staggering gaps in performance between privileged and unprivileged students; historical, generational, systemic inequality does. Testing only seeks to tell the truth about those gaps, and the truth is that the complex tasks of the Common Core are a better representation of what our students need to and ought to be able to do. I’m all for measuring that as accurately as we can. In recent years our schools have in fact made huge gains in helping our students tackle real complexity. I’d love to take genuine pride in our scores, knowing they reflect those strides toward rigor.

If we could give these harder tests internally and get back detailed results—share them only with parents, and use them only to improve our own planning—many more teachers would embrace them. Liberated from the testing tricks and stamina lessons, we would embrace more honest feedback about where our students are and how they still need to grow.

The trouble is that we know the scores can and will be used against us and our students. Those who interpret the results in public don’t focus on the needs of the individual. Nor do they seek to identify and propagate the most effective instructional practices. Instead they use the scores to judge who is capable and incapable; to bar access to opportunity; to dismiss and diminish our successes; to justify rather than fight against educational inequality.

In this atmosphere of fear, it is difficult to look forward to more-rigorous tests and the detailed results they produce. Our instinct is to shield our students—and ourselves. Instead of dropping test prep from the schedule, we are tempted to push it to the point of absurdity, in case those old tricks might serve us better than the truth.

The first project for policymakers, then, is to restore our trust in measurement as a tool for making schools better—not for tearing them down. Give the challenging tests, without watering down the content or curving the results, but don’t use scores to pass and fail. Instead, focus on identifying the interventions that really work for students from similar backgrounds and with similar needs: the tests should be used for research, not judgment.

The next step is to disrupt the culture of test anxiety, test preparation, test rewards, and the suddenly ubiquitous pre-exam pep rally. One proposal: stop testing all the students all at once, at the end of the year, in a culminating district-wide trial-by-fire. Instead, treat academic testing like the rotating hearing test or scoliosis checkup. Sample two or three students at random and without preparation, every week throughout the year. Sit them at a computer. Let them click through the test with little fuss. Measure what they can do on that day, share the data with teachers and parents, and then send them right back to class.

Managing only a few kids at a time would simplify testing logistics for schools. The test material is computer- and cloud-based, adaptive, and easy to update, so test security is less burdensome. Students can’t share answers when they don’t face the same questions.

Most important, by testing kids individually, we would reframe testing as a source of information rather than evaluation. We’d reduce the incentive to cheat or prep and instead put the emphasis back where it belongs: on what students need and on how we can help them truly learn.