Poppycock, Folderal, Nonsense

. . . in the immortal words of Todd Farley.

About a week ago, someone sent me a link to an Op-Ed piece in the New York Times by Todd Farley, author of Making the Grades: My Misadventures in the Standardized Testing Industry.

Farley’s experiences aren’t unique. Like Farley, I am a writer who sort of fell into the test publishing industry by accident. Like Farley, I stayed in the industry long after I expected to have moved on to what I imagined would be my real career: writing novels or screenplays or something, anything.

Both of us started our careers in hand-scoring, so hand-scoring is what I will talk about, specifically the hand-scoring of open-ended test questions. Multiple-choice questions are simple to score, because there is only one correct answer. All multiple-choice test questions are machine-scored. The answer sheets or test booklets are scanned, the answer choices are verified by machine, and the scores are then computer-generated. Sometimes there are mistakes in the programming that must be corrected: for example, the correct answer to a given question was actually C but was keyed somewhere along the line as A. Sometimes there are mistakes with a student’s name or identification number that lead to a mistaken score. Sometimes (and this happened with my daughter’s third-grade California STAR test booklet) the test booklet or answer sheet has juice spilled all over it, and so a false score may be generated. Where humans are involved, there will be some error somewhere; it is unavoidable. Let us simply endeavor to put checks in place to catch the errors and processes to correct them.

The scoring of open-ended questions is a horse of a whole nother color. By its nature, there must be some subjectivity. In support of standardization is an array of tools that includes a scoring guide or rubric, sample student responses at each score-point level, and anchor papers and rangefinders. A rubric lists the characteristics of the response at each score point, a sample response gives an example of what kind of response is expected, an anchor is a student response that embodies the score-point level, and rangefinders show what may be expected at the high, middle, and low ends of the spectrum within a score-point level.

It sounds like a complicated process, and it is. And it’s not without its ridiculous moments. And I have to say that though I found much about hand-scoring interesting, the work itself was tedious and the routine unbearable. But it’s not the Orwellian circus of nonsense Farley describes. Or maybe it is at the company where Farley worked; it wasn’t at CTB McGraw-Hill when I worked in hand-scoring there.

I am only about a quarter of the way into the book, so maybe there will be some sort of Aristotelian discovery on Farley’s part. At this point, he sounds like one of the disgruntled hand-scorers. There were some of those: people who just never got it, who were never able to internalize the scoring criteria and constraints, whose scores had to be checked and re-checked so often that eventually they were let go. He says that he failed to qualify as a scorer for a writing test, which does make one wonder whether this type of work simply was not a good fit for him. Not that I can vouch for what happened at Pearson, as I’ve not worked there.

I will also say this. I do not at all see myself as a flag-waver for the test publishing industry, and I have my own strong feelings about the misuse of tests, what seems to me to be an abuse of tests, how they are used, what they symbolize, and how the data are manipulated. Even so, sitting in the mocking judgment seat is generally easy to do. I have plenty of ridiculous stories of my own. We humans are ridiculous; it’s in our nature. And thank God that we are, because it makes the world so much more entertaining.

And this book is just that: entertainment, a joke masked as an indictment of the industry. For myself, I’d be a lot more interested in a thoughtful exploration of the subject, one that takes into account the need for measurement in teaching, the demand for standardization (because that seems to be the only way to ensure any kind of fairness or equity), and how we could possibly balance these kinds of standardized measurements with classroom performance and evaluations from teachers.

CORRECTION: I mean “Folderol.” Geez. And to think I won first place in the 8th grade spelling bee. What did I tell you? Human error.


Comments

  1. I appreciate your take. I recall reading a similar article in EdWeek, where a disgruntled scorer was given a tell-all platform. I found the piece one-sided, probably distorted, and several notches below the caliber of writing usually found in EdWeek. Perhaps it was the same author.

    I do believe there is value in having testing insiders educate the public as to how tests are created and scored. Certainly, an understanding of these elements serves to inform interpretation of test results and the decisions that follow. However, like all things, this should be done responsibly to avoid doing the public a disservice.

    Jason
