Better a little which is well done, than a great deal imperfectly.–Plato
In my last post, I may have given the impression that editors have more power than they do, that perhaps they have time to consider the ramifications of failing to provide all the necessities, or that they are willfully negligent. Salaried as they may be, editors often find themselves in an unenviable pickle, with compressed development cycles and few resources. The industry’s reliance on freelance personnel increases the workload of front-line staff, who may now have to manage groups of writers in addition to performing other duties. Each writer adds about an hour a week in emails, phone calls, and admin tasks–and that’s if the writer is low-maintenance.
It’s also likely that editors want to provide all the necessities, but those necessities don’t exist and the schedule doesn’t allow time for editors to develop them. (Some of the most experienced item writers are able to work around the deficiencies, but the work of the less experienced will be affected.)
No one–except the one at the top of the pyramid, I imagine–is resting on a velvet cushion.
I may have left another inaccurate impression: that it’s all about the money. It’s not. How can it be? This is not a high-rolling game. What I mean to say is that when writers don’t have what they need to do their best work, everyone loses.
The industry continues to become less hospitable to the people actually doing the work of creating the tests–or, more accurately, the people writing the passages and questions from which the tests are assembled–which results in a great deal done imperfectly.
Writers lose time and money; they also lose the best of all rewards, the satisfaction of a job well done, simply because how can you do a task perfectly when the task hasn’t been clearly defined, and when you ask for clarification, you’re directed to figure it out?
The companies lose much, much more. The lower the pay and the greater the pain (inconvenience? Call it what you will. I mean all of those tiny ducks that are pecking us to death) to the writers, the lower the quality of the work, and the fewer the writers willing to undertake it–those few being the ones who have no choice: the least proficient, the least experienced. And the most highly skilled writers simply decide they’ve had enough and move on to greener (or at least different) pastures.
Most importantly, the children who are taking the tests have already lost when they’re faced with low-quality materials that don’t provide them with a fair chance to demonstrate what they know and can do.
All right. Let’s move on. I’m eager to address the basic rules of item writing (a version of which you can see here, in the Quality Control Checklist published by CCSSO), but I realize I should first define some terms.
An item is a test question. An item may be discrete, or may depend on some external stimulus, such as a reading passage or a chart or a map or something else.
Here is a discrete item:
Why does my dog Sophie bark at mail carriers?
A She is flat-out crazy.
B She is outraged by uninvited guests.*
C She knows something about them that we don’t.
D She wants to register a protest about mail delays.
The above is a multiple-choice question, and contains a stem (“Why does my dog Sophie bark at mail carriers?”) and four answer choices: one correct response (B, as far as I can tell, but I think maybe C is a possible right answer) and three distractors. Distractors, which used to be known as “foils,” are wrong answers. Don’t get hung up on the language–the point is never to distract or entice the test-taker into bubbling the wrong answer; the point is to create wrong answers that have a reasonable foundation in common mistakes kids would make with that particular skill or bit of content knowledge. More on this later. But tests should never be tricky.
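For readers who think in code, the anatomy described above can be sketched as a small data structure. This is my own illustrative model, not any testing-industry standard; the class and field names are invented for this sketch.

```python
from dataclasses import dataclass

# A minimal sketch (names are my own, not from any assessment spec) of
# the anatomy described above: a multiple-choice item is a stem plus
# answer choices, exactly one of which is the key (the correct
# response); all the others are distractors.

@dataclass
class MultipleChoiceItem:
    stem: str
    choices: dict  # choice letter -> answer text
    key: str       # letter of the correct response

    def distractors(self) -> dict:
        # Every choice that is not the key is a distractor.
        return {k: v for k, v in self.choices.items() if k != self.key}

sophie_item = MultipleChoiceItem(
    stem="Why does my dog Sophie bark at mail carriers?",
    choices={
        "A": "She is flat-out crazy.",
        "B": "She is outraged by uninvited guests.",
        "C": "She knows something about them that we don't.",
        "D": "She wants to register a protest about mail delays.",
    },
    key="B",
)

print(sorted(sophie_item.distractors()))  # ['A', 'C', 'D']
```

Nothing here enforces that the distractors are *good* ones–that, as discussed above, is the craft part.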
A multiple-choice item is usually worth one score point, and used to be budgeted for one minute of test-taking time, not including the time it takes to read a passage or examine whatever stimulus is needed to answer the question.
There are other item formats, such as constructed-response items, also known as open-ended items. These require the student to produce a response rather than select one. The response may be as short as a word or a phrase, or, in the case of extended-constructed-response (ECR) items, it may be a complete essay.
Here is a short constructed-response item:
Write two words to describe my dog Sophie. Use details to support your answer.
And here is the scoring rubric:
2 points: The response includes two accurate describing words, and is supported by relevant evidence.
1 point: The response includes one accurate describing word, and is supported by relevant evidence, OR the response includes two accurate describing words with no supporting evidence.
0 points: The response is blank, illegible, off-topic, or otherwise impossible to score.
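The rubric above is really a small decision rule, and it can be encoded as one. This is a hedged sketch of my own: the inputs (how many accurate describing words, whether relevant evidence is present, whether the response is scorable at all) would come from a human rater, and the function name is invented–only the point logic follows the rubric.

```python
# A sketch of the 2-1-0 rubric above as a decision rule. The judgment
# calls (counting accurate describing words, spotting relevant
# evidence) belong to a human rater; this only encodes the points.

def score_short_cr(accurate_words: int, has_evidence: bool,
                   scorable: bool = True) -> int:
    if not scorable:
        # Blank, illegible, off-topic, or otherwise impossible to score.
        return 0
    if accurate_words >= 2 and has_evidence:
        return 2
    # One accurate word with evidence, OR two accurate words without.
    if (accurate_words == 1 and has_evidence) or accurate_words >= 2:
        return 1
    return 0

print(score_short_cr(2, True))    # 2
print(score_short_cr(2, False))   # 1
print(score_short_cr(1, True))    # 1
print(score_short_cr(0, True))    # 0
```

Writing the rule out this way is also a decent test of a rubric draft: if a case falls through to an ambiguous branch, raters will disagree on it too.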
A short constructed-response item would usually have a score point range of 0-2 or 0-3, and would be budgeted for 5-10 minutes. More time than that is usually reserved for an ECR, which could take as little as 15 minutes, or as long as an hour or more for a full essay.
An extended-constructed-response item would look like this:
Considering Sophie’s protective nature, do you think it is wise for strangers to approach her? Why or why not? Write an essay in which you discuss the wisdom of approaching a dog with whom you are personally unacquainted.
I don’t provide a writing rubric because they are complex creations, but you may see some examples here and here. The score point ranges for ECR items vary, depending on the writing traits and the number of domains scored. That is, an essay might be scored for organization, style, and conventions. If the question depends on the student’s comprehension of a passage, the essay might be scored for both reading and writing.
Bear in mind that these sample items are jokes, and as such, aren’t examples of exemplary items, primarily because they require a great deal of prior knowledge, and so the test-taker who is unfamiliar with Sophie and dogs in general will perform less well than the test-taker who is on a first-name basis with Sophie and/or other dogs. There are other, less egregious flaws, but we’ll get to those when we get to them.
If you have an item you’d like me to examine, explain, or deconstruct, feel free to post it in the comments. Check the copyright first.