Teacher Evaluations: We’ve Got to Come Up With a Better System
"Cake or death?"
"Eh, cake please."
"Very well! Give him cake!"
"Oh, thanks very much. It's very nice."
Comedian Eddie Izzard uses this bit to explain that because "the Anglican faith had a lack of principles for a long time…you can't have extreme points of view." Kind of like the Los Angeles Unified School District (LAUSD) and teacher evaluations.
After years of drive-by evaluations for many of us, we're now looking down the barrel of hard-core test scores and a computerized 63-point framework for monitoring the only folks in the entire spectrum who hold still long enough to be held accountable.
Two years ago we all jumped up and down against the emergence of Academic Growth Over Time (AGT), which is a prediction of student performance on the California Standards Test (CST).
AGT is manipulated to account for a student's socio-economic status (including homelessness), language proficiency, prior test scores, gender, race and attendance. In February, LAUSD Superintendent John Deasy told the Los Angeles Times that 30 percent of teachers' evaluations would be based on raw test scores. Not AGT, as we were told two years ago, when Deasy started to get serious about evaluation.
Compared to raw test scores, AGT looks pretty good.
In 2011, I joined a group of "pioneers," sponsored by the Partnership for Los Angeles Schools, who began field-testing the district's new evaluation system. I went through two "cycles" which included analyzing my own AGT. It turned out to be based on the test scores of 12 students. Why didn't my numbers include all 30 I taught during two double blocks of remedial language arts?
Don't you need at least 30 for a statistically valid sample? Which 12 were we talking about? Why couldn't we use my classroom assessments (essays, running records of reading ability, grammar quizzes, etc.) instead?
Here's a look at how I grade and analyze my students' essays.
The capacity of a screen shot is limited, but you can tell that vertically, one student is green (83 percent mastery), five are yellowish (70 percent range of mastery), and the rest are orange (below 60 percent mastery). Of this last group, many have empty boxes signifying missing work that is still factored into their average.
If you could see the numbers in detail, you'd be able to tell that some of these "problem" kids are actually doing all right. Wendy, for example, earned an 85 percent on her single essay. Spread out over three assignments, that works out to 28 percent. Is she failing my language arts class? Yes. Can she pass the California High School Exit Exam? Very likely. A good number of my students fit this profile.
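To see how one strong essay turns into a failing grade, here is a minimal sketch of the arithmetic, using the numbers from the Wendy example (the function name and gradebook setup are my own illustration, not the district's software):

```python
def average_with_missing(scores, num_assignments):
    """Average across all assigned work, counting missing assignments as zero.

    This is how a typical gradebook treats empty boxes: they aren't
    skipped, they drag the average down.
    """
    return sum(scores) / num_assignments

# Wendy submitted one essay and earned 85 percent; three essays were assigned.
wendy_average = average_with_missing([85], 3)
print(round(wendy_average, 1))  # 28.3 -- a failing grade, despite strong work
```

The point the sketch makes concrete: the 28 percent says nothing about whether Wendy can write an essay. It says she wrote one instead of three.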
The spreadsheet part disaggregates students' scores for two writing assignments. In December, we were in the 50 percent range of mastery across the board. In January, we're in the 70 percent range on four out of five criteria. Improvement, right? That's what I thought! But we can't use it to measure how effective I am as an English teacher. We have to use raw test scores.
Here is a prediction of how the same students are likely to score on the CST:
This "forecast" is from a software program we use about three times a month in the classroom. Twelve of the 18 students shown are predicted to score far below basic or below basic on the CST. At my school, CST scores do not count for grades, promotion or graduation, although 80 percent of my students say they would take the test seriously if it did.
Wait, what? Students don't take the one measure of my efficacy seriously? Is the announcement that we're now using raw test scores supposed to make me ask for AGT instead? When did death become cake?