Balancing Evaluation

What do we know about the efficacy of teacher evaluation systems across the United States? Here is the lead statement from the 2010 report from The New Teacher Project, The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness.

A teacher’s effectiveness—the most important factor for schools in improving student achievement—is not measured, recorded, or used to inform decision-making in any meaningful way.

The Widget Effect represents the failure of teacher evaluation systems to provide precise and reliable data regarding a teacher’s classroom performance. As a result, teacher performance within schools and across districts is assumed to be relatively the same. The Widget Effect, treating teachers like interchangeable widgets, ignores the fact that some teachers are excellent and some are not. Have we failed to produce a supervision and evaluation model that distinguishes between the two? Many policy makers and educators think we have.

The New Teacher Project studied schools in 12 districts across four states, some of which used a system that rated teachers as either satisfactory or unsatisfactory, and found that nearly 99% of the 15,000 teachers evaluated received a “satisfactory” rating. Is it possible that 14,850 of those teachers satisfactorily meet the expectations of their schools or districts and only 150 don’t? The conclusion drawn from this study is that schools and districts have failed to design a feedback system that can distinguish between exceptional, very good, satisfactory, and unsatisfactory teachers. Of course, the other possibility is that schools and districts do not set standards that encourage and support teachers in reaching benchmarks higher than satisfactory, and do not provide effective professional development to help teachers fulfill their potential.

What else do we know about teaching? A 1996 study by Sanders and Horn at the University of Tennessee found that teacher effects on student achievement are both additive and cumulative, but that having a good teacher does not necessarily offset the impact of a bad teacher. From the same research group, Sanders and Rivers (1996) reported that:

Students who have highly effective teachers three years in a row score as much as 50 percentile points higher on achievement tests than those who have ineffective teachers for three years in a row.

Effective teachers have positive effects on students’ achievement and ineffective teachers do not.  However, we lack comprehensive and reliable systems for identifying effective teachers who excel at their craft and can effectively teach a diverse group of learners.

What else do we know about teaching?  It is one of the most complex professions because it relies so heavily on interpersonal relationships, effective communication, and deep understanding of material to be taught and the students we teach.  Jon Saphier, in his book The Skillful Teacher, refers to teaching as

One of the most complex human endeavors.

As a result, if we think that we can devise a faculty evaluation system that looks at a teacher through the lenses of his or her principal and of high-stakes student test scores collected once each year, we are sorely mistaken. The evaluation system needs to honor and attend to the complexities of teaching a large and diverse group of students, each of whom has unique needs.

What else do we know about teaching? Teachers generally crave good feedback that helps them grow into better teachers. These feedback systems need to be able to differentiate between effective and ineffective teaching and show what each looks like. They need to be grounded in principles of good practice that are well articulated by the school or district and are associated with well-designed rubrics that help teachers grow. Just as we support student learning with clear learning targets that students understand, teachers need clear teaching targets that all teachers understand and can hit. So in the end we have to be able to recognize, support, and reward excellent teaching, beyond the “teacher-of-the-year” award.

In Georgia, as well as other states that accepted Race-to-the-Top funding, the Department of Education in collaboration with school districts must design and implement a robust teacher evaluation system that includes rating teachers based on how well their students perform on state-mandated, high-stakes tests. Georgia has been under pressure to develop this system and was recently placed on “high-risk” status, as reported by Michele McNeil in a February 1 Education Week article, Race to the Top Winners Make Progress, Face Challenges, Department of Ed Reports. She wrote:

Part of Georgia’s $400 million Race to the Top grant is on “high-risk status”—an official designation that can lead to losing grant funding—for weaknesses in implementing its teacher-evaluation system.

In an open letter to Governor Deal, professors from colleges and universities in Georgia outlined thoughtful reasons why Georgia’s teacher evaluation system, Teacher Keys, was an ineffective and misguided model. They wrote:

The state’s new evaluation system, Teacher Keys and Leader Keys, centers on “value-added” measures of student growth. We believe the use of value-added measures in teacher and leader evaluation will likely lead to negative educational, social, and emotional outcomes for Georgia’s children. We believe it is our ethical, moral, and professional obligation to raise awareness about how the proposed evaluation changes not only lack a sound research basis but also, in some instances, have already proven to be detrimental.

What do we know about what teachers want? In her Get Schooled column, Maureen Downey wrote a piece titled Georgia teachers: Prefer to be judged on student work rather than on student test scores or surveys. The Professional Association of Georgia Educators (PAGE) surveyed nearly 2,000 teachers about Georgia’s new teacher evaluation system. What did we learn from the survey? Here is a glimpse into some findings:

  • 76% of teachers surveyed were not in favor of using students’ high-stakes test scores for evaluation
  • 77% were in favor of using a portfolio of students’ work in the classroom
  • 59% thought that using a teacher’s portfolio of work was a good idea
  • 77% were not in favor of using student feedback
  • 49% of teachers believed the new evaluation system, Teacher Keys, is effective
  • 73% favored the older system of evaluation

Are we listening to our teachers?  Do we value their input in designing a process that involves their profession?  Some may think teachers are incapable of helping to design a system meant to evaluate their work.  I disagree.  I think teachers want to be involved and are capable of designing an effective way to evaluate their own practice.

Look at what New York State has been able to accomplish with TED, the Teacher Evaluation and Development System. In a summary article in The Learning System, published by Learning Forward, Valerie Von Frank outlined what teachers in six New York districts were able to design as their evaluation system, an effort that started before the state was awarded Race To The Top funds. Their research-based tool

was field-tested and sets out specific, measurable, observable behaviors that demonstrate effective teaching practices.

TED integrates meaningful, targeted professional learning, teacher goal setting, and career development into the model. It is growth-oriented rather than punitive. Based on a 100-point scale, TED includes pre- and post-observation conversations and multiple ways to collect information. The points are allocated as follows:

  • 60% from multiple measures of effectiveness from the first phases of the process.
  • 20% from student growth on state assessments or a comparable measure of student growth (increased to 25% if a value-added growth model is used).
  • 20% from other locally selected measures of student growth or achievement (decreased to 15% when a value-added growth model is used).
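
To make the arithmetic behind these weights concrete, here is a minimal sketch in Python. It is not TED's actual scoring procedure: the function names, the component labels, and the idea of expressing each component as a fraction of its available points are all assumptions made purely for illustration. Only the 60/20/20 and 60/25/15 splits come from the description above.

```python
# Illustrative sketch only: names and the 0.0-1.0 rating convention are
# hypothetical, not part of TED. Only the point splits come from the text.

def ted_weights(uses_value_added: bool) -> dict:
    """Return the points available for each component on the 100-point scale."""
    if uses_value_added:
        # With a value-added growth model: 60 / 25 / 15
        return {"multiple_measures": 60, "state_growth": 25, "local_measures": 15}
    # Without a value-added growth model: 60 / 20 / 20
    return {"multiple_measures": 60, "state_growth": 20, "local_measures": 20}


def composite_score(ratings: dict, uses_value_added: bool = False) -> float:
    """Combine component ratings (each a fraction from 0.0 to 1.0) into a total.

    Each rating is the share of that component's points the teacher earned
    (an assumed convention for this sketch).
    """
    weights = ted_weights(uses_value_added)
    for name in weights:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be between 0.0 and 1.0")
    return sum(weights[name] * ratings[name] for name in weights)


if __name__ == "__main__":
    # A made-up teacher: strong on multiple measures, weaker on state growth.
    example = {"multiple_measures": 0.9, "state_growth": 0.5, "local_measures": 0.8}
    print(composite_score(example, uses_value_added=False))
    print(composite_score(example, uses_value_added=True))
```

In this made-up example the total is lower when a value-added model is used, because the weaker state-growth component then carries 25 points instead of 20. In both cases, though, 60 of the 100 points still come from the multiple measures of effectiveness.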

When feedback is effective and it is aligned to what teachers believe is important in their work, it is a more meaningful learning experience for them.  Good evaluations can initiate lasting professional development.

From my vantage point, I am concerned that with TED, teachers are formally or informally observed at most two times. I am more attracted to Kim Marshall’s mini-observation model, where teachers are observed many times, at least 8-10, but for shorter periods of time. Marshall explains that this model gives the observer (the principal) a broader window into a teacher’s practice.

What do we know about measuring a teacher’s effectiveness by using student achievement data? Specifically, what about Value-Added Measurement (VAM) as a means to judge the effectiveness of a teacher’s performance? While there are advocates for this system, there are many educators who object to using VAM as a means to measure effectiveness (see the open letter to Governor Deal referenced above). In an Education Week article, Probing the Science of Value-Added Evaluation, R. Barker Bausell outlines in detail why teacher evaluations based on VAM are flawed. In summary, he gives five compelling reasons why we should not use VAM:

  1. students are not randomly assigned to teachers;
  2. extraneous, uncontrolled events affect student performance on tests and overall achievement in a class;
  3. statistical analyses reported for an evaluation would be reported in terms of all teachers aggregated;
  4. there is no plan to include reporting student transfers, dropouts, missing test data, scoring errors, improperly marked test sheets, clerical errors resulting in incorrect class lists, and so forth for each teacher;
  5. and value-added analysis constitutes a series of personal, high-stakes experiments conducted under extremely uncontrolled conditions and reported quite cavalierly.

If we are going to use student achievement data for “high-stakes” teacher evaluations, let’s use a portfolio of student achievement data accumulated throughout the year, which includes tests, instead of performance on a single end-of-year test. With such a portfolio, the issues Bausell raises in his article are likely to have less bearing on a student’s overall performance in a teacher’s class. For this to work, we would need principals and teachers to define rigorous standards (Common Core) and follow through on providing excellent, challenging, and engaging learning spaces. We would also need parental support and robust professional development for teachers to fine-tune their practice.

I have written other posts on evaluation.  Here are some links to those pieces.

Let’s take a lesson from the New York State districts that invested in the voices of their teachers and administrators and built a teacher evaluation system that was comprehensive and honored the complexity of the teacher profession.

From my perspective there are five things every teacher evaluation system should include:

  1. self-evaluation that includes goals aligned to a school’s strategic vision;
  2. student feedback in certain grades because students need a voice in their learning;
  3. peer-to-peer feedback because teachers honor the input of colleagues they trust;
  4. supervisor feedback (principal and/or department chair) because they need to understand how each of their teachers meets students’ needs;
  5. and student achievement data from a year’s worth of learning in the classroom.

I see the process as being an annual one, in which the teacher’s goals are evolving each year.  For a process to work effectively and for the teacher to receive meaningful feedback, the evaluation has to be linked to:

  • a trusting relationship with one’s supervisor;
  • high-quality professional development;
  • classroom observations by supervisors who are well trained and prepared to see a teacher frequently (8-10 times) but for shorter periods of time;
  • regular and consistent feedback from the supervisor on what they observe;
  • and a year-end report that reflects on the learning that has taken place.

If we can achieve all of this in a culture that does not become overly fixated on high-stakes achievement, we have the power to positively impact the teaching profession.