A national conversation continues about teacher evaluation systems, especially in states that submitted a Race to the Top proposal, an Obama Administration initiative designed to fuel innovation in schools. This competitive grant program required states to submit a plan for retooling their current evaluation systems to make greater use of student achievement data. If you follow this extensive body of literature, then you know that value-added models for using student achievement data, the exact percentage that student achievement data would count toward a teacher’s evaluation, and whether a teacher’s cumulative score would be made public have generated all manner of conversations, arguments, and, potentially, lawsuits.
Kate Taylor, in a recent New York Times article entitled “Cuomo Fights Rating System in Which Few Teachers Are Bad,” tells the story of the battle between Governor Cuomo and the state’s teachers’ union. Cuomo wants an evaluation system that tightly aligns a teacher’s performance rating to his or her students’ test scores. The teachers’ union doesn’t believe such a system can give a fair assessment of a teacher’s performance. She writes:
Around the state, administrators, teachers and parents have been protesting the governor’s proposals, which would both increase the weight of test scores, to 50 percent of a teacher’s rating, and decrease the role of their principals’ observations.
Cuomo, other governors, educational policy makers, and other political leaders believe that a system that closely ties students’ achievement scores to a teacher’s evaluation rating is a more effective way to commend good teachers, provide growth plans for average teachers, and weed out those that are not effective. Of course, there has been extensive research and commentary on whether value-added models for aligning student achievement data to a teacher’s performance are truly valid models given the plethora of variables that impact student achievement.
One thing is universally true about school districts that have had to retool their evaluation systems under Race to the Top: they have not been very creative in designing them. The systems all look pretty much the same, with a few minor tweaks; one district might count student achievement scores at 25% instead of 40%. In effect, they are built from previous models with slight variations.
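To make the weighting debate concrete, here is a minimal sketch of how such a composite rating works arithmetically. The function, weights, and scores below are invented for illustration; no state’s actual formula is this simple.

```python
# Hypothetical illustration: how a composite teacher rating shifts as the
# weight placed on student achievement scores changes. All numbers are
# invented for illustration only.

def composite_rating(achievement_score, observation_score, achievement_weight):
    """Blend two 0-100 scores; the remainder of the weight goes to observations."""
    observation_weight = 1.0 - achievement_weight
    return (achievement_score * achievement_weight
            + observation_score * observation_weight)

# A teacher with strong principal observations (90) but mixed test results (60):
for w in (0.25, 0.40, 0.50):
    print(f"achievement weight {w:.0%}: composite = {composite_rating(60, 90, w):.1f}")
```

The point of the sketch: for the same teacher, simply moving the achievement weight from 25% to 50% drops the composite from 82.5 to 75.0, which is why the choice of percentage is so contested.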
Another article that appeared in the New York Times, “Grading Teachers by the Test,” written by Eduardo Porter, invokes Goodhart’s Law, an economic principle about incentive design that sounds a lot like Heisenberg’s uncertainty principle in physics:
A performance metric is only useful as a performance metric as long as it isn’t used as a performance metric.
The idea, applied to education, is that if we rate teachers according to their students’ test scores, we run the risk of “fudging” the data to achieve what we want to achieve.
If we want to study organizations that are innovating their way to an evaluation system that meets the needs of their employees, then we have to go to the business world. We won’t find it in education. But I would argue that education has a lot to learn from the way some creative businesses approach giving constructive feedback to their employees.
In Harvard Business Review, Marcus Buckingham and Ashley Goodall write about the changes taking place at Deloitte Services LP in the article “Reinventing Performance Management.” They describe how Deloitte is “rethinking peer feedback, and the annual review, and trying to design a system to fuel improvement.” It strikes me that if we spoke with most educational administrators, they would say their hope is that their school’s evaluation system would fuel improvement as well. Of course, the data show that most teachers don’t believe their school’s evaluation system “fuels their improvement.” In a study by Weisberg, Sexton, Mulhern, & Keeling (2009) called The Widget Effect, the authors write:
In districts that use binary evaluation ratings (generally “satisfactory” or “unsatisfactory”), more than 99 percent of teachers receive the satisfactory rating. (page 6)
If 99% of teachers are seen as satisfactory, then great teaching might go unrecognized while poor teaching does not get addressed. Another piece of data from the study shows that:
In fact, 73 percent of teachers surveyed said their most recent evaluation did not identify any development areas, and only 45 percent of teachers who did have development areas identified said they received useful support to improve. (page 6)
So the bottom line is that most of our evaluation systems do not “fuel improvement.” Not only are there flaws in the design of how we evaluate, but there are also flaws in how we implement the model. There is, however, good data suggesting that faculty believe their principal’s feedback is important, but that depends on whether principals are well trained, understand the instruments they are expected to use, understand their role in the process, and have confidence in differentiating for individual teachers’ needs. Yet some states, like New York, are trying to deemphasize the principal’s role in the rating system. For that reason, among others, I think the Deloitte example is worth considering as a prototype for a new way of thinking about giving teachers effective feedback.
Here is a high-level comparison of their old and new system.
| | Old system | New system |
|---|---|---|
| Objectives | Cascading | Performance & strength oriented |
| 360-degree tools | Yes | No |
In moving to their new system, they used data from research, an understanding of their organization’s needs, and a commitment to fueling the growth of their employees. The science of rating systems shows that “62% of the variance in the ratings could be accounted for by individual raters’ peculiarities of perception” (page 43). What they concluded from the research is that ratings do not measure the performance of the ratee so much as they reveal the biases of the individual rater. So they moved away from ratings. They also moved from annual reviews to weekly and quarterly feedback based on team projects, because their focus was on “spending more time helping their people use their strengths” and they “wanted a quick way to collect reliable and differentiated performance data” (page 44).
What Deloitte realized from a study done by the Gallup Organization on strengths-based leadership, as well as from its own research using its high-performance design teams, is that if an evaluation system focuses on strengths, the person being evaluated invests more heavily in the process. Buckingham and Goodall write:
It found at the beginning of the study that almost all the variation between high- and lower-performing teams was explained by a very small group of items. The most powerful one proved to be “At work, I have the opportunity to do what I do best every day.” (page 44)
So if we work to align a person’s job responsibilities to their strengths, then we maximize opportunities for that person to be successful in their work. In the Deloitte study, here are the three items they found had high correlation with high-performing teams:
- Co-workers on the team were committed to doing quality work.
- The company’s mission inspired members of the team.
- Members of the team have a chance to use their strengths every day.
When they designed their new system they had three objectives to fulfill. They were:
- The new system would allow them to recognize performance, particularly through variable compensation.
- The new system had to facilitate ways in which they could CLEARLY SEE each person’s performance.
- The new system had to be able to fuel changes in performance.
I found it interesting that to achieve the second objective they redesigned the system and redefined the questions they asked team leaders about the people being evaluated. First, they made their system highly relational, encouraging and creating expectations and time for each person to be in conversation with his or her immediate supervisor or team lead. To move away from rater reliability issues, they asked the team leader to respond to a set of four statements focused more on the leader’s future relationship with the person being evaluated. The four statements were (page 46):
- Given what I know of this person’s performance, and if it were my money, I would award this person the highest possible compensation increase and bonus.
- Given what I know of this person’s performance, I would always want him or her on my team.
- This person is at risk for low performance.
- This person is ready for promotion today.
“In effect, we are asking our team leaders what they would do with each team member rather than what they think of that individual.”
I find it interesting that they pivoted 180 degrees with their questions. So in education, what if the principal were required to answer the following questions:
- Would you recommend that your child be taught by this teacher for a full year?
- Would you pick this person to serve on your leadership team for building an ideal school?
- Would you pick this person to lead a new initiative in your school that requires an innovative leader?
Finally, in order to shift some responsibility from the team leader to the team member being evaluated, they set up a system in which the person being evaluated identifies his or her strengths through a self-assessment tool and then shares those strengths with other team members, the team lead, and the organization. They have found:
that if you want people to talk about how to do their best work in the near future, they need to talk often (page 48).
So their new system facilitates frequent conversation between team member and team lead about personal and professional strengths and progress towards goals. They designed for these conversations to be simple, frequent (weekly), quick and engaging.
As they have gained experience with their new system, there has been a shift in the question that drives their work: from “What is the simplest view of you?” to “What is the richest view of you?”
So unlike the evaluation systems being designed by state departments of education, or for that matter the evaluation systems that exist in almost all public and private schools, we should be designing systems that provide the richest view of our teachers. The richest view will not come from assigning 50% of the rating score to student achievement results. A teacher is a far more complex professional than the results his or her students achieve on an imperfect standardized test, which measures only a small snapshot of what a student knows, understands, and can do.
As educators, we have to be bold, creative, and thoughtful as we attempt to co-create the systems that will be used to evaluate our work. Our voice must be at the table in designing the process if it is going to succeed and fuel our improvement. Some answers to our questions are right before our eyes in the processes used by other organizations. Let’s learn from each other.