ERIC Identifier: ED278657
Publication Date: 1986-00-00
Author: Barrett, Joan
Source: ERIC Clearinghouse on Teacher Education Washington DC.

The Evaluation of Teachers. ERIC Digest 12.

The public views teacher evaluation as a major problem in the school system today (Soar and others, 1983). State legislatures, aware of the concern, want to mandate more effective evaluation. Common methods for evaluating teachers, such as measurement tests of teacher characteristics, student achievement test scores, and ratings of teachers' classroom performance, have been ineffective. Some research has been done to improve the evaluation process, but teacher assessment, in general, remains unorganized. This digest provides information about evaluation types, criteria, methods, procedure, and successful evaluation strategies.


Darling-Hammond and others (1983) define teacher evaluation as "collecting and using information to judge." Two evaluation types exist: formative and summative. Formative evaluation is a tool used to improve instruction. Summative evaluation is a tool used to make personnel decisions. Both evaluation uses have received much attention in recent literature as the teaching profession considers evaluation an integral part of staff development and the administration looks to evaluation data as evidence in accountability debates.


The developmental problems of teacher evaluation programs begin with a fundamental question: evaluation of what? Criteria used to determine teacher quality would seem to center on the teaching/learning/assessment cycle. Yet the teaching methods and techniques of a mathematics teacher differ from those of a music or English teacher. Are there generic characteristics common to all "good" teachers?

The fundamental obstacle to professional agreement is that everyone - parent, administrator, legislator, and teacher - purports to know exactly what a good teacher is. Each eagerly describes this teacher in great, but mostly subjective, detail (Soar and others, 1983). Evaluation criteria must be measurable. The current literature generally agrees that "good" means "effective." A good teacher teaches; students, in response, learn. But there are serious disadvantages in evaluating teachers by their students' achievement; these disadvantages are discussed in the Evaluation Methods section.

Criteria for evaluation must include intangible and tangible teaching aspects (Darling-Hammond and others, 1983; Wise and others, 1984; Woolever, 1985). Intangible aspects include student rapport and social responsibility while tangible aspects comprise well-written lesson plans and test scores. The wide range of suggested criteria for evaluating teachers has resulted in numerous methods designed to quantify those criteria.


The most important characteristic for any successful evaluation method is validity - whether a test or procedure measures what it purports to measure. It becomes inappropriate, meaningless, and useless to make specific inferences from invalid measurements. Evidence of validity must be accumulated to support inferences made from evaluation results.

Successful evaluation methods also must be reliable, effective, and efficient (Wise and others, 1984). Reliability means consistency - an evaluation must always give similar scores, rankings, or ratings for similar tests, regardless of the evaluator or the evaluated. Effectiveness implies that the evaluation provides results in their most useful format. Summative evaluation yields a teacher performance score or rank that does not have to be interpreted to be used for accountability. Formative evaluation initiates the improvement of weak areas. Efficiency refers to the time and money spent on evaluation training, materials, and procedures to obtain the desired results.
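The idea of reliability as consistency across evaluators can be illustrated with a minimal sketch. It assumes two hypothetical evaluators who rate the same six teachers on a 1-to-5 scale; the ratings, the function name agreement_rates, and the one-point tolerance are illustrative assumptions, not data or procedures from the studies cited here.

```python
def agreement_rates(ratings_a, ratings_b, tolerance=1):
    """Return (exact, close) agreement between two evaluators.

    exact: share of teachers given identical ratings by both evaluators.
    close: share of teachers whose two ratings differ by at most `tolerance`.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    pairs = list(zip(ratings_a, ratings_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    close = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, close


# Hypothetical ratings of the same six teachers (1 = weak, 5 = strong).
evaluator_a = [4, 3, 5, 2, 4, 3]
evaluator_b = [4, 2, 5, 4, 4, 3]

exact, close = agreement_rates(evaluator_a, evaluator_b)
print(f"exact agreement: {exact:.2f}, within one point: {close:.2f}")
```

In this invented example the evaluators agree exactly on four of six teachers and come within one point on five of six. A low agreement rate would suggest that a rating depends on who evaluates rather than on the teaching observed.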

Present evaluation programs consist of varying combinations of the following components. (Strengths and weaknesses accompany the descriptions.)

Teacher interview. This one-to-one conference is used to hire new teachers and communicate evaluation results to experienced teachers. An updated, formalized version, the Teacher Perceiver Interview, reduces possible interviewer bias. A disadvantage of the interview is the low correlation between highly rated interviews and subsequent evaluations of teacher effectiveness (Darling-Hammond and others, 1983).

Competency Testing. The National Teachers Examination (NTE) is an example of competency testing. It is used for initial certification and hiring decisions; its disadvantage lies in its degree of validity. Most studies comparing NTE results with evaluations of teacher performance show low correlation. No test has been developed to measure a teacher's professional commitment, maturation of decision-making ability, and social responsibility - all important criteria for effective teaching and learning (Soar and others, 1983). Test proponents, however, maintain that examinations guarantee a basic knowledge level, eliminate interviewer bias, and are legally defensible (Darling-Hammond and others, 1983).

Classroom Observation. This is the most popular evaluation method, usually performed annually by school administrators for experienced teachers and more frequently for beginning teachers. Observation reveals information about such things as teacher interaction and rapport with pupils that is unavailable from other sources. Research criticizes the technique, however, as potentially biased, invalid, and unreliable (Darling-Hammond and others, 1983).

Student Ratings. Using student ratings in teacher evaluation has been restricted to higher education, although student input has been collected informally in middle and secondary schools. This method is inexpensive and has a high degree of reliability, but questions of validity and bias remain (Darling-Hammond and others, 1983).

Peer Review. Teaching colleagues observe each other's classroom and examine lesson plans, tests, and graded assignments. Peer review examines a wider scope of teaching activities than other methods. Disadvantages include time consumption and possible peer conflict. Formative application features may justify the time demands and minimize sources of tension (Barber and Klein, 1983; Elliot and Chidley, 1985).

Student Achievement. Nationally standardized student achievement examinations often are used to evaluate teachers and school systems by ranking the student, class, and school according to national norms. Research shows that under certain conditions test scores are positively correlated with teacher behavior (Woolever, 1985). But scores also depend on inherent student qualities, such as I.Q., which are independent of teacher influence (Darling-Hammond and others, 1983).

Faculty Self-Evaluation. This method usually supplements more formal evaluation methods and is used with other data to identify weak areas of instruction and classroom management skills. It serves as an important source of information for staff development, but is unsuitable for accountability decisions (Darling-Hammond and others, 1983).

Indirect Measures. Other "good teacher" descriptors have been examined to determine if they correlate with student achievement. These descriptors include enthusiasm, humor, judgment, objectivity, and punctuality (Drake, 1984). Research has found a relationship between teacher flexibility and effectiveness, and some teacher characteristics appear to be more effective in some classroom situations than in others. But these findings have not been used in teacher evaluation (Darling-Hammond and others, 1983).

Literature exists to support all evaluation methods. Coker (1985) observes that the lack of consensus about evaluation issues represents the lack of knowledge about effective teaching and measurement technology. He further suggests that this knowledge can be acquired through studying the data now generated by valid and reliable methods.
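Several of the method descriptions above rest on correlation as evidence of validity, for example the low correlation reported between competency test results and later evaluations of teacher performance. The following is a minimal sketch of how such a correlation might be computed. The scores, the ratings, and the function name pearson are hypothetical and are not drawn from any study cited here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: competency test scores for five teachers and the
# classroom performance ratings (1 to 5) the same teachers received later.
test_scores = [610, 640, 580, 700, 660]
performance_ratings = [3, 4, 4, 3, 5]

r = pearson(test_scores, performance_ratings)
print(f"correlation between test score and later rating: {r:.2f}")
```

A coefficient near zero would mean the test score says little about later rated performance, which is the kind of weak validity evidence the literature describes. A coefficient near 1 would support using the test to predict performance.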


If school districts refine procedures to improve validity and reliability, then effective evaluation should occur (Wise and others, 1984). Successful evaluation procedures begin with a definition of teaching expectations and end with an examination of evaluation results and implications. Formative evaluations include a staff development component to complete the program of assessment and improvement.

The major impediment to procedural development is that schools follow lines of least resistance in developing any new procedure (Darling-Hammond, 1983). Schools often consider the perfect evaluation system to be one that gathers all necessary data quickly, offends no one, and differs little from an unacceptable system used the previous year. Despite this resistance to change, some progress toward developing and implementing innovative and workable evaluation programs continues.


Wise and others (1984) studied 32 school districts and found four - Salt Lake City, Utah; Lake Washington, Washington; Greenwich, Connecticut; and Toledo, Ohio - to have markedly more successful evaluation programs than the others. These researchers concluded that the following strategies can help in implementing an effective evaluation program.

1. Evaluation procedures must address local needs, standards, and norms.

2. Procedures must be consistent with the stated purposes for evaluation.

3. School districts must make a commitment of time and resources.

4. Resources must be used efficiently to achieve reliability, validity, and cost-effectiveness.

5. Teachers should be involved in developing evaluation procedures.

Drake (1984) stresses that an effective evaluation program needs trained evaluators, administrative staff allocated for evaluation time, a staff development program for teachers, and teacher involvement in the evaluation process. Elliott and Chidley's 1985 study of an experimental peer review program found the project's success depended on teacher participation in program design, administrator interest, teacher release-time for planning, clearly stated objectives, and participants sharing information. But despite the considerable published research on teacher evaluation, the scarcity of successful programs indicates much work remains to be done.


