ERIC Identifier: ED314428
Publication Date: 1989-12-00
Author: Echternacht, Gary
Source: ERIC Clearinghouse on Tests, Measurement, and Evaluation, Washington, DC; American Institutes for Research, Washington, DC.

Interpreting Test Scores for Compensatory Education Students. ERIC Digest.

To follow the rules and regulations of compensatory education programs correctly, you must use objective measures when you select students for programs, assess their progress, and monitor program quality. Because these requirements push you toward standardized test scores, you should make sure that you use the tests correctly.

In this digest, I point to four practices that administrators often mistakenly follow when they use test scores:

o using test scores alone to select students for programs,

o giving out-of-level tests,

o misinterpreting the term grade-level, and

o failing to differentiate the degree of error in individual and group scores.

Although these practices may not be widespread, they are serious.


Program regulations for Chapter 1 require that you select students by using objective measures. In addition, state departments of education sometimes impose other requirements: for example, that a program serve only students who score below the 40th percentile rank, or that it serve all students who score below the 20th percentile rank.

These requirements often lead administrators to select students on the basis of test scores alone because

o the requirements are stated in terms of test scores, and

o when program monitors review programs, they appraise them in terms of state and federal regulations.

Nevertheless, you should not make a decision about an individual student by using a test score by itself. It is acceptable to use test scores as one step in a sequence of assessments, but it is unacceptable to let a single test score be the only assessment. You are unfair to students if you simply say that all students who score below the 40th percentile rank are in the program and all who score above the 40th percentile rank are ineligible.

You must remember that test scores are neither completely reliable nor completely valid indicators of academic performance. For example, if students take equivalent forms of a test at different times, their scores will change somewhat. This unreliability matters most for students whose scores are near the cut-off score for selection: if you administer the test a second time, some students who previously scored below the cut-off may score above it.

Similarly, reading tests give you only general measures of reading ability. Some students may be good readers in certain content areas, yet they may score poorly on a given test because the reading passages in that test do not include the content areas they know.

Good programs select students by using several assessment tools, rather than just one. Although the regulations do not explicitly state other requirements, they do allow you to use additional assessment tools in selecting students. Ask your state director how you can best use other assessment tools, such as report card grades, results of other tests, and systematic teacher assessments obtained through questionnaires.

Some common methods for using multiple assessments are:

o selecting students who score below prescribed cut-offs on both your district's standardized test and another state-mandated test;

o using your district's standardized test to identify a pool of possible participants, then using either a teacher-completed questionnaire or report card grades to select students from the pool;

o using a systematic method for obtaining teachers' judgments about students' needs in order to identify a pool of possible participants, then using a standardized test to select students from the pool; or

o using the standardized test to identify a pool of students, then creating a study team to select students from the pool and carefully documenting the study team's process.
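The second method in the list above (a test-score pool followed by a teacher questionnaire) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student records, the 1-to-5 teacher rating scale, and the rating threshold are all hypothetical, and the 40th percentile cut-off is borrowed from the example earlier in this digest rather than from any regulation.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    percentile: int      # percentile rank on the district's standardized test
    teacher_rating: int  # hypothetical questionnaire score: 1 (low need) to 5 (high need)


def select_students(students, pool_cutoff=40, min_rating=4):
    """Two-step selection: the test score forms the pool,
    and the teacher rating selects students from that pool."""
    # Step 1: the standardized test identifies a pool of possible participants.
    pool = [s for s in students if s.percentile < pool_cutoff]
    # Step 2: a second, independent assessment makes the final selection.
    return [s.name for s in pool if s.teacher_rating >= min_rating]


# Hypothetical roster, invented for illustration only.
roster = [
    Student("A", 12, 5),
    Student("B", 38, 2),
    Student("C", 35, 4),
    Student("D", 55, 5),
]

selected = select_students(roster)
print(selected)  # ['A', 'C']
```

Student B falls below the test cut-off but is not selected because the teacher assessment does not indicate need, which is the kind of case a single test score would miss. Student D shows the limit of this particular method: a student above the cut-off never enters the pool, however high the teacher's rating.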


Out-of-level testing occurs when you give a standardized test to students who are at a different grade level than the one for which the test is designed. In some cases, school officials use out-of-level tests in compensatory programs because those students are behind their peers and in-level testing is frustrating for them. Administrators who follow this practice believe that somehow it is more valid to give those students tests designed for lower grade levels.

While out-of-level tests may be less frustrating to some students, the scores obtained from them are also less valid because

o the content for out-of-level tests does not represent the content taught in the classroom,

o the scale that test publishers use to link different test levels is loaded with error,

o there are no norms for out-of-level tests,

o scores obtained on tests of different difficulty are not comparable, and

o when obtained, out-of-level scores appear to be too low.

Although in-level test scores are more reliable in the middle than at the high- and low-score ranges, they are quite reliable in placing students at the high or low end of the scale. For example, with a reasonable degree of assurance, we can say that a student who scores at the 10th percentile rank is most likely a low-achieving student. What we are less sure about is whether the student is at the 10th percentile rank or the 15th percentile rank. Either way, we are reasonable in concluding that the student is low achieving.

You should use tests at the grade levels for which they are specified by the test publisher. Generally, the content of grade-level tests will represent what is taught in regular classrooms at the specified level.

If your compensatory program is good, it will be closely coordinated with instruction in the regular classroom. Since the purpose of compensatory education is to help students succeed in the regular classroom, using in-level tests will help you in the coordination.


Generally, when school personnel say that certain students perform at grade-level, they mean that those students can learn material at about the same rate and quality as others in the same class. The implication is that students who don't perform at grade-level have significantly more difficulty in class than their peers. Accordingly, when students are labeled as working below grade-level, the implication is that they may not have the aptitude, maturity, or interest to do the work that others in the same class are doing. This is the interpretation of students' abilities that most people make.

In contrast, in the testing arena, the phrase at grade-level has a different meaning. Students who score at grade-level have scores at the 50th percentile rank, which means that about half of their peers score higher and about half score lower. In testing, at grade-level says nothing about how well students perform in the classroom. Therefore, when you review students' scores, you must remember that, by definition, about half of all students score below grade-level.

Historically, the term grade-level has been important in the politics of compensatory education. Proponents of compensatory education programs have always said that those programs were underfunded because many students who performed below grade-level did not receive program services. In this case, performing below grade-level was defined as scoring below the 50th percentile rank. While it is true that compensatory education may be underfunded and, I believe, is an important part of schooling, it is inappropriate to support that argument with the term grade-level in its testing-related sense, because by that definition about half of all students always score below grade-level.

Since most people use the term grade-level in the general sense, you can either avoid using grade-equivalent test scores or develop a range of scores that indicate satisfactory achievement in the classroom. You may also think of average performance on a test as being between the 23rd and the 77th percentile rank.
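As a rough illustration of the range suggested above, the following Python sketch labels a percentile rank as below average, average, or above average. The function name and the labels are my own, chosen for illustration; only the 23rd and 77th percentile boundaries come from the text, and your district may prefer a different range.

```python
def describe_percentile(percentile_rank, low=23, high=77):
    """Label a percentile rank using an average band of low..high (inclusive)."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    if percentile_rank < low:
        return "below average"
    if percentile_rank > high:
        return "above average"
    return "average"


# A student at the 40th percentile rank scores below grade-level in the
# testing sense (below the 50th), yet falls inside the average band.
print(describe_percentile(40))  # average
print(describe_percentile(15))  # below average
```

The point of the sketch is the first example: a score can be below the 50th percentile rank and still represent ordinary, satisfactory performance.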


Administrators tend to misinterpret differences in test scores in one of two ways. First, they may think that a difference of one or two percentile rank points is an important difference. Second, they may think that a difference of ten points shows that the test is unreliable. Few administrators can differentiate the degree of error in individual and group scores.

An individual test score is just that -- the score that an individual student receives on a test. A group score is the average of several individual scores. For example, the average score of third graders at Horace Mann Elementary School is a group score.

In general, individual scores have more error in them than group scores do. The error in an individual score is largely a function of the test's standard error, which is described in the publisher's technical manual. For most of the tests given in elementary and secondary schools, the standard error is about 2.5 raw score points. This means that about 95% of the time, we would expect the scores for individual students to fall within a range of 10 raw score points, that is, within about two standard errors (5 points) on either side of the observed score. That is not particularly reassuring, but it is exactly why we need to use multiple measures for selecting students. It is also why, for most of the tests we use, we should be a little skeptical of individual test scores and cautious in interpreting differences between them.
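The arithmetic behind the 10-point range can be written out as a short Python sketch. It assumes the standard error of 2.5 raw score points mentioned above and uses two standard errors on each side as an approximation of a 95% band; the function names are mine, and the overlap check is only a rough screen for caution, not a formal statistical test.

```python
def score_band(observed_score, standard_error=2.5, multiplier=2.0):
    """Return the (low, high) band of roughly 95% around an observed raw score."""
    half_width = multiplier * standard_error
    return observed_score - half_width, observed_score + half_width


def bands_overlap(score_a, score_b, standard_error=2.5):
    """Rough screen: do the bands around two individual scores overlap?"""
    low_a, high_a = score_band(score_a, standard_error)
    low_b, high_b = score_band(score_b, standard_error)
    return low_a <= high_b and low_b <= high_a


low, high = score_band(30)
print(low, high, high - low)  # 25.0 35.0 10.0

# Two students 4 raw score points apart: the bands overlap, so the
# difference should be interpreted cautiously.
print(bands_overlap(30, 34))  # True
```

When you have the publisher's technical manual at hand, replace 2.5 with the standard error reported for your test and level.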

The error in group scores depends largely on the size of the group: the larger the group, the smaller the error in its average. Once you have a group of about 30 scores, the magnitude of the error is noticeably smaller than it is for an individual score. By the time you average all the scores for your school district, you can regard the results as accurate as long as there is not some systematic bias operating for most everyone in the district.

You can be confident of your interpretation when you consider score averages of large groups. For instance, if the score average for a group of 55 scores changes one or two percentile rank points, then that is an important change. If you consider averages based on fewer cases, you must be more cautious. You can be more or less confident of average scores depending on the level at which they are computed. There is a definite hierarchy in the strength of your interpretations. Your interpretations are most sure when you consider district averages, followed in order by building averages, classroom averages, and finally individual students' scores.
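The hierarchy described above can be illustrated with a simplified calculation. The Python sketch below assumes that each student's measurement error is independent of the others, in which case the measurement error of an average shrinks with the square root of the group size. The group sizes are hypothetical, the 2.5-point standard error is the figure used earlier in this digest, and the calculation ignores systematic bias and other sources of variation in group averages.

```python
import math


def error_of_average(standard_error, group_size):
    """Measurement error of a group average, assuming independent errors."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return standard_error / math.sqrt(group_size)


# Hypothetical group sizes, chosen only to show the pattern.
group_sizes = {
    "individual student": 1,
    "classroom": 25,
    "building": 400,
    "district": 10000,
}

for label, n in group_sizes.items():
    error = error_of_average(2.5, n)
    print(f"{label:20s} n={n:5d}  error of average = {error:.3f}")
```

Under these assumptions the error falls from 2.5 raw score points for one student to 0.5 for a classroom of 25, and it continues to shrink at the building and district levels, which matches the ordering given in the text.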
