ERIC Identifier: ED315429 Publication Date: 1989-03-00
Author: Gardner, Eric Source: ERIC Clearinghouse on Tests,
Measurement, and Evaluation, Washington, DC; American Institutes for Research
Five Common Misuses of Tests. ERIC Digest No. 108.
(Reprinted from "Ability Testing: Uses, Consequences, and Controversies,"
1982, with permission from the National Academy Press, Washington, DC.)
1. ACCEPTANCE OF A TEST TITLE FOR WHAT THE TEST MEASURES
There is a tendency for unsophisticated test users to accept the name
assigned to a test as an accurate and complete description of the variable being
measured. Since titles must be brief, they cannot convey all that the user needs
to know about the kind of behavior to be measured. All tests are open to this
kind of uncritical abuse. Since there are so many facets of cognitive ability,
it is obvious that no test can be an adequate measure of them all. Only full
knowledge of the items can reveal what is being measured. Furthermore, the
testing situation may completely change the expected behavior.
If a non-English speaking or blind pupil is given an "aptitude" test in
printed English, it obviously doesn't measure any aspect of "aptitude" or
"intelligence" except lack of knowledge of English or lack of vision. In a less
obvious area, a test labeled "Science Achievement" may be an acceptable test to
sample the science curriculum for students in a particular fifth grade science
course but fail to function as a science test at all for most pupils if the
reading difficulty is at the high school level. A test producer's claims for an
achievement test or an aptitude test do not mean that it will function as such
in all circumstances with all pupils. Failure to examine the manual and the
items carefully in order to know the specific aspects of cognitive ability to be
tested (memory, vocabulary, type of reasoning, etc.) can result in misuse by
virtue of selecting an inappropriate test for a particular purpose or situation.
2. IGNORING THE ERROR OF MEASUREMENT IN TEST SCORES
Every test score contains an error of measurement. It is a misuse of any test
score or any observation to accept it as a fixed, unchanging index containing no
error. It is impossible to say with certainty that an individual's observed
score gives his "true" performance on the general domain about which inferences
are to be made. The best that can be done is to estimate experimentally the
standard error of measurement and then use that value to set up a band within
which a probability can be stated about the "true" score's being within that
band. The fact that (1) we cannot accept an SAT score of 550 as a precise
measure, (2) we must accept a range of scores, and (3) we must then expect to be
wrong a certain proportion of the time does not mean that the SAT does not
furnish useful data.
It does mean that the test score is being misused if knowledge of the size of
the errors of measurement is not used in interpreting the score.
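The banding idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any particular testing program: it uses the conventional classical-test-theory estimate SEM = SD * sqrt(1 - reliability), which the text itself does not spell out, and the standard deviation (100) and reliability (0.91) are hypothetical values chosen only to make the arithmetic easy to follow. The function names are likewise invented for this example.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sem, z=1.0):
    """Return the (low, high) band of +/- z SEMs around an observed score."""
    return (observed - z * sem, observed + z * sem)


# Hypothetical values, for illustration only (not published SAT statistics).
sem = standard_error_of_measurement(sd=100.0, reliability=0.91)

# Assuming roughly normal errors, +/- 1 SEM covers about 68 percent of cases
# and +/- 1.96 SEM covers about 95 percent.
low68, high68 = score_band(550, sem, z=1.0)
low95, high95 = score_band(550, sem, z=1.96)

print(f"SEM: {sem:.1f}")
print(f"About 68% band: {low68:.0f} to {high68:.0f}")
print(f"About 95% band: {low95:.0f} to {high95:.0f}")
```

Under these assumed figures, a reported score of 550 is better read as "probably somewhere between about 520 and 580" than as a fixed point, and even the wider band will be wrong for a small share of examinees.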
In the case of most standardized test scores, the magnitude of the errors is
made explicit, not hidden or unknown. In fact, essay grades and other types of
evaluative data have far larger but usually unknown errors of measurement.
Some people reject the notion of basing decisions on probabilistic data.
However, probability estimates are involved in almost all decisions. For
example, the decision to cross a busy street at a particular instant is not made
with a probability of 1.0 of doing so safely.
3. USE OF A SINGLE TEST SCORE FOR DECISION MAKING
Misuse of tests occurs when scores are not considered and interpreted in the
full context of the various elements that characterize pupils, teachers, and the
general educational environment involved. A test score represents only a
sample from a limited domain and does not include the variety of factors that
might influence that score. For example, in decisions determining admission to
college, SAT scores should not be used in isolation and are in fact usually
considered along with the pupil's high school record and other relevant data,
such as teacher's or supervisor's recommendations concerning motivation,
leadership ability, creativity, involvement in extracurricular activities, etc.
All of these can then be evaluated against the student's socioeconomic
background, along with consideration of any social obstacles or unusual physical
demands required of the student to reach his current educational level.
4. LACK OF UNDERSTANDING OF TEST SCORE REPORTING
There is substantial misunderstanding, not just among laymen, but also among
many educators, of the meaning of test scores. Most people believe that they
understand the meaning of a raw score or of that particular raw score converted
to a percent of items answered correctly, as in the case of many
criterion-referenced tests. However, even in this most elementary illustration,
more is involved than a single number indicates. Forty-five items answered
correctly out of fifty easy items has a substantially different meaning than
forty-five items answered correctly out of a sample of fifty very difficult
items from the same domain.
The interpretation of a raw score converted to a percentile score causes even
more problems. The statement that "In a norm-referenced test half the pupils
must fail" reflects this confusion. A percentile rank of 20, for example, is
neither a good nor a poor performance. It merely indicates that among the
group used as a frame of reference this score was higher than that reached by 20
percent of its members. If the group were of high ability or had unusual skills,
a percentile rank of 20 might indicate an excellent or even remarkable
performance.
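A short Python sketch may make the dependence on the reference group concrete. The function name and both score lists below are hypothetical, invented purely for illustration; the percentile rank is computed as the percent of the reference group scoring strictly below the given score, which matches the description above.

```python
def percentile_rank(score, reference_scores):
    """Percent of the reference group scoring strictly below `score`."""
    if not reference_scores:
        raise ValueError("reference group must not be empty")
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)


# Two hypothetical norm groups taking the same 50-item test.
general_group = [20, 25, 28, 30, 32, 35, 38, 40, 42, 45]
high_ability_group = [36, 38, 40, 41, 43, 44, 45, 46, 47, 48]

raw_score = 40  # the same raw score, judged against each group

print(percentile_rank(raw_score, general_group))
print(percentile_rank(raw_score, high_ability_group))
```

With these made-up data the same raw score of 40 has a percentile rank of 70 in the first group and 20 in the second. The pupil's performance has not changed; only the frame of reference has.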
The misinterpretation of grade equivalents is even more common. A grade
equivalent is the score that was exceeded by 50 percent of the group at the
specific time when the test was given. It does not represent a standard to be
attained. It does not represent the grade in which the pupil should be placed.
To compensate for the decreasing emphasis on test construction and test
interpretation in teacher training institutions, the National Council on
Measurement in Education (NCME) and other organizations have made efforts to
provide workshops and reading material on measurement issues. (NCME is a
national organization of professionals concerned with testing and measurement
issues. It publishes The Journal of Educational Measurement and Measurement in
Education.)
Both parents and professional educators stand to benefit, since both are
involved in the misuse of testing based on misinterpretation of scores.
5. ATTRIBUTING CAUSE OF BEHAVIOR MEASURED TO TEST
It is common, especially for critics of testing, to confuse the information
provided by a test score with interpretations of what caused the behavior
described by the score. A test score is a numerical description of a sample of
performance at a given point in time. A test score gives no information as to
why the individual performed as reported.
Claiming that it does, whether intended as a positive attribute or a
criticism, is tantamount to test misuse. Furthermore, no statistical
manipulation of test data, even though combined with the best additional data,
will permit more than probabilistic inferences about causation or future
performance.
The current reports on the decline of SAT scores are an excellent example of
the difficulty in ascribing causation to known performance. The charge given the
researchers by the investigating panel was to explain the causes of the drop in
SAT scores. They were able to describe the drop and offer changes in test
populations as a plausible partial explanation for the initial drop but could
only speculate on the effect of other variables and the reasons for the
(COMPILED BY ERIC/TM)
Echternacht, Gary (1981). The Uses and Misuses of
Test Scores: Technical Assistance Perspective. Paper presented at the Annual
Meeting of the American Educational Research Association, Los Angeles, CA, April
13-17, 1981 (ED 199 275).
Green, Donald Ross (1985). Misinterpreting and Misusing Tests: Some New Ways.
ERIC Document Reproduction Service, ED 291 805.
Kearney, C. Philip (Fall, 1983). Uses and Abuses of Assessment and Evaluation
Data by Policymakers. Educational Measurement: Issues and Practice, 2(3), 9-12.
Woodring, Paul (Dec 16, 1987). Irresponsible News Stories on SAT Scores Misuse
the Facts and Lead to Confusion. Chronicle of Higher Education, 34(16), B1.