Using Standardized Test Data To Guide Instruction
and Intervention. ERIC Digest.
by Mertler, Craig A.
When teachers review test score reports, they may find the sheer volume
of information presented overwhelming, and they may also be unsure how
to interpret and use results in the classroom. While the idea of data-driven
decision-making is not new, it takes skill to focus on a few key
pieces of information from a test and use them to make instructional
changes. This Digest addresses two ways that classroom teachers can use
the results of standardized tests: (1) to revise instruction for entire
classes or courses and (2) to develop specific intervention strategies
for individual students.
USING TEST SCORES TO REVISE GROUP INSTRUCTION
Test publishing companies typically provide classroom-level reports
to enable teachers to see how a group of students performs across the curriculum.
Even if a group of students has moved on by the time score reports are
available, teachers should examine class-level results as a source of information
for revising curriculum and instruction for the next class. Content areas
or subtests in which high percentages of children are performing below
average indicate areas of deficiency.
Once teachers have noted and prioritized deficiencies, they may consider
one or more of the following questions:
* Where is this content addressed in our district's curriculum?
* At what point in the school year are these concepts/skills taught?
* How are the students taught these concepts/skills?
* How are students required to demonstrate that they have mastered the
concepts/skills? In other words, how are they assessed in the classroom?
Answers to these questions should point the way to new methods of instruction,
reinforcement, or assessment (Mertler, 2001, 2003). They may also reveal
evidence that the curriculum and the tests are not in alignment.
REVISING GROUP INSTRUCTION: AN EXAMPLE
While reports from state tests and tests from commercial publishers
vary in format, most feature certain common elements. Riverside, which
publishes the Iowa Tests of Basic Skills, provides an illustrative sample
class performance profile at the following Web address: http://www.riverpub.com/products/group/itbs_a/scoring.html#grpperm
This report, like many others, offers both norm-referenced test results,
which allow performance comparisons with other groups of students taking
the test, and criterion-referenced information, which provides data such
as how many questions students attempted and how many correct answers they
gave for each category of question. Language skills might, for example,
involve subtests of spelling, capitalization, punctuation, and usage; mathematics
might break down to concepts, problem solving, data interpretation, and
computation. Reports sometimes also show the number of questions devoted
to each area within a subtest (e.g., in the area of mathematics concepts,
how many questions deal with number properties, with algebra, with geometry,
with measurement, and with estimation).
Typical scores reported might also include the following:
Standard (or "Scale") score (SS): A score that has been transformed
mathematically and put on a scale to allow comparisons with different forms
and levels of a test.
Grade equivalent (GE) score: A norm-referenced score that indicates
the grade and month of the school year for which a score is average. The
average score for a fifth grader being tested in the seventh month of the
school year would be 5.7. If a child has a GE score well above his or her
grade in school--a fifth grader with a GE of 9.1 on a reading subtest,
for example--it doesn't mean that the child can do ninth-grade work, but
rather, that he or she scored the same as an average entering ninth grader
would if the ninth grader took the fifth-grade test.
National percentile rank (NPR): The percentage of students in the norm
group that performed at or below a particular performance level. It's important
to note the group to which students are being compared. Some test publishers
provide separate norms for, say, large urban school districts across the
country, or Catholic schools, while also providing norms based on a representative
sample of test-takers across the country, and/or other groups taking the
test in the state.
Normal curve equivalent (NCE): A normalized standardized score with
a mean of 50 and a standard deviation of 21.06, resulting in a near-equal
interval scale from 0 to 99. The NCE was developed by RMC Research Corporation
in 1976 to measure the effectiveness of the Title I Program across the
United States and is often used to measure gains over time.
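Given that definition (a normal distribution rescaled to a mean of 50 and a standard deviation of 21.06), an NCE can be derived from a national percentile rank with the inverse normal CDF. A minimal sketch in Python, using only the standard library:

```python
from statistics import NormalDist

def percentile_to_nce(pr: float) -> float:
    """Convert a national percentile rank (1-99) to a normal
    curve equivalent: NCE = 50 + 21.06 * z, where z is the
    standard-normal deviate for that percentile."""
    z = NormalDist().inv_cdf(pr / 100)
    return 50 + 21.06 * z

# The 50th percentile maps to an NCE of 50; because the scale has
# near-equal intervals, the 1st and 99th percentiles map to
# roughly 1 and 99.
print(round(percentile_to_nce(50)))  # 50
```

Unlike percentile ranks, NCEs can be meaningfully averaged, which is why they are often used to measure gains over time.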
National stanine (NS): Stanine scores range from 1 to 9, with a score
of 5 representing an average range. The percentage of scores at each stanine
level in a normalized standard score scale is 4, 7, 12, 17, 20, 17, 12,
7, and 4, respectively. Percentile rank scores provide similar, though
more precise, information. For example, a percentile rank near the middle
of the distribution (e.g., 45 to 55) will be roughly equivalent to a stanine
score of 5.
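The stanine percentages above imply cumulative cut points on the percentile scale (4, 11, 23, 40, 60, 77, 89, 96), so a percentile rank can be mapped to a stanine with a simple lookup; a sketch:

```python
def percentile_to_stanine(pr: float) -> int:
    """Map a percentile rank to a stanine using the cumulative
    percentages implied by the 4-7-12-17-20-17-12-7-4 split."""
    # Cumulative upper bounds for stanines 1 through 8; anything
    # above the 96th percentile falls in stanine 9.
    bounds = [4, 11, 23, 40, 60, 77, 89, 96]
    for stanine, upper in enumerate(bounds, start=1):
        if pr <= upper:
            return stanine
    return 9

# Percentile ranks near the middle (45 to 55) land in stanine 5,
# matching the example in the text.
print(percentile_to_stanine(50))  # 5
```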
A class test report might also present a graphic illustrating confidence
bands, which represent the margins of error for individual subtests. Studying
them gives a teacher a quick overview of a class's performance, because
non-overlapping bands indicate that scores are truly, or significantly,
different from each other. For example, the students in a class might perform
significantly lower on "Vocabulary" than on "Reading Comprehension."
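The non-overlap rule for confidence bands reduces to a one-line check. The band values below are hypothetical, chosen only to mirror the vocabulary/reading-comprehension example:

```python
def bands_differ(band_a, band_b):
    """Return True when two confidence bands (low, high) do not
    overlap, i.e. the two scores are plausibly truly different."""
    return band_a[1] < band_b[0] or band_b[1] < band_a[0]

# Hypothetical national-percentile confidence bands for two subtests:
vocabulary = (28, 42)
reading_comprehension = (55, 69)
print(bands_differ(vocabulary, reading_comprehension))  # True
```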
It is helpful to identify the subtest(s) upon which a particular class
achieves at a national percentile rank of below 50 in order to make these
content areas targets for possible instructional change. Alternatively,
teachers might look at the skill areas in which high percentages of students
scored in the bottom 25 percent or low percentages of students scored in
the top 25 percent. Again, teachers would want to rank-order, or otherwise
prioritize, these areas for possible revision of instruction.
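As a sketch of this screening step (the subtest names and percentile ranks below are hypothetical), the code keeps subtests whose class-average NPR falls below 50 and ranks the weakest first:

```python
# Hypothetical class-average national percentile ranks by subtest.
class_npr = {
    "Vocabulary": 38,
    "Reading Comprehension": 61,
    "Spelling": 47,
    "Math Computation": 29,
    "Punctuation": 52,
}

# Keep subtests below the 50th percentile and sort the weakest
# first as candidate targets for instructional change.
targets = sorted(
    (s for s in class_npr if class_npr[s] < 50),
    key=class_npr.get,
)
print(targets)  # ['Math Computation', 'Vocabulary', 'Spelling']
```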
USING TEST SCORES TO DESIGN INDIVIDUALIZED INTERVENTION
Standardized test data can also be used effectively to guide the development
of individualized intervention strategies. First,
however, it is important to remember that general achievement tests are
intended to survey basic skills across a broad domain of content (Chase,
1999). On almost any standardized achievement test, a given subtest may
consist of as few as five or six items. The fewer the number of items on
a subtest, the less reliable the scores will be (Airasian, 2000, 2001).
Careless errors or lucky guesses by students may substantially alter the
score on that subtest, especially if scores are reported as percentages
of items answered correctly or as percentile ranks. Therefore, it is important
to examine not only the raw scores and percentile ranks but also the total
number of items possible on a given subtest before making any intervention
decisions (Mertler, 2003).
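The arithmetic behind this caution is easy to illustrate: on a six-item subtest, a single careless error or lucky guess moves the percent-correct score by almost 17 points, while on a forty-item test it moves it by only 2.5:

```python
def percent_correct(correct: int, total: int) -> float:
    """Score a subtest as the percentage of items answered correctly."""
    return 100 * correct / total

# Effect of missing one item on a short versus a long subtest.
short_drop = percent_correct(6, 6) - percent_correct(5, 6)
long_drop = percent_correct(40, 40) - percent_correct(39, 40)
print(round(short_drop, 1), round(long_drop, 1))  # 16.7 2.5
```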
Nearly all publishers of standardized achievement tests provide both
criterion- and norm-referenced results on individual student reports. Many
results are reported in terms of average performance (i.e., below average,
average, above average). It is again important to remember that "average"
simply means that half of the norm group scored above and half scored below
that particular score (Gallagher, 1998). Teachers should take great care
to avoid the overinterpretation of test scores (Airasian, 2000, 2001).
The process for examining test results in order to help guide the development
of intervention strategies for individual students is essentially the same
as for the whole class. First, the teacher identifies any content areas
or subtests in which the student performed below average. Second, the teacher
establishes priorities among these areas, selecting a workable number of
content areas, perhaps one or two, to serve as the focus of an intervention.
Third, the teacher identifies new or different resource materials, methods
of instruction, reinforcement, and/or assessment in order to meet the needs
of the individual student. The success of this intervention may be monitored
both through classroom assessments and on future test scores. Given the
length of time it takes for test scores to become available, it may be
that a teacher at the next grade level will have to follow through on the
intervention.
DESIGNING INTERVENTION STRATEGIES: AN EXAMPLE
Individual score reports, like classroom performance reports, tend to
contain both norm- and criterion-referenced information. The results may
include scaled scores, grade-equivalent scores, national stanines, normal
curve equivalent scores, national percentiles, and national percentile
bands as defined above. The national percentile ranks and associated confidence
bands allow the teacher to see how well a particular student performed
in relation to the national norm group on the various subtests. An individual
skill report might also include a breakdown of how many items the test
included in each skill area, how many the student attempted, how many he
or she got correct, and how this performance compared to the national norm
group.
By way of illustration, readers may wish to view a score report for
a fictitious student, Mary Sanders, who took the Iowa Tests of Basic Skills.
This score report (http://www.riverpub.com/products/group/itbs_a/scoring.html#indperm)
shows percentile rank scores ranging from a low of 20 ("Capitalization")
to a high of 71 ("Reading Comprehension" and "Listening"), with the student's
performance substantially below the norm group in several areas, including
"Capitalization," "Word Analysis," "Vocabulary," "Math Concepts and Estimation,"
and "Math Computation." These are potential areas the classroom teacher
would want to target for possible interventions. Further examination of
the "Math Concepts and Estimation" portion in the criterion-referenced
section pinpoints Mary's difficulties to the items that dealt with "Algebra."
Other areas in which Mary scored in the "Low" category include "Punctuation:
Commas," "Math: Problem Solving: Approaches and Procedures," and "Social
Studies: Economics." Again, this information would likely prove most essential
in designing an intervention plan for Mary. Teachers can perform similar
analyses on individual score reports from their own state tests or other
commercial tests. The important point to remember is to use the data to
identify specific areas of difficulty in order to plan a well-targeted
intervention.
CONCLUSION
Teachers can learn to use empirical test data to assist in instructional
decision-making for their classes or for individual students. To avoid
being overwhelmed by data, especially since much of the information provided
on test reports overlaps, teachers might wish to begin their inquiry
by focusing on such scores as national percentile ranks and their associated
confidence bands. Interpreting standardized test data for use in making
instructional decisions does take some practice. Limiting the data to be
interpreted and understanding what those scores really mean makes the process
more efficient and allows teachers to make valuable use of their students'
standardized test data to bring about increased achievement.
REFERENCES
Airasian, P. W. (2000). Assessment in the classroom: A concise approach
(2nd ed.). Boston: McGraw-Hill.
Airasian, P. W. (2001). Classroom assessment: Concepts and applications
(4th ed.). Boston: McGraw-Hill.
Chase, C. I. (1999). Contemporary assessment for educators. New York:
Gallagher, J. D. (1998). Classroom assessment for teachers. Upper Saddle
River, NJ: Merrill.
Iowa Tests of Basic Skills Individual Performance Profile. (2002). Retrieved
July 9, 2002, from http://www.riverpub.com/products/group/itbs_a/scoring.html#indperm
Mertler, C. A. (2001). Interpreting proficiency test data: Guiding instruction
and intervention. Unpublished manuscript (inservice training materials),
Bowling Green State University.
Mertler, C. A. (2003). Classroom assessment: A practical guide for educators.
Los Angeles, CA: Pyrczak.