ERIC Identifier: ED447200
Publication Date: 20001100
Author: Rudner, Lawrence M.
Source: ERIC Clearinghouse on Assessment and Evaluation College Park MD.

Bayes' Theorem: An Old Tool Applicable to Today's Classroom Measurement Needs. ERIC/AE Digest.

Much of today's assessment research and development concentrates on norm-referenced tests which, by definition, are designed to rank-order students by placing them on broad continua representing unidimensional traits. While the summative information from norm-referenced assessment serves many purposes, there is a rising call for criterion-referenced information concerning what students know and can do relative to clearly defined desired outcomes of instruction. Although criterion-referenced interpretations of norm-referenced tests are commonplace, the literature on criterion-referenced tests from the 1970s and 1980s can provide some insights to guide today's research and practice. As Hambleton and Sireci (1997) point out, the differences between the performance tests of today and the criterion-referenced tests of the 1970s are not fundamental. Both are focused on assessment of what students know and can do.

This Digest introduces ways of responding to today's rising call for criterion-referenced information using Bayes' Theorem, a method that was coupled with criterion-referenced testing in the early 1970s (see Hambleton and Novick, 1973). After introducing Bayes' Theorem, I discuss how it can be applied to diagnostic testing, adaptive testing, and the scoring of performance items. The key advantages of using this model are that relatively small datasets are required and that the necessary computations are surprisingly simple.

BAYES' THEOREM

Rather than placing a student on an ability
scale, the goal here is to identify the most likely classification for the
examinee. This classification can be dichotomous (e.g., master/nonmaster) or
polychotomous (e.g., master/at-risk/nonmaster) or involve placement on a
categorical or interval scale. To illustrate Bayes' Theorem, I will provide a
simple example where the goal is to classify an examinee as being either a
master or a nonmaster. We will use responses to previously piloted items to
determine the probabilities of mastery P(M) and nonmastery P(N) and then
classify the examinee based on those probabilities. Lacking any other
information about the examinees, we will assume equal prior probabilities, i.e.,
P(M)=.50 and P(N)=.50. After each item is scored, we will update P(M) and P(N)
based on the response to the item.
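The updating step just outlined can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation. The function and parameter names are hypothetical. The values .90 and .40 are the probabilities of a correct response for masters and nonmasters that this section uses for item 1. The second item in the test of the loop is invented.

```python
def update_mastery(p_master, pc_master, pc_nonmaster, correct):
    """Return the posterior probability of mastery after one scored item.

    p_master     : current (prior) probability that the examinee is a master
    pc_master    : probability of a correct response given mastery
    pc_nonmaster : probability of a correct response given nonmastery
    correct      : True if the examinee answered the item correctly
    """
    p_nonmaster = 1.0 - p_master
    if correct:
        like_m, like_n = pc_master, pc_nonmaster
    else:
        like_m, like_n = 1.0 - pc_master, 1.0 - pc_nonmaster
    joint_m = like_m * p_master      # response probability times prior, masters
    joint_n = like_n * p_nonmaster   # response probability times prior, nonmasters
    # Dividing by the sum rescales the two values so that they add to 1.
    return joint_m / (joint_m + joint_n)


def score_items(items, responses, prior=0.50):
    """Apply the update item by item; each posterior becomes the next prior."""
    p = prior
    for (pc_master, pc_nonmaster), correct in zip(items, responses):
        p = update_mastery(p, pc_master, pc_nonmaster, correct)
    return p


# Item 1 from this section, answered correctly, starting from equal priors.
print(round(update_mastery(0.50, 0.90, 0.40, correct=True), 3))  # prints 0.692
```

The sketch assumes the two response probabilities are never both zero for the observed response; otherwise the division is undefined.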
As givens, we will start with a collection of items for which we have determined the following four probabilities:

1. Probability of a correct response given that the examinee has mastered the material.
2. Probability of an incorrect response given that the examinee has mastered the material.
3. Probability of a correct response given that the examinee has not mastered the material.
4. Probability of an incorrect response given that the examinee has not mastered the material.

We will denote these as P(C|M), P(I|M), P(C|N), and P(I|N), respectively; note that we have different conditional probabilities for each item. These conditional probabilities can be determined from very small-scale, low-cost pilot testing; one approach is to use the proportions of examinees in each group responding correctly or incorrectly. Suppose that for item 1, 90% of the masters and 40% of the nonmasters responded correctly. Since a person either responds correctly or incorrectly, P(C|M)=.90, P(I|M)=.10, P(C|N)=.40, and P(I|N)=.60.

The task then is to update P(M) and P(N) based on the item responses. The process for computing these updated probabilities is referred to as Bayesian updating, belief updating (probabilities being a statement of belief), or evaluating the Bayesian network. The updated values for P(M) and P(N) are referred to as the posterior probabilities. The algorithm for updating comes directly from a theorem published posthumously by Rev. Thomas Bayes in 1763:

P(M|C) * P(C) = P(C|M) * P(M)

Let us suppose our examinee
responds correctly to item 1. By Bayes' Theorem, the new probability that the examinee is a master given a correct response, P(M|C), is proportional to the joint probability

P(C|M) * P(M) = .90 * .5 = .45

Similarly, P(N|C) is proportional to P(C|N) * P(N) = .40 * .5 = .20. Dividing by the sum of these joint probabilities, which is the overall probability of a correct response, P(C) = .45 + .20 = .65, gives the posterior probabilities: P'(M) = .45 / .65 = .692 and P'(N) = .20 / .65 = .308. We next use these posterior probabilities as the new prior probabilities, score the next item, and again update our estimates for P(M) and P(N) by computing new posterior probabilities. We iterate the process until all the items have been scored. Equivalently, we could have computed the product of the relevant probabilities (correct or incorrect) for masters and nonmasters and then divided by the sum to obtain the last posterior probability.

The Bayesian network defined here is a simple diverging graph. The master/nonmaster state is causally connected to the set of item responses. When applied to decision support systems and other expert systems, Bayesian networks are typically much more complex, involving hundreds of interconnected and cross-connected variables. Evaluating such networks is computationally complex. As we have shown here, however, the computations for basic applications are quite simple.

CLASSROOM APPLICATIONS

The basic framework described above
is applicable to a wide range of settings. For example, the framework can be
used to score a diagnostic pretest. Here the pretest would cover a variety of
skills. A pilot test would determine the probabilities of responding correctly
for people who have mastered each skill and the probabilities for those who have
not done so. After the test is given to an individual, the probabilities of
mastery for each skill could be computed. The resultant list would identify
which skills have been mastered and which are likely in need of attention. One
could go further and model specific misconceptions (e.g., the examinee sums
denominators when adding fractions). Here the relevant probability would be the
likelihood of selecting a particular incorrect option (or generating a
particular type of wrong answer), given that an examinee has a specific
misconception. Such a test would not only provide mastery information but
also identify specific areas to correct.
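A minimal sketch of such diagnostic scoring follows. It assumes that each item measures exactly one skill and that every skill starts from a .50 prior. It uses the product form of the update, which gives the same result as updating item by item. The skill names, the pilot-test probabilities in the item table, and the function name are all hypothetical and invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical pilot-test results, one row per item:
# (skill measured, P(correct | master), P(correct | nonmaster))
ITEMS = [
    ("fractions", 0.90, 0.40),
    ("fractions", 0.85, 0.30),
    ("decimals", 0.80, 0.35),
    ("decimals", 0.90, 0.50),
]


def skill_report(items, responses, prior=0.50):
    """Return {skill: posterior probability of mastery} for one examinee."""
    # Running products of prior times response probabilities, per skill.
    joint_master = defaultdict(lambda: prior)
    joint_nonmaster = defaultdict(lambda: 1.0 - prior)
    for (skill, pc_master, pc_nonmaster), correct in zip(items, responses):
        joint_master[skill] *= pc_master if correct else 1.0 - pc_master
        joint_nonmaster[skill] *= pc_nonmaster if correct else 1.0 - pc_nonmaster
    # Normalize each skill separately so mastery and nonmastery sum to 1.
    return {
        skill: joint_master[skill] / (joint_master[skill] + joint_nonmaster[skill])
        for skill in joint_master
    }


# An examinee who answers both fractions items correctly and both
# decimals items incorrectly.
report = skill_report(ITEMS, [True, True, False, False])
for skill, p in report.items():
    print(skill, round(p, 2))
```

With these invented values the probability of mastery comes out near .86 for fractions and near .06 for decimals, so the resulting list would flag decimals as the skill in need of attention.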
The framework is applicable to multidimensional items and tests. One could write items, for example, that require the application of mathematical skill to solve a science problem. A pilot test would need to be administered to compute the probability of responding correctly to each item given mastery of the mathematics skills and the probability of responding correctly to each item given mastery of the science skills. The one test with complex items could then be scored using Bayes' Theorem and information about each skill area.

Bayesian networks have been used as the basis for computer adaptive tests. Welch and Frick (1993) provide an excellent and simple overview of the topic. Basically, the new posterior probabilities are computed after each item is administered. One stops administering items when the probability of mastery is sufficiently high or low. Items are selected from the pool of remaining items to maximize information or minimize a loss function.

The framework can be embedded in an intelligent tutoring system (ITS) to determine mastery after each instructional unit, tailor individualized instruction to characteristics of the student, and adapt that instruction as the student learns material. This would again require a collection of pretested items that assess the concepts covered by each instructional unit.

REFERENCES AND RESOURCES

One can easily experiment with
simple Bayesian networks using any of a large variety of free, readily available
software packages. A search on the Internet in September 2000 for 'Bayesian
Network Software Packages' yielded more than 20 free packages that could
potentially be applied. Two that I have tried are Hugin Lite and Genie.
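For a first experiment that needs no package at all, the master/nonmaster network is small enough to evaluate directly. The sketch below applies the adaptive stopping rule described under CLASSROOM APPLICATIONS: update after each item and stop once the probability of mastery is sufficiently high or low. The thresholds .95 and .05, the item pool, and the function name are assumptions chosen for illustration. Items are simply taken in the order given, rather than selected to maximize information or minimize a loss function.

```python
def adaptive_classify(items, answer, prior=0.50, upper=0.95, lower=0.05):
    """Administer items in order until the mastery probability crosses a threshold.

    items  : list of (P(correct | master), P(correct | nonmaster)) pairs
    answer : callable taking the item index, returning True for a correct response
    Returns (decision, posterior probability of mastery, items administered).
    """
    p = prior
    used = 0
    for index, (pc_master, pc_nonmaster) in enumerate(items):
        correct = answer(index)
        like_m = pc_master if correct else 1.0 - pc_master
        like_n = pc_nonmaster if correct else 1.0 - pc_nonmaster
        # Bayesian update: the posterior becomes the prior for the next item.
        p = like_m * p / (like_m * p + like_n * (1.0 - p))
        used += 1
        if p >= upper:
            return "master", p, used
        if p <= lower:
            return "nonmaster", p, used
    # Pool exhausted without reaching either threshold.
    return "undecided", p, used


# Hypothetical pool of ten identical items; the examinee answers every one correctly.
pool = [(0.90, 0.40)] * 10
decision, p, used = adaptive_classify(pool, lambda index: True)
print(decision, used)  # prints: master 4
```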
Bayes, T. (1763). Essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, 370-418.
Charniak, E. (1991). Bayesian networks without tears. AI Magazine, Winter 1991.
Frick, T. W. (1992). Computerized adaptive mastery tests as expert systems. Journal of Educational Computing Research, 8(2), 187-213.
Hambleton, R. K., & Novick, M. R. (1973). Toward an integration of theory and method for criterion-referenced tests. Journal of Educational Measurement, 10(3), 159-170.
Hambleton, R. K., & Sireci, S. G. (1997). Future directions for norm-referenced and criterion-referenced achievement testing. International Journal of Educational Research, 27(5), 379-393.
Spray, J. A., & Reckase, M. D. (1996). Comparison of SPRT and sequential Bayes procedures for classifying examinees into two categories using a computerized test. Journal of Educational and Behavioral Statistics, 21(4), 405-414.
Welch, R. E., & Frick, T. (1993). Computerized adaptive testing in instructional settings. Educational Technology Research and Development, 41(3), 47-62.
