ERIC Identifier: ED470203
Publication Date: 2002-07-00
Author: Kester, Ellen Stubbe - Pena, Elizabeth D.
Source: ERIC Clearinghouse on Assessment and Evaluation College Park MD.

Limitations of Current Language Testing Practices for Bilinguals. ERIC Digest.

Few diagnostic tools are designed explicitly for children who are exposed to two languages (Valdes & Figueroa, 1994). Current practices for assessment of language in bilinguals frequently involve the use of tests that are translated from English to the target language and/or tests designed for and normed on monolinguals. This Digest explains why these common approaches are not well suited for a bilingual population and provides guidance to test developers and administrators regarding more suitable approaches.


When tests are translated from one language to another, they do not retain their psychometric properties. Of particular interest in the assessment of language is the developmental order in which target features of the language are learned. Translating a test from one language to another -- typically from English -- may mean that items are organized by order of English difficulty, rather than reflecting the developmental order of the target language. The translated Spanish version of the Preschool Language Scale-3 provides an illustration. Restrepo and Silverman (2001) found several item difficulty discrepancies between the original English version and the translated Spanish version when the latter was given to predominantly Spanish-speaking preschoolers. For example, items related to prepositions, which were relatively easy for English speakers, were more difficult for Spanish speakers. On the other hand, the "function" items requiring students to point out objects based on a description of their use (something like "Show me what people use for cooking" or "What do you sweep with?") were easier for the Spanish speakers than for the English speakers.

Figueroa (1989) noted that words may generally represent the same concept but have variations and different levels of difficulty across languages. An illustration of this is found in a study of vocabulary test translations (Tamayo, 1987). When test items were translated from English to Spanish, they differed in frequency of occurrence in each language. Because the Spanish translations were of lower frequency within Spanish, test scores obtained from Spanish speakers were lower compared to scores obtained from the original English version. However, when the vocabulary items were matched for their frequency of occurrence in the original and target language and matched for meaning, test scores obtained from Spanish and English speakers were equivalent.

Similarly, across different languages, the same general category may have different prototypical members, and different words may be associated with each language for the same situation. These contextual variations make translated vocabulary tests particularly vulnerable to imbalance. When Pena, Bedore, and Zlatic-Giunta (in press) asked bilingual four- to six-year-olds to give examples of animals, the children's three most frequent English responses were "elephant," "lion," and "dog," while in Spanish they were "caballo" (horse), "elefante" (elephant), and "tigre" (tiger), in that order.

In addition to vocabulary differences, grammatical structure also affects the validity of test translation practices. For example, nouns are marked by gender in Spanish, but not English. An English test translated to Spanish will miss aspects of Spanish, such as gender marking, that are not present in the English language. Furthermore, in Spanish, subject information is frequently carried in the verb, resulting in more complex verbs and less salient pronouns as compared to English. In English language assessment, pronoun omission is a hallmark of language impairment, yet this would not be true for Spanish. Thus, translated language tests may target inappropriate features for the target language, resulting in inaccurate assessment of language ability.


Bilingual school children generally fall into the category of circumstantial bilinguals. That is, their circumstances (often a Spanish-speaking home and an English-speaking or bilingual school) require them to use two languages. These different environments typically require different language content. The home environment likely promotes discussions of common family activities, such as cooking or trips to the store, while more academic topics, such as colors, numbers, and shapes, are highlighted in the school environment. Bilingual children thus develop different vocabulary content for each language. From a testing perspective, this can result in underestimation of concept knowledge.

For example, Sattler and Altes (1984) examined typically developing three- to six-year-old bilingual Latino children's scores on the Peabody Picture Vocabulary Test-Revised and the McCarthy Perceptual Performance Scale. They found that the PPVT-R, whether administered in English or Spanish, yielded scores far below those of the norms, while all of the children were estimated to have normal intelligence based on their McCarthy scores.

A number of studies in the area of vocabulary acquisition illustrate that in early development, bilinguals learn unique words across their two languages, rather than learning two words (one in each language) for each concept. Pearson, Fernandez, and Oller (1992) found that young bilinguals (8 to 30 months) often produced words for different concepts in each language, with few concepts labeled in both languages. Similarly, Pena, Bedore, and Zlatic-Giunta (in press) found that in a category generation task, bilingual children (ages 4 to 6 years) produced more unique words across Spanish and English than overlapping words.

When monolinguals and bilinguals are compared on measures of vocabulary, differences become more apparent. Pearson, Fernandez, and Oller (1993) used the Spanish and English versions of the MacArthur Communicative Development Inventory (1989) to estimate bilingual toddlers' vocabularies. When the toddlers' scores in either language were compared to monolingual norms for that language, the scores were low. However, when the researchers counted the total number of unique words each child produced across the two languages, the scores were more comparable to the monolingual norms.

Another example of differential performance between monolinguals and bilinguals comes from the Test de Vocabulario en Imagenes Peabody: Adaptacion Hispanoamericana (TVIP-H). This version of the Peabody Picture Vocabulary Test (PPVT) was normed on monolingual Spanish speakers outside of the U.S. mainland and then tested with bilingual Hispanics on the U.S. mainland. Bilinguals' scores were lower than those of the monolinguals (Dunn, 1988). The differences between monolinguals and bilinguals increased with age and coincided with schooling in English. Similarly, Umbel, Pearson, Fernandez, and Oller (1992) used the PPVT-Revised and the complementary Spanish version, the TVIP-H, to compare the receptive vocabularies of bilingual children ages 6 through 8 who were exposed to both Spanish and English in the home. On average, children responded correctly to 67% of the items in their age range in both languages, but another 8% to 12% were known only in one of their two languages. Administration of this test in only one language--even the "dominant" language--would have led to an underestimation of vocabulary knowledge.

Conceptual scoring (Pearson, Fernandez, & Oller, 1993) has been proposed as a more meaningful measure of the bilingual's conceptual knowledge. The system, which entails counting the concepts demonstrated (either through constructed or selected responses) in both languages and correcting for concepts shared in the two languages, results in a more valid representation of a bilingual child's knowledge of concepts. The English/Spanish Bilingual Verbal Ability Tests (BVAT) (Cummins, Munoz-Sandoval, Alvarado, & Ruef, 1998) is based on this method.
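The counting logic behind conceptual scoring can be illustrated with a minimal sketch. It assumes that each response has already been mapped to a language-neutral concept label; the function name, the labels, and the sample data are hypothetical and chosen only for illustration. It is not the scoring procedure of the BVAT or of any published instrument.

```python
def conceptual_score(concepts_lang_a, concepts_lang_b):
    """Count distinct concepts a child demonstrated in either language.

    Each argument is a collection of language-neutral concept labels
    (an assumption of this sketch; real tests define their own item keys).
    """
    a = set(concepts_lang_a)
    b = set(concepts_lang_b)
    shared = a & b  # concepts demonstrated in both languages
    return {
        "a_only": len(a - b),
        "b_only": len(b - a),
        "shared": len(shared),
        # Sum of both single-language counts, corrected so that
        # shared concepts are counted once rather than twice.
        "conceptual": len(a) + len(b) - len(shared),
    }


# Hypothetical child: home-related concepts shown in Spanish,
# school-related concepts shown in English, one concept shown in both.
spanish = {"dog", "horse", "stove", "broom"}
english = {"dog", "circle", "seven"}
scores = conceptual_score(spanish, english)
print(scores)
```

In this invented example, scoring either language alone would credit the child with four or three concepts, while the conceptual score credits all six distinct concepts.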


The psychometric properties of a test, including item difficulty, item discrimination, reliability, and validity, do not automatically translate from one language to another, nor do they remain the same when a test is administered to a different audience than intended. Language tests can be improved if test developers:

* Ensure that concepts and linguistic features are appropriately represented for each language.
* Use conceptual scoring systems to reduce underestimation of ability.
* Select an appropriate mix of item types to gain the maximal amount of information about language ability in each language (e.g., an English grammar test may contain more emphasis on pronouns, while a Spanish grammar test might include more items related to gender and number agreement).
* Consider the frequency of occurrence of the words in each language.

An important long-term goal is to better understand the development of language skills in bilinguals in order to develop language tests designed for, and normed on, bilinguals.


