Theories in language testing

Hossein Farhady

Fundamental Concepts in Language Testing (5)

Theories*

Hossein Farhady
University for Teacher Education
Iran University of Science and Technology

Introduction

In the previous sections, some fundamental concepts in language testing, including the function, form, and characteristics of language tests, were discussed. In this section, another fundamental concept, namely the theories of language testing, will be briefly explained, and various test types conforming to different theories will be presented.

As mentioned before, there has always been a close relationship between teaching and testing theories. Modifications in the former have changed the nature of the latter because testing methods tend to follow teaching methods. Of course, a clear-cut distinction between methods of teaching and testing does not exist, since each method has placed certain priorities on the relative importance of a language component. Nor does a chronological ordering of methods, either for testing or teaching, seem to be applicable, because there have been long periods of overlap and competition between different methods. Thus, the typology of theories and methods of testing presented here should be considered only tentative. The main purpose of classifying the theories, however, is to clarify the degree of emphasis given to certain skills by various theories.

In accordance with different teaching theories, various testing theories such as translation, discrete-point, integrative, and functional have been developed. Each theory would require a separate paper to be treated in full. Because of time and space limitations, however, only a brief explanation of each theory is given here.

Translation Method

Prior to the theoretical developments in linguistics and related fields, a well-established theory for language testing did not exist. Testing language ability was accomplished through subjective measures such as translation tasks and essay-type questions. Such testing procedures, which stemmed from the old grammar-translation method of teaching, paid little or no attention to the statistical characteristics of the tests. In this method, students were given a passage or a set of sentences to translate from the source language into the target language or from the target language into the source language. In other cases, essay-type questions were given to the students and their responses were scored on the basis of the subjective evaluations of one or two raters. The accuracy and fairness of such evaluations were at best questionable. These inadequacies forced language testing specialists to devise objective measures and to develop new approaches.

Discrete-Point Approach

Developments in linguistics led to the emergence of structural linguistics, and advancements in psychology resulted in behavioristic psychology. The Audio-Lingual approach to language teaching evolved out of the interaction of the principles of structural linguistics and behavioristic psychology. Under the influence of the Audio-Lingual approach, testing procedures were fundamentally modified. The theory of language testing developed by audiolingualists assumed that language was a system of habits concerning matters of form, meaning, and distribution at several levels of structure, namely, those of sentence, clause, phrase, word, morpheme, and phoneme (Lado, 1961). Accordingly, measuring the linguistic properties of these habits at various levels was assumed to reveal the language ability of the learners. This kind of testing, which became popular all over the world, was referred to as the discrete-point approach.

The basic tenet of the discrete-point approach rested on the assumption that every single language element, i.e., grammar, vocabulary, pronunciation, etc., should be tested separately. In fact, a discrete-point item measures one and only one element of language at a time. The proponents of this approach viewed language as a system composed of an infinite number of items. They believed that testing a representative sample of these hypothetical items would provide an accurate estimate of the examinees’ language ability.

Along with the influences from linguistics, which led to the development of discrete-point items, the influence of psychology on language testing resulted in the application of psychometric principles to language tests. The contributions of linguistics and psychology helped test developers to construct precise and objective language tests with sound statistical properties.
Discrete-point tests, usually in the multiple-choice format, are still one of the competing types of language tests. The following sample items demonstrate the applicability of discrete-point tests to measuring various language abilities.

1. Sample spelling item:
   The student hears: BOOK. There is a BOOK on the table.
   The student writes: BOOK

2. Sample vocabulary item:
   Integrity means:
   a. intelligence
   b. uprightness
   c. intrigue
   d. weakness

3. Sample structure item:
   Zahra …… in Tehran since 1350.
   a. is living
   b. lives
   c. has lived
   d. lived

Discrete-point tests have proven to be highly reliable and reasonably valid measures of the language elements they are intended to assess. However, because of new advancements and modifications in language teaching methods, language testing procedures had to be modified as well.

Integrative Approach

New trends in linguistics and psychology questioned the foundations of behavioristic psychology and structural linguistics. Chomsky (1965) presented an innovative approach to the description of language referred to as the generative-transformational theory. Psychologists moved towards a new school, namely, cognitive psychology. The principles of generative-transformational linguistics, along with those of cognitive psychology, gave birth to a new approach in language teaching called the cognitive-code learning theory. Following the cognitive-code learning theory, a new approach in language testing, called the integrative theory, was formed.

Advocates of cognitive-code learning and integrative testing believe that language is a holistic phenomenon; thus, it should not be broken into discrete items. They contend that knowledge of the discrete items does not necessarily guarantee the ability to use language in real-life situations. In other words, they claim that the sum of the parts does not necessarily equal the whole. That is, the sum of the knowledge of structure, vocabulary, and other language elements does not necessarily mean that the learner will be able to use language as an integrative tool for communication.

Well-known integrative tests include oral interviews, reading comprehension tests, compositions, listening comprehension tests, dictation type tests, and cloze procedures. Among these tests, oral interviews are quite time-consuming and costly; reading and listening comprehension tests are well established through multiple-choice items; compositions are less reliable because they are scored subjectively; cloze and dictation type tests, however, are fairly new and quite unknown in most educational circles. Therefore, a detailed explanation of these two tests seems warranted.

Cloze Tests

Cloze tests have probably been the most popular kind of tests in the last two decades.
Although the idea originated in the early fifties, the cloze tests were not utilized as testing instruments until the late sixties and early seventies. Ever since the employment of cloze procedures as measurement devices of language ability, an enormous amount of research has been conducted on almost all aspects of these procedures. The word “cloze” seems to be a spelling corruption of the word “close” as in “Close the door”. The term, coined for the first time by Taylor in 1953, is used to remind the reader of the process of “closure” in Gestalt psychology. In the cloze procedure, the closures are created by deleting certain words from a passage. The examinee, then, is required to fill in the blanks with appropriate words on the basis of contextual clues provided in the passage. It should be mentioned that the cloze procedure was originally developed to determine the readability level of the texts written for native speakers. Later, it served as a device for assessing the reading comprehension ability of native speakers of English. Finally, it was utilized as an integrative measure to evaluate non-native speakers’ command of the language they attempt to learn. Consider the following example:

Hossein is a freshman and he (1) …… having all the problems that most (2) …… have. As a matter of fact, his (3) …… started before he left home. (4) …… had to do a lot of (5) …… that he did not like to do.

In this passage, there are five missing words. The testee is supposed to read the passage and guess the missing words. According to the theory of expectancy grammar, the more proficient the reader is, the better he will be at deciding on the missing words. In other words, if the reader has a high command of the language, he will easily reconstruct the passage and fill in the blanks with the intended words. In the above example, the missing words are “is,” “freshmen,” “problems,” “He,” and “things,” respectively. This example will help in explaining the technical characteristics of cloze tests presented below.

The first step is to define the cloze test. Although various definitions have been suggested, most scholars agree that the cloze test is any passage of appropriate length and reasonable difficulty with every “nth” word deleted. The definition, of course, seems ambiguous. What constitutes appropriate length? What is meant by reasonable difficulty? What is the purpose of “n”, and what should its value be? In the following sections, these questions will be answered in order to clarify the issue.

The second step is to determine what the appropriate level of difficulty is. Great care must be exercised in selecting a passage for developing a cloze test. If the passage is beyond the linguistic ability of the test takers, they will not understand it, and thus will not be able to determine the missing words. An easy passage, on the other hand, will result in perfectly correct responses for the missing words, and thus will not provide any useful information about the differences among the examinees’ proficiency levels in English. Therefore, the passage should have an appropriate level of difficulty.
To determine the appropriateness of the passage difficulty, certain readability scales have been developed. One frequently used readability scale was constructed by Fry (1977). According to this scale, passages are rated as suitable for native speakers from grade 1 up to grade 12. Research findings indicate that non-native speakers’ command of the language ranges from grades 4 to 6 at the pre-university stage, from grades 7 to 9 at the university undergraduate stage, and from grades 10 to 12 at the graduate level. Thus, to find a passage of appropriate difficulty, one has to apply this scale and find out whether the passage is suitable for the group of examinees or not. It should be mentioned that the proposed readability scale is only one of the many available scales. Although it is frequently used for research purposes, it should not be considered as “the scale” but just one of the available ones.

The third step is to determine what the “n” is. This letter simply refers to the deletion interval: every nth word of the passage is deleted. Of course, the greater the number of words between two deletions, the easier it is to guess the missing words, because more contextual clues are available to the examinee. Therefore, to determine the appropriate number of words between two deletions, researchers developed cloze tests with every 3rd, 5th, 7th, 9th, and 11th word deleted. The results of the experiments revealed that the passage with every 7th word deleted was the most reliable and valid. Consequently, “n” was set to “7” as one of the principles of what was later called the “standard cloze test”.
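The grade bands quoted above for the Fry scale can be written down as a small lookup. The following Python sketch is only an illustration: the stage labels and the function name are hypothetical, and the Fry grade of a passage is assumed to have been obtained separately, since computing it is not shown here.

```python
# Grade bands on the Fry scale as reported in the text.
# The stage labels are hypothetical names chosen for this sketch.
FRY_GRADE_BANDS = {
    "pre-university": (4, 6),
    "undergraduate": (7, 9),
    "graduate": (10, 12),
}


def passage_suits_group(passage_grade: int, stage: str) -> bool:
    """Return True if a passage's Fry grade falls inside the band for a stage."""
    low, high = FRY_GRADE_BANDS[stage]
    return low <= passage_grade <= high


# A passage rated at grade 8 fits undergraduates but not graduate students.
print(passage_suits_group(8, "undergraduate"))
print(passage_suits_group(8, "graduate"))
```

As the text stresses, this is one scale among several, so the bands are better treated as a rough guide than as a fixed rule.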

Another line of research investigated the effect of leaving the first and the last sentences of the passage intact. The researchers developed cloze passages in which no word was deleted from the first and the last sentences. Compared with cloze tests in which deletion started in the first sentence and continued through the last, these experimental cloze tests showed higher reliability and validity. Thus, another criterion for the standard cloze test was set: the first and the last sentences of the passage are left intact.

The fourth step is to determine the number of deletions in a cloze test. Again, researchers developed cloze passages with 100, 90, 80, 70, 60, 50, 40, 30, and 20 deletions. The results indicated that a cloze passage with 25 to 30 blanks was the most efficient in terms of reliability and validity. Beyond 30 blanks, the gain in reliability and validity was not statistically significant, so increasing the number of deletions was not justified. Consequently, the third criterion for the standard cloze test was set: the number of deletions should be between 25 and 30.

Once the number of deletions is determined, a reasonable length for the passage is easy to decide on. Assuming that there are 30 blanks in the cloze test with every 7th word deleted, the passage will be about 210 words long. Allowing 20 to 40 words for the first and the last sentences, which should be left intact, the reasonable length of the passage for a cloze test would be around 250 words. Thus, a standard cloze is a passage of appropriate difficulty in which every 7th word is deleted and the first and the last sentences are left intact.

The development of the test necessitates scoring procedures after the test is administered.
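Before turning to scoring, the construction criteria described above (every 7th word deleted, the first and last sentences left intact, no more than 30 deletions) can be sketched in Python. This is a minimal illustration, not a validated test generator: the function names, the blank format, and the naive sentence splitting are all assumptions made for the example.

```python
import re

# Punctuation that stays in the passage when the word before it is deleted.
TRAILING_PUNCTUATION = ".,;:!?)\"'"


def make_standard_cloze(passage, n=7, max_deletions=30):
    """Return (cloze_text, answer_key) for a passage of at least three sentences.

    Every n-th word of the middle sentences becomes a numbered blank;
    the first and last sentences are left intact.
    """
    # Naive split on ., ! or ? followed by whitespace (an assumption:
    # abbreviations such as "e.g." would need a proper sentence tokenizer).
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    if len(sentences) < 3:
        raise ValueError("need a first, a last, and at least one middle sentence")
    first, last = sentences[0], sentences[-1]
    words = " ".join(sentences[1:-1]).split()

    answer_key = []
    for i in range(n - 1, len(words), n):
        if len(answer_key) == max_deletions:
            break
        core = words[i].rstrip(TRAILING_PUNCTUATION)
        if not core:
            continue  # token was punctuation only; nothing to delete
        tail = words[i][len(core):]
        answer_key.append(core)
        words[i] = f"({len(answer_key)})____{tail}"
    return " ".join([first, *words, last]), answer_key


def estimated_length(deletions=30, n=7, intact_words=40):
    """Approximate passage length: deletions * n plus the intact sentences."""
    return deletions * n + intact_words
```

With the defaults, `estimated_length()` gives 30 × 7 + 40 = 250 words, matching the figure in the text; with only 20 intact words the estimate drops to 230.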
In the cloze test, too, a simple and objective scoring technique had to be developed. Therefore, to facilitate the scoring procedure, scholars considered each blank as an item and developed various scoring methods, among which the “exact word method” and the “acceptable word method” are the most common. In the “exact word method”, an item is given a point if and only if the originally deleted word is provided by the examinee. Although this method makes the examinee’s task quite difficult, it is often employed in non-native environments. In the “acceptable word method”, on the other hand, a supplied word will be considered correct, and thus given credit, if it is acceptable in the context of the passage. That is, if the supplied word makes the context meaningful, it will be considered the correct response.

Although research results indicate that there is no significant difference between the two methods of scoring, the “acceptable word method” has proven to be more suitable for the examinees. The major difficulty in this method, however, concerns the identification of acceptable words for a given blank. The most practical way to determine the acceptable words is to pretest the cloze test with a sufficient number of native or native-like speakers. However, native speakers may not be readily available in a non-native speaking environment. The “exact word method”, though difficult for the test takers, does not require pretesting with native speakers. Therefore, in EFL situations, the “exact word method” is recommended.
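The two scoring methods can be compared with a short Python sketch. It assumes one point per blank, as described above; the function name, the case-insensitive comparison, and the lists of acceptable words are hypothetical. In practice the acceptable words would come from pretesting with native or native-like speakers.

```python
def score_cloze(responses, answer_key, acceptable=None):
    """Score one point per blank.

    With acceptable=None this is the exact word method. Otherwise
    acceptable[i] is a set of extra words credited for blank i
    (the acceptable word method).
    """
    score = 0
    for i, (given, original) in enumerate(zip(responses, answer_key)):
        allowed = {original.lower()}
        if acceptable is not None:
            allowed |= {word.lower() for word in acceptable[i]}
        # Ignoring case and surrounding spaces is an assumption of this sketch.
        if given.strip().lower() in allowed:
            score += 1
    return score


# Answer key from the sample passage in the text.
answer_key = ["is", "freshmen", "problems", "He", "things"]
# Hypothetical examinee responses and hypothetical acceptable alternatives.
responses = ["is", "students", "car", "he", "work"]
acceptable = [set(), {"students"}, {"troubles", "difficulties"}, set(), {"work", "chores"}]

exact_score = score_cloze(responses, answer_key)
acceptable_score = score_cloze(responses, answer_key, acceptable)
print(exact_score, acceptable_score)
```

For these made-up responses the exact word method credits two blanks and the acceptable word method credits four, which illustrates why the second method is easier on examinees while depending on a pretested list.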

The cloze procedure explained here is referred to as the “standard cloze” in open-ended form. A different version of the standard cloze is in the multiple-choice format. In this type, four choices are provided for each blank and the examinee is required to choose the most appropriate word from among the given alternatives. Like other recognition tests, the multiple-choice cloze assesses the examinees’ passive knowledge of the language, whereas the open-ended cloze measures examinees’ productive linguistic abilities. It should be pointed out that multiple-choice cloze tests are easier to take than open-ended ones because production tasks require a higher level of competency than recognition activities do.

Using either form, language testers are recommended to employ standard forms of the cloze as testing instruments. Other varieties of cloze procedure are considered useful activities for instructional purposes. They should not, however, be used as testing devices. Some varieties of cloze procedure for classroom activities, referred to as “alternative cloze”, follow:

1. A sentence or a set of sentences, each with a word deleted.
2. A short paragraph with the words of a certain grammatical class, such as articles, prepositions, or verb forms, deleted.
3. A passage with certain deletions to be filled in with words from a list given to the student.

In addition to these sample varieties, any cloze procedure that does not follow the principles of the standard cloze would be considered alternative cloze. Alternative cloze tests, as mentioned before, do not constitute reliable or valid measurement devices. Therefore, they should be used in informal situations but not as testing instruments. It should also be kept in mind that passages from different academic areas such as humanities, engineering, medicine, etc., can easily be developed into cloze tests for students in the respective majors.
In addition, the number of deletions, the number of words between two deletions, the kinds of words to be deleted, and the difficulty level of the passage give teachers and educators considerable flexibility in employing cloze procedures. Therefore, cloze tests can serve as a versatile device for both instructional and evaluative purposes.

Dictation

The other integrative test, dictation, is one of the oldest instruments for measuring language ability. Unfortunately, however, it did not receive any serious attention until the late sixties, for two major reasons. First, early scholars claimed that dictation was not an economical test; in addition, the scoring procedure for dictation was not objective. Second, some testers misused dictation because they did not pay attention to its concept and purpose. They used dictation as a spelling test, which was completely against the principles of integrative testing in general, and of dictation in particular. During the last two decades, fortunately, testing specialists have recognized the utility of dictation tests, and dictation has thus obtained its deserved position among other tests.

Dictation became one of the most highly respected integrative measures of language ability. Research on dictation has also demonstrated high validity and reasonable reliability for such tests. As with cloze tests, certain criteria have been set for the so-called “standard dictation”. A standard dictation is a passage of appropriate length (usually 100-150 words) and reasonable difficulty (determined by readability scales), read three times in the following manner.

In the first reading, the passage is read, preferably on tape, at the normal rate of speech. At this stage, the examinees only listen in order to get the general idea of the passage. They are not allowed to write anything down.

In the second reading, the passage is read at the normal rate of speech, with sufficient pauses at appropriate points and with punctuation marks supplied. During the pauses, the examinees are required to write down the chunks of language they hear. The length of each pause should be determined in advance, in accordance with the number of words in the chunk to be written down. The following example demonstrates the places where pauses should be made.

It is often observed / that university students / have more problems / than those in high school.

It should be clear that pauses should be made at the points where the natural reading process requires them.

In the third reading, the passage is read as it was in the first reading. The purpose of this last reading is to give the examinees a chance to correct words or to write down the words they might have missed in the previous readings.

After the dictation is administered, it should be scored objectively. In scoring dictation tests, the following points should be taken into account:

1. Every word is considered as an item.
2. Every morphologically correct word is given a point.
3. Spelling does not count as long as the meaning of the word is preserved. That is, if a word such as “ship” is written as “sheep,” it would be considered wrong because the meaning of the word is changed. However, if a word such as “beautiful” is written as “beautifull,” it would be considered correct and thus given a point because the spelling error does not change the meaning of the word.

It should be emphasized that ignoring some of the spelling errors in dictation does not imply, by any means, that spelling is not important in language teaching or in language testing. Spelling requires a long time to be mastered, through tedious work on the part of both the teacher and the student. The main point, however, is that using dictation for spelling purposes is unacceptable because it would serve neither the purpose of dictation nor that of spelling. Therefore, these two tests, dictation and spelling, should be kept quite separate from one another and used appropriately.

Cloze and dictation type tests have been used widely as integrative measures of language proficiency. It is quite possible, however, that some teachers may not be familiar with such tests. It is therefore recommended that teachers familiarize themselves with these new tests and utilize such techniques in the classroom. This would serve two purposes. First, students will become familiar with these new types of language tasks; and second, students will benefit from the instructional value of these activities. These procedures can be used as effective exercises to teach language in its natural form without decomposing it into discrete items. Of course, these developments do not end improvements in language testing, because language teaching theories continue to change. Consequently, testing procedures are modified. The latest modifications in testing and teaching theories are presented below.

Notional Functional Approach

Modifications in language teaching led to the development of a new approach referred to as the notional-functional approach (NFA). Although the basic linguistic and psychological principles of the cognitive-code learning theory are maintained, the NFA differs from that theory in two major respects. First, in the NFA, great attention is paid to the social appropriateness of sentences and utterances. That is, grammatical correctness of a sentence is necessary but not sufficient for that sentence to be used in a communicative setting. Accordingly, an utterance must be both linguistically accurate and socially appropriate. Second, the NFA considers language as communicative chunks called “functions”. Functions refer to what people do with language in real communication settings. For example, people use language for the purposes of seeking information, apologizing, persuading others, and so forth. These functions are carried out utilizing linguistic elements that are called “notions”. Thus, the NFA assumes that language consists of certain functions to be fulfilled through certain linguistic structures, i.e., notions.

This new approach in language teaching necessitated a new approach in language testing. The method developed to follow the principles of the NFA is referred to as functional testing, the main objective of which is to assess learners’ ability to carry out language functions. In order to develop a functional test item, in multiple-choice form for example, a real language context, based on a certain function, should be constructed as the stem. Then, the alternatives should be developed through certain pretesting steps. In order to explain the characteristics of a functional test item and the steps followed in developing it, an example will be helpful. Consider the following item:

You are applying to a university and need a letter of recommendation. You go to a professor, who is also your friend, and say:
a. I’d appreciate it if you could write a letter of recommendation for me.
b. I want you to write a letter of recommendation for me.
c. I wonder if you can write letter of recommendation for me.
d. Hey, give me recommendation letter.

This functional item has the following characteristics, which distinguish it from other types of test items:

1. The function to be fulfilled in this item is “getting things done,” and in this particular case, “requesting someone to do something.” The stem clearly demonstrates the function: someone, i.e., the student, wants someone else, i.e., the professor, to do something, i.e., write a letter of recommendation.
2. The social setting or the communicative context in which the function is to be fulfilled is an academic environment because it is to take place in a university.
3. The social relationship between the people involved in the communication is friendly, as stated in the stem.
4. The social status of the participants is unequal because one of them is a student and the other is a professor.
5. The first alternative, which is linguistically accurate and socially appropriate, is based on the performance of native speakers of English, i.e., the most frequent statement produced by native speakers in this particular situation.
6. The second alternative, which is linguistically accurate but socially inappropriate, is based on the performance of non-native speakers who have not lived in English-speaking communities but have received a lot of formal instruction.
7. The third alternative, which is linguistically incorrect but socially appropriate, is based on the performance of non-native speakers who have lived in English-speaking communities for a long time but have not mastered the linguistic rules of the language.
8. The last alternative, which is the only distractor, is neither linguistically accurate nor socially appropriate.

These characteristics enable functional tests to measure the communicative abilities as well as the linguistic abilities of the examinees. Moreover, these tests are more suitable for measuring special functions in different fields of ESP.
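The structure of the sample item can be made explicit with a small Python sketch. It records, for each alternative, the two properties discussed above (linguistic accuracy and social appropriateness) and treats the alternative that has both as the key. The class and field names are hypothetical and serve only to illustrate the design of a functional item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    text: str
    linguistically_accurate: bool
    socially_appropriate: bool

    @property
    def is_key(self) -> bool:
        # The correct response must satisfy both criteria.
        return self.linguistically_accurate and self.socially_appropriate


@dataclass(frozen=True)
class FunctionalItem:
    function: str
    setting: str
    relationship: str
    status: str
    stem: str
    alternatives: tuple

    def key(self) -> Alternative:
        keys = [a for a in self.alternatives if a.is_key]
        if len(keys) != 1:
            raise ValueError("a functional item needs exactly one key")
        return keys[0]


# The sample item from the text; the ungrammatical wording of the last two
# alternatives is intentional.
item = FunctionalItem(
    function="requesting someone to do something",
    setting="academic (a university)",
    relationship="friendly",
    status="unequal (student and professor)",
    stem="You are applying to a university and need a letter of "
         "recommendation. You go to a professor, who is also your friend, and say:",
    alternatives=(
        Alternative("I'd appreciate it if you could write a letter of "
                    "recommendation for me.", True, True),
        Alternative("I want you to write a letter of recommendation for me.",
                    True, False),
        Alternative("I wonder if you can write letter of recommendation for me.",
                    False, True),
        Alternative("Hey, give me recommendation letter.", False, False),
    ),
)
print(item.key().text)
```

Keeping the two properties as separate flags mirrors the way the alternatives are derived from different groups of speakers, and it would allow responses to be analysed by error type rather than simply as right or wrong.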

Summary and Conclusion

In this section, theories of language testing were discussed. It was mentioned that testing theories follow teaching methodologies. The following table illustrates this correspondence.

Teaching Method                 Testing Method
Grammar Translation             Translation; Essay Type Items
Audio-Lingual Approach          Discrete-point Items
Cognitive-Code Learning         Integrative Test Items
Notional-Functional Approach    Functional Test Items

The close relationship between teaching and testing methods has some pedagogical implications. That is, a particular teaching method, utilized in an educational setting, requires teaching materials to be developed on the basis of the principles of that method. More importantly, the testing approach should be in harmony with the teaching method employed. Consider an educational setting in which instructional materials are prepared on the basis of the cognitive-code theory, taught on the basis of the grammar-translation method, and tested through discrete-point items. Such an unfortunate situation would most probably lead the educational program to fail, because there is no relationship between the method of teaching, the procedures for materials development, and the testing method used.

In order to achieve instructional objectives, there should be a close relationship between these objectives and the instructional materials. Furthermore, the materials should be taught through methods corresponding to the principles upon which the materials are prepared. Finally, the achievement of the students should be measured through testing methods that correspond to the principles of the teaching method employed. Otherwise, the instructional program is not likely to succeed in achieving the preplanned objectives.

* This is the revised version of the paper printed in Roshd Foreign Language Teaching Journal (1986). 2 (4). Tehran, Iran.
