ISSN: 1738-1460
June 2006

Volume 8. Issue 2
Article 2


Article Title
An investigation into the task features affecting EFL listening comprehension test performance

Author
Hu Ying-hui

Biography:
Hu Ying-hui is a postgraduate student at Shanghai Jiaotong University, majoring in English language testing. She is also an experienced university lecturer in college English classes in China. Her interests cover teaching and testing of English listening and speaking, language learning strategies, and classroom evaluation. Several papers on these topics have been published in Chinese publications.


Abstract

The construct validation of a multiple-choice listening test requires some evidence that text and text-associated variables play a significant role in predicting item difficulty. The purpose of this study is to investigate the effects of task features on test performance in EFL listening tests by determining how well item difficulty can be accounted for by text factors, item factors and text-item factors. A sample of 159 items from CET listening tests was analyzed, and on that basis a summary of the task features of CET listening passages is presented. Furthermore, the results of correlation and regression analyses indicate that text-by-item interaction variables contribute significantly to item difficulty, thereby providing evidence favoring the construct validity of CET listening tests. The two best predictors of item difficulty are the redundancy of necessary information and the lexical overlap between words in the text and words in an item's options.

Key words: test task, construct validity, CET, EFL listening tests

Introduction
In the field of language testing, there is a steadily growing interest in the identification and characterization of those factors which affect the test performance of the language learner with the objective of achieving more informed construct validation results (Bachman 1990; Foster & Skehan 1996; McNamara 1996). Bachman (2002: 471) points out that we should clearly distinguish among three sets of factors that can affect test performance:

1) Characteristics inherent in the task itself
2) Attributes of test takers
3) Interactions between test takers and task characteristics

Language test performance can be attributed to test task features. Their effects may reduce the effect on test performance of the language abilities we want to measure, and hence the interpretability of test scores (ibid.). It is, therefore, vitally important for language testing researchers to determine the nature of the relationship between test tasks and test performance, and how it affects the interpretation of test results. This information can be used as the basis for improving test reliability and validity and, more specifically, for designing tests for particular populations.

It is out of these considerations that an in-depth analysis is intended in the study to explore the relationship between major test task features and students' test performance in EFL listening tests. The decision on EFL listening tests as the focus of the study is of particular significance in the context of China's college English teaching. Developing students' ability to use English as a tool of communication, especially their listening and speaking abilities, is clearly specified as the objective of college English teaching in China.

The main purpose of the present study focuses upon the construct validity of multiple-choice listening comprehension tests. To be valid, a multiple-choice test of listening must demonstrate sensitivity to the information in the text passages. One serious criticism regarding the construct validity of listening tests maintains that examinees do not, or need not, listen to the passage in order to answer the items. Freedle and Kostin (1999) point out that one could counteract such criticisms by showing that some variables that reflect the structure and content of the text passage are significantly correlated with item difficulty. Finding such significant correlations would indicate that examinees are probably paying attention to text information and are using this information to guide their selection of answers to the items. Particularly, they suggest that the lowest level of validity of a multiple-choice test requires finding some significant support for the effect of text variables on test item difficulty. Therefore, a related purpose of the study is to examine whether text and text-associated variables play a significant role in predicting item difficulty.

The following two research questions are addressed:
1) What are the major task features of EFL listening tests?
2) How can task features affect performance in EFL listening tests?

Review of related studies
Task features can be further categorized into those related to task input (or text) and those related to the test item. A review of studies examining task features and test performance suggests that variations in the specific characteristics of task input and test item affect the difficulty of items. In listening comprehension we could find only a few empirical studies in which factors that may affect listening task difficulty are examined and identified. Freedle and Kostin (1996) examined 337 TOEFL items, which asked a small number of multiple-choice comprehension questions on short spoken passages. They found that a different set of attributes worked better for each item type. For example, in the case of items that asked for the identification of the main idea, the three attributes identified were lexical overlap, rhetorical structure of the passage, and topic.

Nissan et al. (1996) analyzed TOEFL dialogue items and found five significant variables relating to listening performance. The three best predictor variables were (a) the presence of two or more negatives in the dialogues, (b) the need to draw an inference beyond what is explicitly stated in the dialogue, and (c) the pattern of utterances in the dialogue.

Brindley and Slatyer (2002) reported on an exploratory study that examined the effects of task characteristics and task conditions on learners' performance in competency-based listening assessment tasks. Key variables investigated included the nature of the input and the response mode, namely speech rate, text type, number of hearings, input source (live vs. audio-recorded) and item format. Quantitative and qualitative analyses of test scores indicated that speech rate and item format could affect task and item difficulty.

Kostin (2004) explored the relationship between a set of item characteristics and the difficulty of TOEFL dialogue items. This study replicated some of the significant findings in Nissan et al. (1996). In particular, it found that the lexical overlap between words in the text and words in an item's options affects listening item difficulty.

Buck and Tatsuoka (1998) were concerned with identifying cognitive abilities needed to perform short-answer comprehension questions. Three structural components of the listening test tasks have been singled out as influencing item difficulty.

1) The necessary information (NI): This refers to "information in the text which the listener must understand to be certain of the correct answer" (Buck & Tatsuoka 1998: 134). The location of the NI and its linguistic characteristics are found to be key factors affecting item difficulty and candidate responses.

2) The surrounding text: This refers to the text immediately surrounding the necessary information. The characteristics of this part of the text are found to have a greater effect on item difficulty than the characteristics of the whole text.

3) The stem: This is defined as the written text on the answer sheet which test takers have in front of them as they listen and which serves both as a listening guide and a structure for the written response. In response-constructed tasks, the stem would be the beginning of the short answer question (SAQ) to be answered.

The present study builds on these findings and explores their applicability in EFL listening comprehension test in China. On the basis of the literature review, a framework of variables assessing test task features was presented which embraced four groups of variables: text variables, item variables, text/item variables, and item type.

Text variables characterize the content and structure of the listening passage itself, and these variables can be further classified in terms of word-level, sentence-level, and discourse-level factors. These variables are related to the linguistic characteristics which have been traditionally associated with comprehension difficulty. Item variables constitute the so-called "pure" item variables, which can be coded without reference to the contents of the listening passage. Only the contents of the item itself are used to quantify these particular variables. Three types of item were studied (Freedle and Kostin 1999): detail explicit, detail implicit and main idea items. Text-by-item, or alternatively text/item overlap, variables are defined as variables that necessarily reflect the contents of both the test items and the text to which those items apply. These factors typically involve an interaction between features of the text and features of the item. Item types are a special type of text/item overlap and refer to the response expected from the test taker to the task. In general, there are two types of response: selected and constructed (Bachman 1990: 129).

Materials and method
1. Item sample
The objective of the analysis was to investigate whether two factors of listening tasks, text and test items, exercise a systematic influence on item difficulty. Items were coded on the factors believed to affect performance (vocabulary frequency, syntactic complexity, topic, etc.), and the item scores on these factors were then used to predict item difficulty.

The 159 listening comprehension items taken from 16 disclosed post-1992 CET Band-4 forms comprise the total item sample. The National College English Test of China (CET) is a national standardized test of English proficiency administered to Chinese college students. Listening comprehension is the first part in the CET. Students should be able to get the gist of the discourse, understand the main points and important details, and recognize the opinion and attitude of the speaker. The listening sub-test has two sections and lasts 20 minutes. Section A contains ten short conversations and Section B contains three passages.

After each passage, there are three or four questions about it. Each recording is played once only. The passages in Section B are stories, talks, etc on personal life, social and cultural issues, and popular science. Item type includes multiple-choice questions and compound dictation. A more detailed description of the current listening comprehension sub-test is presented in Appendix A. In this study, the correct option will be referred to as the key, and the incorrect options will be referred to as the distracters.

The item sample included 19 inference and 140 explicit questions. As each test form contains three passages and 10 items, there should be 48 passages and 160 items. However, one item was deleted since it is a true-or-false question and does not fit the two question types under investigation. The original data on item difficulty for the 159 items were collected from three different test centers in China and involved approximately 1000 college students learning English as a foreign language. These students were randomly selected from a much larger pool of test takers who responded to each College English Test (CET) Band-4 test form.

2. Study variables
The content and structure of the items and their associated text passages were represented by a set of predictor variables that included a wide variety of text and item characteristics identified from the experimental language-comprehension literature. Given the practical difficulties involved in investigating the effects of all of these variables simultaneously, it was decided to narrow the range of investigation to 23 key variables that seemed most relevant in the context of the EFL listening tests under investigation. At the same time, from a theoretical perspective the study presented an opportunity to investigate some of the hypotheses that have been advanced in the research literature concerning those variables that affect second and foreign language listening comprehension.

Below is a summary of the 23 coded variables for initial investigation. Not all variables were used in the analyses. Because of low frequencies of occurrence, defined as two or fewer occurrences in the N = 159 sample, the variables V02, V03 and V13 were deleted. Thus a total of 19 predictor variables were retained, including 10 text variables, 2 item variables, and 7 text/item variables.

Text variables
Word-level variables

V01: number of words with more than two syllables among first 100 words
V02ª: presence of an infrequent word which is relevant to responding correctly
An infrequent word refers to a word not in The Most Common 100,000 Words Used in Conversations (Berger, K. 1977).
V03ª: presence of an idiom which is relevant to responding correctly
An idiom is defined as an expression consisting of two or more words having a meaning that cannot be deduced from the meanings of its constituent parts in the American Heritage Dictionary (2000).

Sentence-level variables
V04: average number of words of text's sentence
V05: number of dependent clauses in text
V06: number of words in the longest T-unit
A T-unit is defined as an independent clause with any attached dependent clauses (Hatch & Lazaraton 1994).

Discourse-level variables
V07: number of negations in text
Negative markers (e.g., no and not) are counted, as well as negative prefixes (e.g., un- and dis-). Negative tags are also counted, even if their meaning is not negative.
V08: number of interrogative sentences
V09: number of references
V10: coherence (1 = min coherence; 3 = max coherence).
High coherence means elements of opening text sentence densely represented throughout text, etc.
V11: position of main idea in text
(0 = main idea implicit; 1 = in last text sentence; 2 = in middle of text; 3 = among first three sentences)
V12: rhetorical organization
(description, causation, comparison)
V13ª: topic of text (0 = non-academic topic; 1 = academic topic)

Item variables

V14: explicit (e.g., What is the boiling point of lead?)
V15: inference (e.g., According to the passage, one can infer…)

Text/Item variables
V16: position of necessary information
(1 = among the last three sentences; 2 = in middle of text; 3 = among first three sentences)
Necessary information (NI) refers to "information in the text which the listener must understand to be certain of the correct answer" (Buck & Tatsuoka 1998: 134)
V17: indication of necessary information (explicit indication that NI is coming next)
V18: redundancy of necessary information (all, or part of NI is repeated in text)
V19: number of words in the key
V20: lexical overlap in the key (the key has more words that overlap with words in the text than the distracters do)
V21: lexical overlap in distracters (the distracters have more words that overlap with words in the text than the key does)
V22: use of background knowledge to infer the answer
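
The coding of the two lexical overlap variables (V20 and V21) can be illustrated with a minimal Python sketch. It is not the coding procedure actually used in the study: the tokenizer, the function names, the sample passage and options, and the reading of V20 as "the key overlaps more than every distracter" and of V21 as "at least one distracter overlaps more than the key" are all assumptions made for illustration.

```python
import re


def word_set(text):
    """Distinct lower-cased word tokens (a deliberately naive tokenizer; assumption)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap_count(option, passage):
    """Number of distinct words in an option that also occur in the passage."""
    return len(word_set(option) & word_set(passage))


def code_overlap(key, distracters, passage):
    """Return (V20, V21) as 0/1 codes under the assumed reading of the definitions."""
    key_n = overlap_count(key, passage)
    dis_n = [overlap_count(d, passage) for d in distracters]
    v20 = int(all(key_n > n for n in dis_n))  # key overlaps more than every distracter
    v21 = int(any(n > key_n for n in dis_n))  # some distracter overlaps more than the key
    return v20, v21


# Hypothetical item, invented for illustration only.
passage = "The museum opens at nine and closes at five on weekdays."
key = "It closes at five."
distracters = ["It opens at noon.", "It is closed on Sundays.", "It sells tickets online."]
print(code_overlap(key, distracters, passage))  # (1, 0)
```

In this invented item the key shares three words with the passage and no distracter shares more than two, so the item is coded V20 = 1 and V21 = 0.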

Dependent variable
V23: item difficulty (equated delta, a standardized measure of difficulty)
Finally, it should be noted that this study did not examine phonological features of test tasks, although previous studies have demonstrated effects of acoustic input on listening comprehension (e.g., Yong Zhao 1997). The reason is that phonological factors including accent, speech rate and sandhi are under strict control in test design of CET listening.

3. Procedure
The first data analysis task involved coding each of the 48 passages for the use of task input features. The analysis was based on the coding of the researcher. A second coder was recruited to establish inter-coder reliability for those variables requiring subjective judgment. The correlation coefficient between the two coders on a sample of 12 passages and 40 items was .86. This high inter-coder reliability was taken to justify the use of one researcher for the rest of the coding.
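
As an illustration of how an inter-coder coefficient of this kind can be obtained, the following Python sketch computes a Pearson correlation between two coders' ratings. The study does not state which coefficient was used, so Pearson's r is an assumption, and the ratings below are invented (they are not the study's data).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a coder gives constant ratings")
    return sxy / sqrt(sxx * syy)


# Hypothetical coherence codes (1-3) assigned by two coders to eight passages.
coder_a = [1, 2, 3, 3, 2, 1, 3, 2]
coder_b = [1, 2, 3, 2, 2, 1, 3, 3]
print(round(pearson_r(coder_a, coder_b), 3))  # 0.795
```

For categorical codes, an agreement index such as Cohen's kappa is a common alternative; the correlation is shown here only because the text reports a correlation coefficient.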

As a preliminary procedure, descriptive statistics were first generated to check, from the central tendency and dispersion, that the data were approximately normally distributed, so that the subsequent statistical analyses would be valid for the research questions.

A series of ANOVAs was conducted with text organization as the grouping factor, in order to discover whether passages of different text organizations vary in text features. Afterwards, correlations between three sets of task factors (i.e., text variables, item variables, and item/text variables) and item difficulty were computed. Multiple regressions were subsequently used to identify the best predictors of item difficulty from the four sets of variables considered together. The aim was to identify the variables predictive of item difficulty or, more specifically, to explore the specific task features associated with a certain level of item difficulty.
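
The logic of a one-way ANOVA with text organization as the grouping factor can be sketched in Python as follows. This is only a minimal illustration of the F ratio (between-group mean square over within-group mean square); the function name and the clause counts are hypothetical and do not reproduce the study's data or software.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical numbers of dependent clauses per passage, by rhetorical organization.
description = [4, 5, 6]
comparison = [5, 6, 7]
causation = [8, 9, 10]
f_ratio, df_b, df_w = one_way_anova_f([description, comparison, causation])
print(round(f_ratio, 2), df_b, df_w)  # 13.0 2 6
```

The resulting F value would then be compared with the critical value of the F distribution for the given degrees of freedom to decide whether the groups differ significantly.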

Results and discussion
1. Overall results of text materials
In response to the first research question "what are the major task input features of EFL listening tests", CET listening passages were analyzed in terms of text variables which characterize the content and structure of the passage itself. The results obtained help us to summarize the task input features of listening comprehension passages (see Appendix 1). Among the 48 passages, the plurality of text organization comes from description, followed by causation and comparison. Listening passages show no significant difference in a number of text features, including text length, vocabulary frequency, syntactic structure, and content. Moreover, most passages are highly coherent and the main idea is explicitly stated among the first three text sentences.

Meyer's (1985) framework of rhetorical organization was modified to define passage groups in the study. During the coding procedure, it was recognized that there is a certain amount of overlap in the text organization. For example, the problem-solution might contain elements of causation, whereas the listing structure might contain elements of both. In addition, since too many text types would complicate the research design, it was therefore decided to adopt only three types of rhetorical organization: description, comparison and causation.

The variables of coherence and text organization present a highly centralized distribution around the median, suggesting the consistency of text type used. ANOVA results indicate number of dependent clauses is a significant factor among rhetorically different texts (see Appendix 2). The causation text contains significantly more dependent clauses. It is also worth noting that significant differences exist in number of negations between texts of causation and comparison.

Another interesting finding involves the topic of passages. The variable V13 was developed to reflect academic vs. nonacademic topics. Differential familiarity with the topics covered by listening passages may play a role in accounting for listening performance. It seems likely that items that inquire about nonacademic topics may, because of their greater general familiarity, be easier than items about academic topics. However, only three passages in the passage sample involve an academic topic, suggesting that CET listening passages are not field-specific. Thus the construct-irrelevant variance in topical familiarity can be minimized, and the content validity of the test can be ensured.

In summary, the findings concerning text variables can provide clear evidence for the construct validation of CET-4 listening tests. Validity centers on the extent to which inferences and interpretations from test scores are supported by the available evidence about what the assessment instrument measures. Bachman (1996) describes validation as a general process that consists of the gathering of evidence to support a given interpretation or use, a process that is based on logical, empirical and ethical considerations. Thus validation should ensure that the differences in test performance of different test taker groups are related primarily to the abilities that are being assessed and not to construct-irrelevant factors.

Construct-irrelevant factors in terms of content bias include topical knowledge and technical terminology, specific cultural content and dialect variations. Format bias could include multiple-choice, constructed response, computer-based responses, and multi-media materials. Other key construct-irrelevant factors include insensitive or offensive test materials and materials that stereotype and show certain test taker groups in unfavorable light (Kunnan 2000: 3). Our results demonstrate that construct-irrelevant factors in terms of test materials are not related to performance in the context of CET-4 listening tests.

2. Correlations between task variables and item difficulty

Table 1 presents those variables that are significantly correlated with item difficulty. Of the 19 variables examined, five variables yielded a significant correlation (p < .05) with item difficulty (equated delta).

V10: coherence of text
V15: inferencing
V18: redundancy of necessary information
V20: lexical overlap in the key
V21: lexical overlap in distracters

As expected, other task features (e.g., linguistic and discourse features of passages) did not significantly contribute to the listening item model. Overall, the correlation results suggest that many of those variables found to influence comprehension in the experimental language comprehension literature also influence our multiple-choice listening data.
Table 1




The first variable whose p value is less than the critical probability is V10 (coherence of text); the correlation (r = .195*, N = 159) indicates that text with high coherence was associated with easier listening items, as expected. Coherence is characterized as the degree of unity, or how well a text holds together. A well-organized text would be better recalled, and a tight top-level rhetorical organization would enhance comprehension because the ideas in the text are closely interlinked (Meyer & Freedle 1984; Meyer et al. 1993).
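
To see why a coefficient as small as .195 can still reach significance with 159 items, the correlation can be converted to a t statistic with N - 2 degrees of freedom. The Python sketch below assumes a Pearson correlation tested with a two-tailed t test; the study does not state which test was applied, and the function name is hypothetical.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation r against zero, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Values reported in the text for V10 (coherence of text).
t_value = t_from_r(0.195, 159)
print(round(t_value, 2))  # 2.49
```

With 157 degrees of freedom the two-tailed critical value at the .05 level is roughly 1.98, so a t of about 2.49 is consistent with the single asterisk attached to this coefficient.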

The variable V15 (inferencing) is significantly correlated with item difficulty (r = - .219**, N = 159), indicating that items are more difficult when an inference is required to respond correctly. This result was expected in that making inferences is more cognitively demanding, and consequently, may impede listening comprehension performance.

With regard to this result, the question arises whether question type might threaten test validity. If scores on a listening comprehension test reflect only language comprehension, item scores should be predictable only from linguistic features of the items and from the language comprehension skills of the students. Other item features, such as the question type of items, are not supposed, or even allowed, to influence the performance of students. However, only 19 of the 159 items in this study, about 12% of the items, were coded for this variable. This renders it impossible to draw conclusions about the effect of question type on item difficulty.

The third variable meeting the critical probability criterion is V18 (redundancy of necessary information). V18 correlates positively with item difficulty (r = .388**, N = 159). When the necessary information was repeated, items were easier. This is consistent with previous studies. Necessary information refers to information in the text that listeners must understand to respond correctly. Its location and linguistic characteristics are found to be key factors affecting item difficulty and candidate responses (e.g., Buck & Tatsuoka 1998).

In addition, some researchers maintain that redundancy has a significant effect on listening-item comprehension. For example, Chiang and Dunkel (1992) found that redundancy does play a significant role in comprehension; Parker and Chaudron (1987) found that repetition of the information plus clear segmenting of the thematic structure enhanced oral comprehension. Therefore, the repetition of necessary information can be expected to be associated with easier items.

There are substantial lexical overlap effects operating in listening. The two lexical overlap variables (V20, V21) yielded significant coefficients for prediction of item difficulty. Lexical overlap between words in the key and words in the relevant text sentence was significant for listening passages (r = -.356**, N = 159). A significant and fairly strong positive correlation exists between item difficulty and lexical overlap between words in the distracters and words in the relevant text sentence (r = .404**, N = 159).

The variable V20 (lexical overlap in the key) was negatively related to item difficulty, indicating that items with a high percentage of lexical overlap in the key tend to be easier items. Similar findings in regard to percentage of lexical overlap in the key have been reported for TOEFL mini-talks (Freedle & Kostin 1999) and for TOEFL reading (Freedle & Kostin 1993). One might be concerned that a test taker having little or no comprehension of a passage could nevertheless perform well on CET items by simply choosing the option that had the most lexical overlap with the passage. Some information relevant to this concern is provided by results regarding V20. Only 36 of the 159 items in this study, about 23% of the items, were coded for this variable. Thus, using a strategy of selecting the option with the most lexical overlap would certainly fail to yield a good score on this item type.

The findings also suggest that item difficulty is also related to lexical overlap between words in the distracters and words in the passage. The correlation for V21 (lexical overlap in distracters) indicates that items tend to be easier when no distracter has more words that overlap with the passage than does the key. This suggests that if distracters had more lexical overlap with the passage as compared to the key, the item would be harder. Items tend to be harder when all three distracters have more words overlapping with the passage than does the key.

The direction in which these five variables correlated with item difficulty is consistent with findings in the research literature, which suggests that the results for at least some of these variables are replicable.

3. Regression analyses
In response to the second research question "how do task features affect performance in EFL listening tests", regression analyses were performed with item difficulty as the dependent variable.

Linear regression was employed to model the value of the dependent variable (item difficulty) based on its linear relationship to the predictors (V01, V04, V05, V06, V07, V08, V09 and V19). As is shown in Table 2, the small value of R squared indicates that the model does not fit the data well. Only 4.7% of the variation in the dependent variable could be explained by the regression model. As expected, average sentence length and syntactic complexity effects were not significant for listening items. The ANOVA table summarizes the results of the variance analysis. The significance value of F is larger than 0.05, indicating that these word-level and sentence-level text variables cannot explain the variation in item difficulty.
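
The meaning of R squared as the share of variation explained can be shown with a short Python sketch. For brevity it fits a regression with a single predictor rather than the eight predictors used in the study, and both the function name and the data are hypothetical.

```python
def simple_ols(x, y):
    """Least-squares line y = intercept + slope * x, plus R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variation left after fitting the line vs. total variation in y.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical predictor scores and item difficulty values for five items.
predictor = [1, 2, 3, 4, 5]
difficulty = [2, 4, 5, 4, 5]
slope, intercept, r_squared = simple_ols(predictor, difficulty)
print(round(slope, 2), round(intercept, 2), round(r_squared, 2))  # 0.6 2.2 0.6
```

In this invented example the line explains 60% of the variation in the dependent variable; an R squared of .047, as reported for the word-level and sentence-level model, would mean that almost all of the variation is left unexplained.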

Table 2

The categorical nature of the variables V10, V11, V12, V15, V16, V17, V18, V20, V21, and V22 and the nonlinear relationship between these task input variables and item difficulty suggest that nonlinear regression may perform better than standard regression. When all the independent variables were entered as a block, the fit of the model was very strong. Measures of the model fit are displayed in Table 3. The overall F = 6.302, p < .001; the multiple-R = .613, the R-squared = .316, which accounts for 31.6% of the item difficulty variance. The significance value of the F statistic means that the variation explained by the model is not due to chance. Apparently, the independent variables do a good job of explaining the variation in the dependent variable. The multiple-R shows that the overall correlation between the predictors and the dependent variable is fairly strong.

Table 3

The statistical results reported here clearly demonstrate that task input and test item both contribute to the prediction of item difficulty. The regression procedure yielded three significant predictors of item difficulty:

V18: redundancy of necessary information
V20: lexical overlap in the key
V21: lexical overlap in distracters

Lexical overlap in distracters is the best predictor of item difficulty (β = 0.32). The second best predictor is redundancy of necessary information (β = -0.25), followed by lexical overlap in the key (β = -0.23). The direction in which these variables correlated with item difficulty is consistent with the previous findings. It should be noted that although the standardized coefficients were statistically significant, they were quite small in absolute value, ranging from 0.23 to 0.32. In general, it seems fair to say that the findings from this study are to a certain degree consistent with Freedle and Kostin's (1996) assertion that lexical overlap and necessary information can be singled out as influencing item difficulty.

Although the simple correlation between coherence of text (a text variable) and item difficulty is significant, regression analyses indicate that coherence does not contribute significantly to the prediction of item difficulty. This can be understood since the centralized distribution of the variable may counteract its effect on item difficulty.

It is also worth noting that a pure item variable such as question type appears to play a weak role in influencing item difficulty, while text and text-associated (text/item overlap) variables play by far the major role in accounting for passage item difficulty. We are led to conclude that there is modest evidence to support the claim that the CET listening passages and their associated items appear to be valid in construct.

Conclusion
1. Limitations of the study
There are some serious limitations to the design of the present study. First, item difficulty is not the dependent variable of theoretical interest. We are generally far more interested in understanding person performance ability than item difficulty (Buck & Tatsuoka 1998: 126). Our regression analysis puts the emphasis on item characteristics rather than performance ability. Another drawback of the use of regression is that it only provides information about group performance; it cannot tell us what factors specific test takers have mastered. Finally, it is appropriate to note that the variables measured in this study are far from being exhaustive or comprehensive. These variables simply come from a survey of the research literature. Clearly these findings are compelling and merit further investigation.

2. Summary of major findings
In this study we have been interested primarily in determining how well the difficulty of listening items can be accounted for by a set of task features which involve text factors, item factors and text-item factors. The results concerning task input variables provide clear evidence for the construct validation of CET-4 listening tests. Listening passages used in CET Band-4 have no significant variance in linguistic characteristics such as vocabulary frequency, syntactic complexity, and content. Particularly, construct-irrelevant factors such as topical familiarity and dialect variations are minimized in the test materials, suggesting that test takers' performance on the test is primarily related to the abilities that are being measured.

More importantly, the empirical results demonstrate the effect of text variables on the difficulty of test items and thereby provide evidence for the validity of the CET. Two text-associated factors (text-item factors) are directly tied to item difficulty in EFL passage listening:

1) Necessary information refers to the information in the text which the listener must understand to be certain of the correct answer, and its redundancy clearly contributes to item difficulty.

2) Lexical overlap between words in the text and words in an item's options may impact listening item difficulty. Easier items are characterized by a greater amount of lexical overlap between words in the text and words in the correct option. In contrast, if there is a greater degree of lexical overlap between words in the text and words in the incorrect options as compared to the correct option, the item tends to be more difficult.

We hope these findings will inform language test developers and researchers about the task features that may influence listening test performance and, therefore, about the construct validation of listening tests. Our results provide clear evidence that examinees do attend to the text passages in answering the test items.
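
To make the notion of lexical overlap described above concrete, the following is a minimal illustrative sketch in Python. It is not the coding scheme used in this study: the tokenizer, the function names (`word_types`, `overlap`, `overlap_advantage`) and the proportion-based measure are all assumptions introduced here for illustration only.

```python
import re


def word_types(s):
    """Return the set of lowercase word types in a string.

    A crude tokenizer assumed for illustration; no stemming or
    stop-word removal is applied.
    """
    return set(re.findall(r"[a-z']+", s.lower()))


def overlap(text, option):
    """Proportion of the option's word types that also occur in the text."""
    option_types = word_types(option)
    if not option_types:
        return 0.0
    return len(option_types & word_types(text)) / len(option_types)


def overlap_advantage(text, options, key):
    """Overlap of the correct option minus the highest distractor overlap.

    `key` is the index of the correct option; at least one distractor
    is assumed. Positive values mean the correct option shares more
    words with the text than any incorrect option does.
    """
    scores = [overlap(text, option) for option in options]
    distractors = [s for i, s in enumerate(scores) if i != key]
    return scores[key] - max(distractors)


# Hypothetical passage and item, invented for this sketch.
passage = "The library closes at nine on weekdays and at five on Saturdays."
choices = ["It closes at nine.", "It opens at noon.", "It is closed all week."]
print(overlap_advantage(passage, choices, key=0))
```

Under this hypothetical measure, a positive value corresponds to the pattern reported for easier items (more overlap with the correct option), and a negative value to the pattern reported for harder items (more overlap with an incorrect option).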

References
Bachman, L.F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, L.F. (2000). Modern language testing at the turn of the century: assuring that what we count counts. Language Testing, 17(1), 1-42.

Bachman, L.F. (2002). Some reflections on task-based language performance assessment. Language Testing, 19(4), 453-476.

Bejar, I., Douglas, D., Nissan, S. & Turner, J. (2000). TOEFL 2000 listening framework: A Working Paper. (TOEFL Monograph Series MS-19). Princeton, NJ: Educational Testing Service.

Brindley, G. & Slatyer, H. (2002). Exploring task difficulty in ESL listening assessment. Language Testing, 19(4), 369-394.

Buck, G. & Tatsuoka, K. (1998). Application of the rule-space procedure to language testing: examining attributes of a free response listening test. Language Testing, 15(2), 119-157.

Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.

Foster, P. & Skehan, P. (1996). The influence of planning on performance in task-based learning. Studies in Second Language Acquisition, 18, 299-324.

Freedle, R. & Kostin, I. (1993). The prediction of TOEFL reading item difficulty: implications for construct validity. Language Testing, 10(2), 133-167.

Freedle, R. & Kostin, I. (1996). The prediction of TOEFL listening comprehension item difficulty for minitalk passages: Implications for construct validity. (TOEFL Research Report RR-96-29). Princeton, NJ: Educational Testing Service.

Kostin, I. (2004). Exploring item characteristics that are related to the difficulty of TOEFL dialogue items. (TOEFL Research Report RR-04-11). Princeton, NJ: Educational Testing Service.

Kunnan, A.J. (2000). Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida. Cambridge: Cambridge University Press.

McNamara, T.F. (1996). Measuring second language performance. London: Longman.

Meyer, B.J.F. & Freedle, R.O. (1984). Effects of discourse type on recall. American Educational Research Journal, 21, 121-143.

Nissan, S., DeVincenzi, F., & Tang, K. L. (1996). An analysis of factors affecting the difficulty of dialogue items in TOEFL listening comprehension. (TOEFL Research Report RR-95-37). Princeton, NJ: Educational Testing Service.

Appendices 1 and 2: see the MS Word document or PDF file.

Copyright © 1999-2008 Asian EFL Journal. Last updated 20 July 2008.