Abstract
Corpus linguistics is developing so rapidly that established corpora of different genres and for different purposes have emerged in quick succession in recent years. However, although the advantages of these corpora for language teaching and learning are well acknowledged, they have not produced “tangible pedagogical results” (Nunn, 2005) in an EFL classroom context. With a brief review of the evolution of EFL teaching methods and a short introduction to the established general and learner corpora, this paper analyzes the main reasons why there is a gap and a lag between ongoing corpus-linguistic research and EFL teaching and learning, and concludes that it is necessary and feasible for EFL teachers, following some basic principles, to build a learner-oriented mini-corpus that compensates for the shortcomings of the established corpora in EFL teaching. In addition, the paper points out that an EFL teacher should endeavor to use various teaching methods or measures to meet EFL learners’ diverse needs, including the use of corpora, whether self-built, established, or a combination of the two.
Key words: self-built mini-corpus, established general and learner corpora, EFL teaching and learning
Introduction
English has undoubtedly established its status as an international language, regardless of people’s likes or dislikes. Smith (2007) even states that with English gaining status as the primary global language in almost every trade and profession, literacy now often includes and assumes the need for competence in English. Whether she is right on this point is not important. What matters is that English currently enters classrooms in nearly every corner of the world, and many EFL teaching methods have been explored and investigated to equip EFL teachers. With the development and application of computer techniques, CAI (Computer-Assisted Instruction), CMI (Computer-Managed Instruction), and the Internet have, in the digital era, provided brand-new teaching methods (Gao, 2005), which include the use of linguistic corpora in the field of EFL teaching and learning.
McEnery & Wilson (2001) state that, “From being a marginalized approach used largely in English linguistics, and more specifically in studies of English grammar, corpus linguistics has started to widen its scope” (p.1). In recent years, linguistic corpora of different genres and for different purposes have been springing up like mushrooms, and their applications touch nearly every aspect of language, including EFL teaching and learning. However, in EFL countries like China, there is a widening gap and a growing lag between ongoing, intensive corpus-linguistic research on the one hand and classroom teaching on the other. Granger (2004) reports, “research into the use of corpora for language teaching is almost entirely done by linguists; the contribution of SLA researchers to – and the participation of EFL teachers in – what happens in corpus linguistics is still relatively low” (p.136). Although a few EFL teachers, such as Yang & Liao (2004) and Tian (2004), have tried out corpora in their classrooms, most of them have relied on established corpora such as the BNC (British National Corpus), LOB (Lancaster-Oslo/Bergen Corpus), LLC (Longman Learner Corpus), and ICLE (International Corpus of Learner English). More disappointing still, even fewer EFL teachers consider combining those established corpora with one or more mini-corpora they have built themselves, although this is by no means an unattainable goal. Focusing on these phenomena, this paper first presents a brief review of the evolution of EFL teaching methods and the development of linguistic corpora, especially ICLE, the representative learner corpus, and then analyzes why a gap and a lag exist between corpus-linguistic research and classroom teaching.
Drawing on previous corpus work, a small corpus built by the author, and the pedagogical theories of EFL teaching, the paper points out that, compared with the established corpora, a self-built mini-corpus has unique advantages that can help eliminate the drawbacks of the established corpora to a large extent. Finally, the paper concludes that it is both necessary and feasible for an EFL teacher, abiding by some basic principles, to build an EFL learner-oriented mini-corpus for practical classroom use when she or he intends to apply corpora to EFL teaching.
Evolution of EFL Teaching Methods
In the history of English education, wherever there is EFL teaching and learning, there is a continual pursuit of the ideal teaching method. Different methods have been introduced, tried out, and found unsatisfactory, among them the ‘Direct Method’ in the early decades of the last century, the ‘Situational Method’ in the 1960s, the ‘Audiolingual Method’ in the 1970s, and the ‘Communicative Approach’ in the 1980s (Yan, Zhou, & Dai, 2007). Through trial and error, people have realized that no single method seems good enough to be universally accepted as the best (Yan, Zhou, & Dai, 2007). Thus, the best method is most likely to be a combination of the strengths of different methods. Only when EFL teachers familiarize themselves with the essence of those methods and apply them flexibly in accordance with particular classroom situations can they reach the summit of successful EFL teaching. By briefly looking back at the evolution of EFL teaching methods, the author aims to indicate that although this paper favors the use of corpora, it does not claim that this is the only method applicable to classroom teaching. Likewise, although the paper advocates the use of a mini-corpus built by EFL teachers themselves, it does not object to the use of established corpora; on the contrary, it suggests combining self-built corpora with the established ones.
Modern General Corpora and Learner Corpora
According to McEnery & Wilson (2001), “a corpus in modern linguistics might be described as a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration” (p. 32). The history of modern machine-readable corpora began with the Brown corpus of American English, followed by the LOB (Lancaster-Oslo/Bergen) corpus of British texts. In these representative corpora, criteria for text selection were set in advance to determine how the language variety is to be sampled, and how many samples of how many words are to be collected so that a pre-defined grand total is arrived at (McEnery & Wilson, 2001, p. 31). Once the pre-defined number of words is reached, such corpora stop growing. Some major corpus projects, such as the BNC (British National Corpus), a 100,000,000-word representative corpus of contemporary British written and spoken texts, stand in direct line of succession to Brown and LOB. Monitor corpora, such as the Bank of English at Birmingham University, represent a different approach. These corpora often have no final extent because, “like the language itself, it keeps on developing” (Sinclair, 1991, p. 25).
General corpora, which collect authentic (or standard, or native) language, “are important in language learning as they expose students at an early stage in the learning process to the kinds of sentences and vocabulary which they will encounter in reading genuine texts in the language or using the language in real communicative situations” (McEnery & Wilson, 2001, p.120). However, “one should not ‘exaggerate’ the impact of native corpora on foreign language teaching and, while having access to comprehensive frequency lists may well help course designers compile better lexical syllabi, it will not give them access to learners’ actual lexical problems” (Granger, 1994). Granger’s remark may well explain why learner corpora are compiled. Unlike a general corpus, a learner corpus is “a computerized textual database of the language produced by foreign language learners” (Leech, 1998). Generally, learner corpora are important because they document deviations from the standard, that is, from the language of the native speakers of a particular language.
The first learner corpus created in an academic setting is ICLE, launched by Sylviane Granger in 1990 and currently coordinated by her at the University of Louvain-la-Neuve in Belgium. The first objective of the corpus is to collect dependable evidence on learners’ errors and to compare them cross-linguistically in order to determine whether they are universal or language-specific. The comparison is also carried out to determine to what extent the errors are affected by factors in the learner’s cultural or educational background. The second objective of ICLE is to investigate aspects of foreign-soundingness in non-native essays, which are usually revealed by the overuse or underuse of words or structures with respect to the target language norm. This investigation is done by comparing individual L2 sub-corpora with native English corpora, such as the International Corpus of English, the LOB, and the Louvain Corpus of Native English Essays.
Centered on the study of learners’ own EFL learning processes and output, learner corpora are psychologically much closer to EFL teachers and learners than general corpora are. Recently, learner corpora of different types and language backgrounds have multiplied quickly, especially in Europe and Asia. Examples include LLC, CLC (Cambridge Learner’s Corpus), PELCRA (Polish-English Language Corpus Research and Applications), HELC (Hungarian EFL Learner Corpus), JEFLL (Japanese EFL Learner’s Corpus), CCLE (Project of 1 million word Corpus of Chinese Learner of English), and the HKUST (Hong Kong University of Science and Technology) Corpus of Learner English.
Established Corpora and EFL Teaching and Learning
Since its establishment, each of these corpora has contributed, to a greater or lesser degree, to the study of language: lexically, structurally, lexico-grammatically, morphologically, phonologically, and, of course, pedagogically. Learner corpora such as ICLE in particular provide excellent materials for EFL research in many different areas. They have brought new insights into learner language, which can be applied to EFL teaching material design and classroom methodology. Theoretically, corpora of different categories, such as general corpora, learner corpora, and multilingual corpora, can all benefit EFL teachers and learners to a high degree, so much so that some optimistic linguists, like Sinclair (1996), announced that “the deployment of corpora would improve the teaching and learning of English worldwide.” Technically, Tribble (1997) predicted that “as the rapid development of telecommunications for computing meant that now (or very soon) a large number of teachers and students would be able to access the BNC or the Bank of English on-line and use the same search engines as their university or commercial counterparts.” He then pointed out that the corpus was “no longer the sole preserve of the university or commercial research team” (Tribble, 1997).
However, it is very disappointing that at present, a decade after these announcements, corpora have not been embraced by most EFL teachers and learners in nearly every nation. The reasons for this poor reception are manifold, and some major ones are summarized as follows:
1. The terms employed in corpus work may ‘frighten’ ordinary EFL teachers and learners who know little about computer science. A brief survey of the most influential established corpora shows that the teams building a corpus fall into two main groups: one composed only of computer professionals, and the other composed of both linguists and computer professionals. This implies that staff with an educational background in computer science exert much influence on the process of corpus compilation. As a result, computer science terms such as parsing, tagging, token, and node frequently appear in books or papers on corpus linguistics. These terms may be fairly understandable to computer professionals, but EFL teachers and learners find it rather difficult to know exactly what they mean, even when the terms are translated into their native language. To EFL teachers and learners, this is undoubtedly frustrating because they are usually required to be familiar with those technical terms, and sometimes with quite complex search procedures, in order to carry out corpus investigations.
2. Though free online sample corpora are available, it can be expensive for EFL teachers and learners to investigate corpora such as LOB, BNC, and ICLE in depth. The online availability of corpora undoubtedly benefits EFL teachers and learners and promotes their knowledge of corpora to a certain extent, but if they want to make further use of the corpora, their attempts are frequently hindered because “large general corpora are only available to researchers who have access to powerful workstation computers” (Landry, 2003).
3. Authenticity is a word often associated with the value of corpora. It seems that established corpora could expose students to genuine texts in the language and help to expand their linguistic awareness. But what is this so-called authenticity? Taylor (1994) says, “The concept of authenticity is an abstract quality that depends on too many variables to be defined.” He believes that the classroom itself creates its own ‘authenticity’ (Taylor, 1994). In language teaching, the individual learner’s language level and progress matter more than many other considerations. For example, a student whose English is not yet good enough is likely to become impatient with the sheer number of concordance lines retrieved from the established corpora. Moreover, it is also hard for him/her to grasp the enormous range of background knowledge, cultural as well as linguistic, that those instances presuppose.
4. As the brief introduction to modern general and learner corpora shows, most of the texts collected in an established corpus (with the exception of a monitor corpus) are relatively dated because the corpus is finite in size. Psychologically, what and how much is learned is strongly influenced by a learner’s motivation. A person is most interested in whatever is around or closely linked with him/her. Consequently, it is hard to believe that texts in such corpora as Brown and LOB are still attractive to today’s EFL teachers and learners. The students are “unlikely to be motivated by a language learning activity if the instances of language use that they are studying are taken from contexts, which make no connection with their interests and concerns” (Tribble, 1997). In general, then, it is quite obvious that from the perspective of a specific classroom context, or, for that matter, from a specific teacher’s perspective, corpora are required to include the language of the learners who are present in that very classroom.
Self-built Mini-corpora and EFL Teaching and Learning
Just as a single teaching method fails to provide versatility in EFL teaching and learning, relying only on established general and learner corpora is not sufficient in classrooms either. A competent EFL teacher can never explore too many effective measures to improve his/her teaching capacity. With the spread of computer-assisted tools in EFL classrooms and the development of corpus linguistics, it is worthwhile for EFL teachers to try out corpora in teaching activities. However, EFL teachers need to find a way to overcome the drawbacks of the established corpora in order to take full advantage of corpus research findings. Based on the investigation into the established corpora and the aim of stimulating learners’ motivation to study, this paper therefore recommends the application of self-built learner-oriented mini-corpora in EFL teaching and learning.
With recent computer technology and the information available online, an ordinary EFL teacher can build a learner-oriented corpus by him/herself easily and economically, even if he/she knows little about computers:
1. In terms of hardware, a PC plus mobile mass storage devices can store as much linguistic material as is needed.
2. The software needed to build a corpus, such as ConcApp (see http://www.sussex.ac.uk/languages/1-6-6.html) and WordSmith Tools (see http://www.oup.com/elt/catalogue/guidance_articles/ws_form?cc=global), can be downloaded from websites, sometimes even free of charge.
3. Websites like http://bowland-files.lancs.ac.uk/courses/ahaw-nscl/l04_top.htm freely provide clear instructions, in simple words, on how to build a corpus for personal use.
So, an EFL teacher who wants to build a corpus can attain this goal without great difficulty. In comparison with the established corpora, a self-built corpus has distinct practical advantages:
1. The teacher can present his/her learners with the most recent texts or the texts most related to the learners’ interests or concerns. For example, the Olympic Games will be held in Beijing, China in 2008, and the Olympics are a hot topic among Chinese students. The teacher can collect recent Olympic news from the Internet as the raw material of his/her corpus; compared with established corpora such as LOB, the self-built corpus can expose students to the newest instances (see Figure 1 and Figure 2). This may stimulate the learners to find out more about the Olympics by themselves, which in turn promotes their motivation to study.
(Figure 1: Word Olympic displayed on a self-built mini-corpus. Text collected from the piece of news about Beijing 2008 Torch Relay, posted by the official website of the Beijing 2008 Olympic games on Apr. 26, 2007. See http://torchrelay.beijing2008.cn/en/news/headlines/n214042288.shtml)
(Figure 2: Word Olympic displayed on online LOB. See http://www.edict.com.hk/concordance/WWWConcappE.htm)
2. The teacher can organize learner-centered activities in an EFL classroom. For example, by handing out to learners the error data collected in a written examination, the teacher can ask the learners to help each other correct the mistakes. The learners will then profit not only from the correction of their own mistakes but also from the analysis of their peers’ errors and corrections. In addition, since corpus-linguistic software allows learners to look not only for particular words and patterns but also for particular categories of errors, they may also find it useful to review their errors in terms of error categories.
3. The teacher can observe the development of his/her own learners’ language both quantitatively and qualitatively. For example, by collecting the assignments submitted by the learners in his/her self-built corpus, the teacher can generate wordlists to check the range of vocabulary that the whole class or individual learners have used.
4. The teacher can evaluate the progression in learners’ language from a longitudinal perspective and then focus either on the class as a whole or on specific learners in particular. For example, by comparing the learners’ data collected in his/her own corpus in the first semester with those of the second semester, the teacher can find out whether specific kinds of errors occur more or less frequently after one semester.
5. The teacher can not only analyze the corpus in its entirety, but also focus on individual learners. For example, the teacher can provide specific feedback to an individual learner by providing him/her with concordance lines that highlight frequently occurring kinds of mistakes in that particular learner’s language.
6. The teacher can decide the size and the degree of difficulty of the texts collected in his/her own corpus in accordance with the learners’ language level. By doing so, the learners will be less likely to be deluged with hundreds or thousands of examples or confused by a lack of cultural and linguistic knowledge, problems they may often encounter in the established corpora.
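The wordlist mentioned in item 3 and the concordance lines mentioned in item 5 can be illustrated with a minimal Python sketch. It is not the procedure used by ConcApp or Wordsmith Tools; the function names (tokenize, word_list, concordance) and the two sample sentences are hypothetical and were invented for this illustration, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into rough word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(texts):
    """Count how often each word form occurs across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def concordance(texts, keyword, width=30):
    """Return keyword-in-context lines with `width` characters on each side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Hypothetical sentences standing in for texts collected by the teacher.
mini_corpus = [
    "The Olympic torch will arrive in Beijing next year.",
    "I want to watch the Olympic games with my classmates.",
]

for line in concordance(mini_corpus, "Olympic"):
    print(line)
```

Under these assumptions, word_list(mini_corpus).most_common(10) would list the ten most frequent word forms, and passing only one learner's assignments as the list of texts would restrict either function to that learner.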
Principles of self-building a mini-corpus
Before an EFL teacher begins his/her work to build a mini-corpus, he/she should take the following into consideration:
1. A mini-corpus must be learner-oriented: This means that the texts collected must reflect learners’ interests or concerns, and that learners’ own work, such as their written assignments or test papers, must be included.
2. A mini-corpus must be understandable: In a mini-corpus, the EFL teacher should try to explain things in common words instead of specialized terms, such as those of computer science, which are difficult for the learners.
3. A mini-corpus must be difficulty-suitable: When building a mini-corpus, the EFL teacher must decide whether the collected texts will be too difficult for the learners by evaluating the learners’ language level. Otherwise, the learners may be frustrated by instances displayed in the corpus that they do not understand.
In addition, what Aston (1997) lists in his arguments for the use of smaller corpora in data-driven learning is also applicable to a self-built mini-corpus:
4. A mini-corpus must be fully analyzable: It must be possible for an individual learner or for a group to collectively investigate all of the lexical types that occur with any frequency in a mini-corpus.
5. A mini-corpus must be easy to become familiar with: The learners, either individually or in groups using jig-saw techniques, can read through an entire mini-corpus. Then, they can draw on familiarity to help them interrogate the corpus.
6. A mini-corpus must be more clearly patterned: Collocations and other word associations must be easy to identify in a mini-corpus.
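How collocations might be surfaced in a small corpus (principle 6) can be sketched in a few lines of Python. This is only a rough illustration that counts the words occurring within a fixed window around a chosen node word; the function name collocates, the window size of two tokens, and the three sample sentences are assumptions made for this example, and raw co-occurrence counts are a simplification of the statistics that dedicated corpus software may report.

```python
import re
from collections import Counter


def collocates(texts, node, window=2):
    """Count the words found within `window` tokens of the node word."""
    counts = Counter()
    node = node.lower()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token == node:
                start = max(0, i - window)
                # Neighbours on the left and on the right, excluding the node.
                neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbours)
    return counts


# Hypothetical learner sentences, invented for illustration only.
sample = [
    "She made a mistake in the test.",
    "He made a big mistake yesterday.",
    "They made progress this term.",
]

print(collocates(sample, "made").most_common(3))
```

In a mini-corpus small enough for learners to read in full, such a list can be checked directly against the texts, which is harder to do with the output of a very large corpus.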
Conclusion
The diversity of needs of English language learners has long been acknowledged (Tarone & Yule, 1989, p.10). An EFL teacher needs to make continual efforts to pursue the most efficient and effective teaching methods to meet his/her learners’ various needs. Corpus use contributes to language teaching in a number of ways (Aston, 2000; Leech, 1997; Nesselhauf, 2004). Research on learner corpora also contributes to “our understanding of language learning processes” (Granger et al., 2002). However, it has taken many years for now established corpora such as the Bank of English to “produce tangible pedagogical results” (Nunn, 2005). At present, there are still very few studies that “relate the findings from learner corpora to actual classroom practice” (Tono, 2003). Consequently, it is both the corpus linguists’ and the EFL learners’ responsibility to narrow the gap between corpus-linguistic research and EFL teaching and learning. The development of computer technology and the Internet has made it entirely feasible for EFL teachers to make good use of corpus-linguistic research findings and also to compile a learner-oriented mini-corpus that compensates for the drawbacks of the established general and learner corpora. The use of established and self-built corpora is compatible with all other teaching methodologies and deserves to be tried out in an EFL classroom context so as to benefit EFL learners in the long run.
References
Aston, G. (1997). Small and large corpora in language learning. In J. Melia & B. Lewandowska-Tomaszczyk (Eds.), PALC 97: Practical applications in language corpora (pp. 51-62). Lodz: Lodz University Press.
Aston, G. (2000). Corpora and language teaching. In L. Burnard & T. McEnery (Eds.), Rethinking Language Pedagogy from A Corpus Perspective: Papers from The Third International Conference on Teaching and Language Corpora (pp. 7-17). Hamburg: Peter Lang.
Gao, Y. (2005). CAI and CMI and Internet in digital era. CELEA Journal, 28(1), 95-98.
Granger, S. (1994). The learner corpus: A revolution in applied linguistics. English Today, 10(3), 25-29.
Granger, S., Hung, J., & Petch-Tyson, S. (Eds.) (2002). Computer learner corpora, second language acquisition and foreign language teaching. Amsterdam: John Benjamins.
Granger, S. (2004). Computer learner corpus research: current state and future prospects. In U. Connor & T. Upton (Eds.), Applied corpus: A multidimensional perspective (pp. 123-145). Amsterdam: Rodopi.
Landry, K. (2003). Dictionaries usage in EFL and learner development. Asian EFL Journal, 5(1). Retrieved Mar. 5, 2007 from http://www.asian-efl-journal.com/march03.sub6.php
Leech, G. (1997). Teaching and language corpora: A convergence. In A. Wichmann, S. Fligelstone, T. McEnery & G. Knowles (Eds.), Teaching and language corpora (pp. 1-23). New York: Addison Wesley Longman.
Leech, G. (1998). Preface. In S. Granger (Ed.), Learner English on computer (pp. 14-20). London and New York: Addison Wesley Longman.
McEnery, T. & Wilson, A. (2001). Corpus linguistics (2nd ed.). Edinburgh: Edinburgh University Press.
Nesselhauf, N. (2004). Learner corpora and their potential for language teaching. In J. Sinclair (Ed.), How to use corpora in language teaching (pp. 125-152). Amsterdam: Benjamins.
Nunn, R. (2005). Competence and teaching English as an international language. Asian EFL Journal, 7(3). Retrieved Mar. 5, 2007 from http://www.asian-efl-journal.com/September_05_rn.php
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Sinclair, J. (1996). How to use corpora in language teaching. Retrieved Apr. 27, 2007 from http://www.beaugrande.com/SinclairReview.htm
Smith, J. (2007). The contribution of EFL programs to community development in China. Asian EFL Journal, 9(1). Retrieved Apr. 10, 2007 from http://www.asian-efl-journal.com/March_07_js.php
Tarone, E. & Yule, G. (1989). Focus on the language learner. Oxford: Oxford University Press.
Taylor, D. (1994). Inauthentic authenticity or authentic inauthenticity? TESL-EJ, 1(2). Retrieved Apr. 25, 2007 from http://www-writing.berkeley.edu/tesl-ej/ej02/a.1.html
Tian, S. (2004). Using corpora concordancing to assist low-achievement EFL students. A Special Edition for The Fourth International Conference on ELT in China. China: Beijing. Retrieved Apr. 25, 2007 from http://www.celea.org.cn/lw.htm#
Tono, Y. (2003). Learner corpora: design, development and applications. In D. Archer, P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of The Corpus Linguistics 2003 Conference (pp. 800-810). Lancaster: Lancaster University Press.
Tribble, C. (1997). Improvising corpora for ELT: quick-and-dirty ways of developing corpora for language teaching. In J. Melia & B. Lewandowska-Tomaszczyk (Eds.), PALC 97: Practical applications in language corpora (pp.106-118). Lodz: Lodz University Press.
Yang, J. & Liao, J. (2004). Introducing online corpora into ELT classroom. A Special Edition for The Fourth International Conference on ELT in China. China: Beijing. Retrieved Apr. 25, 2007 from http://www.celea.org.cn/lw.htm#
Yan, X., Zhou, Z., & Dai, P. (2007). Principled eclecticism in college English teaching in China. Professional Teaching Articles of Asian EFL Journal, 17(Article 1). Retrieved Apr. 15, 2007 from http://www.asian-efl-journal.com/pta_Jan_07_yxy.php