Educational Technology for Language Learning 7

CTI Centre for Modern Languages and Aston University

February 5-6 1997

Claire Warwick, OUCS


Introduction

The aim of this conference was to introduce the use of corpora in language teaching, whether as a resource to teach linguistics to native speakers or to teach second languages to non-native speakers. The conference was intended to be a small forum for an invited audience of about 25 people. Those who attended were a mixture of language teachers, humanities computing professionals and programmers with an interest in CALL and its use and development.


Papers Presented

Tony McEnery (Lancaster) gave an overview titled Corpora and Language Teaching: from Creation to Exploitation, much of which was based on what he had learned from papers given at the two TALC conferences. He noted that teachers of language were increasingly realising that corpora can be used directly by learners, and that they enable the student to learn without the intervention of a teacher. This allows students to make their own discoveries about the nature of language in the ways which most interest them.

This experience, he suggested, must lead to a reassessment of the design of corpora, so that their function as a learning resource can be recognised. As a result, teachers will need to consider how they would like corpora to develop, which may lead to a redefinition of the relationship between teacher, learner, and corpus. For example, the POS tag set which is produced by automatic taggers such as CLAWS is biased towards the needs of NLP researchers and not towards those of language teachers. It cannot, for instance, distinguish an imperative form of a verb. It is important, however, that learners should realise that the corpus should not be the sole explicandum of language use, and teachers will need to explore the way in which intuition can interact with corpus data.

Peter Roe (Aston) gave a paper titled Hunt the Fossil, during which he demonstrated the use of ATA, a corpus analysis tool which he had developed for use by students at Aston. These students are usually non-native speakers of English whose field is science or engineering. They are encouraged to build a corpus of their own, and then ATA is used to analyse it. The methods and assumptions employed in doing this are rather different from those used by the BNC. Firstly, the text is not tagged in any way, and is simply input as a long ASCII file. This means that the functionality of the search software is somewhat limited compared to SARA. However, it was stressed that the tool was only meant to do simple analyses such as producing concordances and word frequency lists.
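The two simple analyses mentioned here, word frequency lists and concordances over untagged plain text, can be sketched in a few lines of Python. This is only an illustration of the general technique, not a description of how ATA itself works; the function names (tokenize, word_frequencies, concordance), the tokenisation rule and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split untagged plain text into lowercase word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Return a Counter mapping each word form to its number of occurrences."""
    return Counter(tokenize(text))


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit.

    `width` is the number of tokens of context kept on each side.
    """
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    # Hypothetical sample standing in for a student's own small corpus.
    sample = "The beam carries the load. The load bends the beam."
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
    for left, key, right in concordance(sample, "load", width=2):
        print(f"{left:>20} [{key}] {right}")
```

Because the text carries no tags, a search of this kind can only match word forms; it cannot, for example, restrict hits to a particular part of speech, which is the limitation noted above in comparison with SARA.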

The other major assumption concerns the type of data used in the analysis. The students only collect examples of language with which they are familiar and which they are interested in interrogating. For example, if the student is a civil engineer, the articles will only represent his/her branch of civil engineering. The assumption being made is that students can only learn by using a discourse that is meaningful to them in L1 to learn about L2. They are not exposed to a general corpus, even as a comparison or control corpus against which to test the insights they have made.

This obviously contradicts the basic assumption that we have made in the design of the BNC, namely that our corpus needs to represent as wide a variety of modern British English as possible. This means that if language learners use the BNC they must be introduced to a very wide variety of discourses. As Guy Aston has demonstrated, learners do make meaningful discoveries about language by using the BNC and being exposed to a very wide variety of discourses, not only those with which they are familiar.

I then gave a live demo of SARA. It worked very well, and, amazingly, did not crash, even though I was connected to the server back in Oxford. This seemed to be very well received. In particular I was asked whether SARA would be available independently of the BNC as a search tool for SGML documents.

Tim Johns and David Woolls (Birmingham) then gave a demonstration of Monolingual and Multilingual Concordancing for the classroom at Birmingham University, for which they used Multiconcord, a software package for the analysis of parallel corpora. Tim also talked about his experience of data-driven learning.

Patrick Hanks (OUP) gave a paper about Norms and Exploitations in Linguistic Behaviour, in which he discussed the ways in which corpora can aid lexicographers. He investigated the problem that dictionaries cannot distinguish between norms and variations of language use. In other words, a dictionary tells you what a word means but cannot tell you in what context its use would be appropriate. He used the verb 'to bake' as his example. Use of a corpus can help the learner decide the context in which a word is usually used. A corpus can tell us that we would often say 'I was baking in the kitchen' and that, while it is syntactically correct to say 'I was baking in the bathroom', the use would be unusual and would require more explanation on the part of the speaker.

I presented a paper about the BNC and the use of SARA, which was intended to fill in the background to the demonstration I had given on the previous evening.

Khurshid Ahmad (Surrey) gave a paper about Corpora-assisted Language Learning and Language Teaching: A Computational Perspective. He presented us with a short history of the progress of what is now known as CALL and the computational and theoretical assumptions underlying it. He stressed that texts in a corpus all belong to a family and that they have a common 'genetic' pool. They must have some kind of linkage, and the composition of the corpus is not random. Even if the corpus is opportunistic there must be some underlying pattern to be perceived within it.

He posed the question of what effects the initial grouping has on the final outcome of the text analysis, and what this grouping must tell us about the way we use language (although he did not offer a definite answer to either). He insisted on the importance of making clear the assumptions that designers use in putting together a corpus, as this will affect its use. The knowledge of these assumptions will help future corpus builders and end-users to incorporate as much as is known about the contents and description of texts for future storage and retrieval.

In the case of the BNC these assumptions are explained in the User Reference Manual, and in Burnage and Dunlop (1993), Burnard (1996), Crowdy (1994) and (1995), Garside (1995) and (1996), Leech (1992), (1993) and (1994) and Leech, Garside and Bryant (1994).


Conclusion

The conference provided a valuable opportunity for me to introduce the BNC to a wider and largely new audience. It also enabled me to learn more about the way in which others are using corpora in language teaching, and the ideas and assumptions which underlie this.