[bnc] Workshop: Using large-scale XML corpora 2007

Using large-scale XML corpora in Language and Literature

This workshop, sponsored by the AHRC Methods Network, was held on November 26, 2007. Material used at the workshop can be accessed via the links below. A repeat event is scheduled for January 15, 2008.

Programme

8.45 - 9.30 Registration and coffee
9.30 - 10.00 Introductions and welcome
10.00 - 10.30 BNC Design (presentation)
10.30 - 11.00 Visualising a BNC XML text (exercise)
11.00 - 11.30 Coffee break
11.30 - 12.15 BNC use in teaching (invited lecture)
12.15 - 13.00 Using XAIRA with BNC (exercise)
13.00 - 14.00 Lunch
14.00 - 14.30 Corpus resources (presentation and discussion of participants' materials)
14.30 - 15.15 Indexing your own corpus (exercise)
15.15 - 15.45 Tea break
15.45 - 16.30 Other kinds of corpus: A specialised corpus of transcribed speech
16.30 - 17.00 Specialising (exercise)
17.00 - 17.30 Evaluation and discussion

Background reading and other resources

About the workshop

This one-day workshop introduced the technologies needed to unlock the potential of large-scale XML-encoded language corpora, with a particular focus on the most recent version of the British National Corpus (BNC XML Edition). Participants learned how to explore this corpus using a variety of generic XML tools, chiefly (but not exclusively) XAIRA, a general-purpose software architecture for the linguistic analysis of large XML corpora. They explored the kinds of language learning activities and linguistic analyses best supported by such tools, and discussed how usable these tools are for fundamental linguistic and literary research on large text bases. The course had a strong practical component, and participants were encouraged to provide samples of their own textual materials to experiment with corpus construction and analysis.

The workshop was aimed at two distinct groups of researcher. The first comprised language or literature specialists who were aware of the potential of corpus-based methods in language pedagogy or literary research and wanted to apply them, either to their own corpus material or to the BNC in its new format. The second comprised technical specialists who were aware of the demand for corpus resources and wanted to gain practical experience of using XML for corpus creation, development, and usage. The workshop aimed to stimulate dialogue between the two groups and to promote a shared understanding of common goals.

The workshop was taught by Lou Burnard and Ylva Berglund Prytz (Oxford), together with Guy Aston (Forli).
