Using large-scale XML corpora in Language and Literature

This workshop was held on January 15, 2008 as part of the OUCS IT Learning Programme. Material used at the workshop can be accessed via the links below.

Programme

  • 8.45 - 9.30 Registration and coffee
  • 9.30 - 10.00 Introductions and welcome
  • 10.00 - 10.30 BNC Design (presentation)
  • 10.30 - 11.00 Getting to know BNC-XML (exercise)
  • 11.00 - 11.30 coffee break
  • 11.30 - 12.15 BNC use in teaching (invited lecture)
  • 12.15 - 13.00 Exploring the BNC with Xaira (demo exercise)
  • 13.00 - 14.00 lunch
  • 14.00 - 14.30 Corpus resources (presentation and discussion of participants' materials)
  • 14.30 - 15.15 Indexing a corpus with Xaira (exercise)
  • 15.15 - 15.45 tea break
  • 15.45 - 16.30 Other kinds of corpora (invited presentations)
    • Corpus approaches to the language of literature
    • Any Questions? A corpus of transcribed speech (presentation)
  • 16.30 - 17.00 Specialising (exercise)
  • 17.00 - 17.30 Evaluation and discussion

Background reading and other resources

About the workshop

This one-day workshop introduced the technologies needed to unlock the potential uses of large-scale XML-encoded language corpora, with a particular focus on the most recent version of the British National Corpus (BNC XML Edition). Participants learned how to explore this corpus using a variety of generic XML tools, focusing on (but not limited to) XAIRA, a general-purpose software architecture for the linguistic analysis of large XML corpora. They explored the kinds of language learning activities and linguistic analyses best supported by such tools, and discussed their usability for fundamental linguistic and literary research in large text bases. The course had a strong practical component, and participants were encouraged to provide samples of their own textual materials to experiment with corpus construction and analysis.

The workshop was aimed at two distinct groups of researcher. The first comprised language or literature specialists who were aware of the potential of corpus-based methods in language pedagogy or literary research and wanted to apply them, either with their own corpus material or with the BNC in its new format. The second comprised technical specialists who were aware of the demand for corpus resources and wanted to gain practical experience of using XML for corpus creation, development, and usage. The workshop aimed to stimulate dialogue between the two groups and to promote a shared understanding of common goals.

The workshop was taught by Lou Burnard and Ylva Berglund Prytz (Oxford), together with Guy Aston (Forli), and included a presentation by Martin Wynne (Oxford).
