Introduction

Overview

The Users Reference Guide for the British National Corpus contains a description of the design principles underlying the British National Corpus (BNC), and detailed information about the way in which it is encoded, covering both the markup conventions applied and the linguistic annotation with which the corpus was enriched. This revised edition has been slightly reorganized and considerably expanded to provide a complete reference work for users of the corpus in its new XML form. The text of the manual is available in TEI-XML and in HTML format, and also from the BNC website at , from which updated versions may be obtained.

The material presented in this manual derives originally from a number of BNC Project internal documents, combining contributions from all the participants in the project (see further ); any errors introduced are the responsibility of the editor. Please send any comments or corrections to natcorp@oucs.ox.ac.uk.

Section describes the basic structure of the BNC encoding scheme, in terms of the XML elements and attributes distinguished and the tags used to mark them. Section describes features which are peculiar to written texts, and section those peculiar to spoken texts. In each case, a distinction is made between those elements which are marked up in all texts and those which (for technical or financial reasons) are not always so distinguished, and hence appear in some texts only. It should be noted that by no means all of the features described here will be present in every text of the corpus, nor, if present, will they necessarily be tagged. Section describes the structure of the detailed metadata associated with each text, in the form of the teiHeader element attached to each component of the corpus, and also to the whole corpus itself. This is complemented in section by a detailed presentation of the linguistic annotation or wordclass tagging applied throughout the corpus.
(This chapter is derived from the Tagging Guide (Smith et al.), originally distributed separately with BNC World.) Section discusses briefly some ways of exploiting the BNC computationally. Section complements the metadata supplied in the header by listing and documenting several of the coded values used in the markup. A brief bibliography combining significant background readings about the BNC with works cited elsewhere in the manual is provided in section and a complete list of all the original sources from which the corpus was compiled is given in section .

Section documents suggested settings for those wishing to use the XAIRA system to index and query the BNC. The pre-built XAIRA index delivered as part of the BNC XML package was made using the XAIRA specification described in this section. This section is provided for the convenience of XAIRA users; it may be ignored if you are using some other software to search or manage the corpus.

Finally, a reference section () provides an alphabetical list of all XML elements and attributes used in the markup of the corpus, together with the model and attribute classes to which they belong, and macros used to simplify references to them. This specification conforms to the 2007 (P5) edition of the TEI Guidelines (), with which it should be read in conjunction.

The BNC was created by an academic-industrial consortium whose original members were:

Oxford University Press

Longman Group Ltd

Chambers Harrap

Oxford University Computing Services

Unit for Computer Research on the English Language (Lancaster University)

British Library Research and Development Department

Creation of the corpus was funded by the UK Department of Trade and Industry and the Science and Engineering Research Council under grant number IED4/1/2184 (1991-1994), within the DTI/SERC Joint Framework for Information Technology. Additional funding was provided by the British Library and the British Academy. Maintenance, distribution, and development of the corpus have been carried out at Oxford University Computing Services. There have been three major editions of the corpus:

BNC 1.0 (1995)

BNC World Edition (2000)

BNC XML Edition (2007)

For a brief historical overview of the project see Burnard 2002.

Acknowledgments

BNC 1.0

Management of the original BNC project was co-ordinated by an executive committee whose members were as follows:

OUP

Tim Benbow; Simon Murison-Bowie

Longman

Della Summers; Rob Francis

Chambers Harrap

John Clement

OUCS

Lou Burnard

UCREL

Geoffrey Leech

British Library

Terry Cannon

DTI observers

Gerry Gavigan; Donald Bell

An Advisory Council supervised the running of the project from 1991 to 1994. Members of this Council were:

Dr Michael Brady

Christopher Butler

Professor David Crystal

Sir Antony Kenny (chair)

Dr Nicholas Ostler

Professor Sir Randolph Quirk

Tim Rix

Dr Henry Thompson

Many people within each member organization made major contributions to the success of the project. It is a pleasure to acknowledge their hard work and dedication here.

OUP

Lyndsay Brown; Jeremy Clear (project manager 1991-2); Caroline Davis; Ginny Frewer; Frank Keenan; Tom McLean; Anita Sabin; Ray Woodall (project manager 1992-4)

Longman

Steve Crowdy (project manager); Denise Denney; Duncan Pettigrew

Chambers Harrap

Robert Allen; Ilona Morison

OUCS

Glynis Baguley; Gavin Burnage; Tony Dodd; Dominic Dunlop (project manager 1992-4)

UCREL

Tom Barney; Michael Bryant (project manager 1991-3); Elizabeth Eyes; Jean Forrest; Roger Garside; Mary Hodges; Mary Kinane; Nicholas Smith; Xungfeng Xu

The project also benefited greatly from the advice and support of many external consultants. Listing all those who have influenced our thinking and to whom we are indebted would be very difficult, but chief amongst them we would like to thank:

Sue Atkins

Clive Bradley

Ann Brumfitt

Charles Clark

James Clark

Bruce Heywood

Mark Lefanu

Michael Rundle

Richard Sharman

Michael Sperberg-McQueen

Anna-Brita Stenström

Russell Sweeney

BNC World

After the completion of the first edition of the BNC, a phase of tagging improvement was undertaken at Lancaster University with funding from the Engineering and Physical Sciences Research Council (Research Grant No. GR/F 99847). This tagging enhancement project was led by Geoffrey Leech, Roger Garside and Tony McEnery. The main objective was to correct as many tagging errors as possible, using an enhanced version of CLAWS4. In addition, a new tool (the Template Tagger) was developed to patch the corpus so as to eliminate further sets of errors by rule. This tool was developed by Michael Pacey, building on a prototype written by Steven Fligelstone. The research team working on tagging improvement was Nicholas Smith (lead researcher), Martin Wynne and Paul Baker.

Correction and validation of the bibliographic and contextual information in all the BNC headers was carried out at OUCS by Lou Burnard, with assistance at various stages from Andrew Hardie and Paul Groves, who helped check demographic details for all spoken texts, and in particular from David Lee, who checked bibliographic and classification information for the bulk of the written texts. Thanks are also due to the many users of the original version of the BNC who took the time to notify us of errors they found.

BNC XML

Thanks are due to Martin Wynne and Ylva Berglund, who first suggested the idea of an XML version of a subset of the BNC. Production of that edition (BNC Baby) provided valuable experience in automatic conversion of the World edition. The bulk of the technical work involved in producing the XML edition was carried out by Tony Dodd and Lou Burnard, with assistance and advice from many BNC users and beta-testers worldwide, in particular Guy Aston, Andrew Hardie, Paul Rayson, and Sebastian Rahtz. Without their input the present revision would have been impossible.