

This manual contains a description of the design principles underlying the British National Corpus (BNC), and detailed information about the way in which it is encoded, in particular, a definition of the SGML document type declaration (DTD) used. A list giving brief bibliographic details for each text making up the corpus is also included.

This edition of the manual is a revised version of the document released with version 1.0 of the corpus, as distributed in May 1995. It describes the BNC World Edition, released in October 2000. Further information about the BNC is also available from its World Wide Web server at

The material presented in this manual derives from a number of BNC Project internal documents, with original contributions from all the participants in the project. Factual errors, chiefly relating to the composition of the corpus, have been corrected, and the description of the encoding scheme has been modified in line with the changes introduced in this version. In other respects, this version of the documentation is unchanged from the first release of the corpus. A brief list of the revisions made to the corpus encoding is given in section ??.


The BNC was created by an academic-industrial consortium whose original members were:
  • Oxford University Press
  • Longman Group Ltd
  • Chambers Harrap
  • Oxford University Computing Services
  • Unit for Computer Research on the English Language (Lancaster University)
  • British Library Research and Development Department

Creation of the corpus was funded by the UK Department of Trade and Industry and the Science and Engineering Research Council under grant number IED4/1/2184 (1991-1994), within the DTI/SERC Joint Framework for Information Technology. Additional funding was provided by the British Library and the British Academy.

Management of the project was co-ordinated by an executive committee whose members were as follows:
  • Oxford University Press: Tim Benbow; Simon Murison-Bowie
  • Longman Group Ltd: Della Summers; Rob Francis
  • Chambers Harrap: John Clement
  • Oxford University Computing Services: Lou Burnard
  • Lancaster University (UCREL): Geoffrey Leech
  • British Library: Terry Cannon
  • DTI observers: Gerry Gavigan; Donald Bell
An Advisory Council supervised the running of the project 1991-1994. Members of this Council were:
  • Dr Michael Brady
  • Christopher Butler
  • Professor David Crystal
  • Sir Antony Kenny (chair)
  • Dr Nicholas Ostler
  • Professor Sir Randolph Quirk
  • Tim Rix
  • Dr Henry Thompson
Many people within each member organization made major contributions to the success of the project. It is a pleasure to acknowledge their hard work and dedication here.
  • Oxford University Press: Lyndsay Brown; Jeremy Clear (project manager 1991-2); Caroline Davis; Ginny Frewer; Frank Keenan; Tom McLean; Anita Sabin; Ray Woodall (project manager 1992-4)
  • Longman Group Ltd: Steve Crowdy (project manager); Denise Denney; Duncan Pettigrew
  • Chambers Harrap: Robert Allen; Ilona Morison
  • Oxford University Computing Services: Glynis Baguley; Gavin Burnage; Tony Dodd; Dominic Dunlop (project manager 1992-4)
  • Lancaster University (UCREL): Tom Barney; Michael Bryant (project manager 1991-3); Elizabeth Eyes; Jean Forrest; Roger Garside; Mary Hodges; Mary Kinane; Nicholas Smith; Xungfeng Xu
The project also benefited greatly from the advice and support of many external consultants. Listing all those who have influenced our thinking and to whom we are indebted would be very difficult, but chief amongst them we would like to thank:
  • Sue Atkins
  • Clive Bradley
  • Ann Brumfitt
  • Charles Clark
  • James Clark
  • Bruce Heywood
  • Mark Lefanu
  • Michael Rundle
  • Richard Sharman
  • Michael Sperberg-McQueen
  • Anna-Brita Stenstrom
  • Russell Sweeney

After the completion of the first edition of the BNC, a phase of tagging improvement was undertaken at Lancaster University with funding from the Engineering and Physical Sciences Research Council (Research Grant No. GR/F 99847). This tagging enhancement project was led by Geoffrey Leech, Roger Garside and Tony McEnery. The main objective was to correct as many tagging errors as possible, using an enhanced version of Claws4. In addition, a new tool was developed (the Template Tagger) for ‘patching’ the corpus in such a way as to eliminate further sets of errors by rule. This tool was developed by Michael Pacey, building on a prototype written by Steven Fligelstone. The research team working on tagging improvement was Nicholas Smith (lead researcher), Martin Wynne and Paul Baker.

Correction and validation of the bibliographic and contextual information in all the BNC Headers was carried out at OUCS by Lou Burnard, with assistance at various stages from Andrew Hardie and Paul Groves, who helped check demographic details for all spoken texts, and in particular from David Lee, who checked bibliographic and classification information for the bulk of the written texts. Thanks are also due to the many users of the original version of the BNC who took the time to notify us of errors they found.

Thanks are also due to Sebastian Rahtz for his help in the production of this manual.
