Introduction
This manual describes the design principles underlying the British National Corpus (BNC) and gives detailed information about the way in which it is encoded, including the SGML document type definition (DTD) used. A list giving brief bibliographic details for each text making up the corpus is also included.
This edition of the manual is a revised version of the document released with version 1.0 of the corpus, as distributed in May 1995. It describes the BNC World Edition, released in October 2000. Further information about the BNC is also available from its World Wide Web server at http://www.hcu.ox.ac.uk/BNC.
The material presented in this manual derives from a number of BNC Project internal documents, with original contributions from all the participants in the project. Factual errors, chiefly relating to the composition of the corpus, have been corrected, and the description of the encoding scheme has been modified in line with the changes introduced in this version. In other respects, this version of the documentation is unchanged from the first release of the corpus. A brief list of the revisions made to the corpus encoding is given in section ??.
Acknowledgments
Creation of the corpus was funded by the UK Department of Trade and Industry and the Science and Engineering Research Council under grant number IED4/1/2184 (1991-1994), within the DTI/SERC Joint Framework for Information Technology. Additional funding was provided by the British Library and the British Academy.
- OUP: Lyndsay Brown; Jeremy Clear (project manager 1991-2); Caroline Davis; Ginny Frewer; Frank Keenan; Tom McLean; Anita Sabin; Ray Woodall (project manager 1992-4)
- Longman: Steve Crowdy (project manager); Denise Denney; Duncan Pettigrew
- Chambers Harrap: Robert Allen; Ilona Morison
- OUCS: Glynis Baguley; Gavin Burnage; Tony Dodd; Dominic Dunlop (project manager 1992-4)
- UCREL: Tom Barney; Michael Bryant (project manager 1991-3); Elizabeth Eyes; Jean Forrest; Roger Garside; Mary Hodges; Mary Kinane; Nicholas Smith; Xungfeng Xu
After the completion of the first edition of the BNC, a phase of tagging improvement was undertaken at Lancaster University with funding from the Engineering and Physical Sciences Research Council (Research Grant No. GR/F 99847). This tagging enhancement project was led by Geoffrey Leech, Roger Garside and Tony McEnery. The main objective was to correct as many tagging errors as possible, using an enhanced version of Claws4. In addition, a new tool was developed (the Template Tagger) for ‘patching’ the corpus in such a way as to eliminate further sets of errors by rule. This tool was developed by Michael Pacey, building on a prototype written by Steven Fligelstone. The research team working on tagging improvement was Nicholas Smith (lead researcher), Martin Wynne and Paul Baker.
Correction and validation of the bibliographic and contextual information in all the BNC Headers was carried out at OUCS by Lou Burnard, with assistance at various stages from Andrew Hardie and Paul Groves, who helped check demographic details for all spoken texts, and in particular from David Lee, who checked bibliographic and classification information for the bulk of the written texts. Thanks are also due to the many users of the original version of the BNC who took the time to notify us of errors they found.
Thanks are also due to Sebastian Rahtz for his help in the production of this manual.