When users obtain a BNC product, they agree to the licence which gives them the right to hold and use a copy of the corpus. A corpus is a dataset which can be used in many different ways, and we regret that the University of Oxford is not able to offer support to users of the corpus. Funding for the development and support of the corpus ended many years ago, but the corpus has been created in such a way that it should be usable long into the future, with software created by the community.
The British National Corpus (BNC) Consortium was formed in 1990, and started work in 1991 on the three-year task of producing a hundred-million-word corpus of modern British English for use in commercial and academic research. The first edition was completed in 1994 and the first general release of the corpus for European researchers was announced in February 1995. A slightly revised version, BNC World, was made available worldwide in 2001. In 2007, a third edition appeared, using XML. BNC XML Edition is the version currently distributed and supported. Two subsets of the BNC have been produced separately: BNC Sampler and BNC Baby.
The BNC corpora have historically been distributed with a free search tool called XAIRA, developed specially for the BNC, but also usable with other corpora in XML format. Many users will still access the BNC via a disk which includes a copy of XAIRA. XAIRA is not supported by the University of Oxford, and we regret that staff there cannot answer queries about its installation or use. Users of XAIRA should bear in mind that it will not be usable in all circumstances or for all purposes, and that at some point in the future it is unlikely to work at all on the latest computing platforms.
The full BNC contains about 100 million words: 90% written, 10% orthographically transcribed spoken text. It is annotated with word-class information (part-of-speech, simplified word class) and lemmatized. The texts also contain detailed metatextual information. It is delivered in XML format.
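Because the corpus is plain XML, the annotation can be read with general-purpose tools as well as with XAIRA. The following is a minimal sketch in Python, not an official tool. It assumes that each word is a `w` element carrying the word class in a `c5` attribute, the simplified word class in `pos`, and the lemma in `hw`; the sample sentence is invented for illustration, and the Reference Guide should be checked for the element and attribute names actually used.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only. The element and attribute
# names (w, c, c5, pos, hw) are assumptions to be checked against the
# Reference Guide for the BNC (XML Edition).
SAMPLE = (
    '<s n="1">'
    '<w c5="DTQ" hw="what" pos="PRON">What </w>'
    '<w c5="VBZ" hw="be" pos="VERB">is </w>'
    '<w c5="AT0" hw="a" pos="ART">a </w>'
    '<w c5="NN1" hw="corpus" pos="SUBST">corpus</w>'
    '<c c5="PUN">?</c>'
    '</s>'
)


def read_words(xml_text):
    """Return (token, word class, simplified class, lemma) for each <w>."""
    root = ET.fromstring(xml_text)
    rows = []
    for w in root.iter("w"):
        # Tokens may carry trailing whitespace, so strip it.
        token = (w.text or "").strip()
        rows.append((token, w.get("c5"), w.get("pos"), w.get("hw")))
    return rows


words = read_words(SAMPLE)
for row in words:
    print(row)
```

For a whole corpus file, the same loop could be run over `ET.parse(path).getroot()` instead of a string; punctuation (assumed here to be in `c` elements) is skipped because only `w` elements are visited.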
BNC XML Edition is distributed on two DVDs. Users are welcome to try out the free XAIRA software, which is provided for installation on a stand-alone PC or on a Windows, Unix or OSX server. A copy of the XAIRA search program and the XAIRA index files for the BNC are provided.
Full reference information about the BNC is provided in the Reference Guide for the British National Corpus (XML Edition). Information about the BNC project and the original creation of the corpus can be found on the corpus creation page. To buy a copy of the corpus, follow the links to the How to order page.
The BNC Baby is in XML format and comes with a free copy of the XAIRA program (included on the CD). It is distributed on a CD together with the BNC Sampler and an XML version of the American English Brown corpus. More information about the CD is available on the BNC Baby CD page. The CD can be ordered online.
The BNC Sampler is a subset of the full BNC. It comprises two samples, one of written and one of spoken material, of one million words each, compiled to mirror the composition of the full BNC as far as possible. The word-class annotation of the BNC Sampler texts has been carefully checked and manually corrected. The Sampler was first created at Lancaster University during the creation of the BNC. More information about the Sampler can be found in the Users Reference Guide for the BNC Sampler: XML Edition (PDF file).
The Brown Corpus of Standard American English was created at Brown University by W. N. Francis and H. Kucera. It contains one million words of written American English, taken from texts published in 1961. The texts are all about 2,000 words long and are grouped into 15 categories. More information about the content of the corpus can be found in the Brown Corpus Manual by Francis and Kucera, available on the ICAME webpage.
This version of the Brown corpus has word-class annotation and has been converted into XML and indexed for use with the XAIRA program (included on the CD). It is distributed on the BNC Baby CD together with the BNC Baby and BNC Sampler corpora. Ordering details are given on the How to order page.