BNC Products

The British National Corpus (BNC) Consortium was formed in 1990, and started work in 1991 on the three-year task of producing a hundred-million-word corpus of modern British English for use in commercial and academic research. The full BNC contains about 100 million words: 90% written, 10% orthographically transcribed spoken text. The first edition was published in 1994. A slightly revised version, BNC World, was made available world-wide in 2001. In 2007, the BNC was made available in XML. BNC XML Edition is the version currently distributed and supported. Two subsets of the BNC have been produced separately: BNC Sampler and BNC Baby.

The BNC corpora are distributed with a search tool, XAIRA. XAIRA is a development of the SARA program, which was originally written for use with the first versions of the BNC and BNC Sampler. XAIRA can be used with BNC Baby, BNC Sampler (XML version) and BNC XML Edition, as well as with other corpora in XML format.

BNC XML Edition

The BNC contains about 100 million words: 90% written, 10% orthographically transcribed spoken text. It has been annotated with word-class (part-of-speech) information, and the texts also contain metatextual information. BNC XML Edition is a revised version of BNC World and was released in 2007. It adds information about the lemma and a simplified word class for each word but, apart from the correction of a few errors and inconsistencies, no changes have been made to the actual corpus texts between the two versions. This version of the corpus is in XML format and can be used with the XAIRA search program, which offers more search options and a better user interface than the earlier SARA program.
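As a rough illustration of what the per-word annotation makes possible, the Python sketch below reads a tiny hand-written XML fragment and lists each word with its lemma, word-class tag and simplified word class. The element name `w`, the attribute names `c5`, `hw` and `pos`, and the sample sentence itself are assumptions made for this example and are not specified on this page; the Reference Guide is the authority on the actual encoding.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical one-sentence fragment written for this example.
# Element and attribute names (w, c, c5, hw, pos) are assumptions,
# not taken from this page; check the Reference Guide before relying on them.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN2" hw="text" pos="SUBST">texts </w>'
    '<w c5="VBB" hw="be" pos="VERB">are </w>'
    '<w c5="AJ0" hw="short" pos="ADJ">short</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def read_words(xml_text):
    """Return one dict per <w> element: surface form, lemma, tag, simplified class."""
    root = ET.fromstring(xml_text)
    return [
        {
            "form": (w.text or "").strip(),
            "lemma": w.get("hw"),
            "tag": w.get("c5"),
            "pos": w.get("pos"),
        }
        for w in root.iter("w")
    ]


words = read_words(SAMPLE)

# Count how often each simplified word class occurs in the fragment.
pos_counts = Counter(word["pos"] for word in words)

for word in words:
    print(word["form"], word["lemma"], word["tag"], word["pos"], sep="\t")
```

Punctuation is held in a separate element in this fragment, so it is skipped when iterating over words. For the full corpus, XAIRA and its prebuilt index files remain the supported route; a script like this is only practical for small extracts.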

BNC XML Edition is made available on DVD for installation on a stand-alone PC or on a Windows, Unix or OSX server. It is delivered with a copy of the XAIRA search program and all necessary XAIRA index files.

For more information about the BNC XML Edition corpus, follow the links to the Reference Guide for the British National Corpus (XML Edition). Information about the BNC project and the original creation of the corpus can be found on the corpus creation page. To buy a copy of the corpus, follow the links to the How to order page.

BNC Baby

BNC Baby is a subset of BNC World. It consists of four one-million-word samples, each compiled as an example of a particular genre: fiction, newspapers, academic writing and spoken conversation. The texts have the same annotation as the full corpus (part of speech, metadata, etc.). The Reference Guide to BNC Baby [.pdf file] offers further information about this sample, such as a description of its design and of the way in which it is encoded.

The BNC Baby is in XML format and can be searched with the XAIRA program (included on the CD). It is distributed on a CD together with the BNC Sampler and an XML version of the American English Brown corpus. More information about the CD is available on the BNC Baby CD page. The CD can be ordered online.

BNC Sampler

The BNC Sampler is a subset of the full BNC. It comprises two samples, one of written and one of spoken material, of one million words each, compiled to mirror the composition of the full BNC as far as possible. The word-class annotation of the BNC Sampler texts has been carefully checked and manually corrected. The Sampler was first created at Lancaster University during the creation of the BNC. More information about the Sampler can be found in the Users Reference Guide for the BNC Sampler: XML Edition [.pdf file].

The BNC Sampler is in XML format and can be searched with the XAIRA program (included on the CD). It is distributed on the BNC Baby CD together with BNC Baby and an XML version of the American English Brown corpus. To buy a copy, see the How to order page.

Brown Corpus

The Brown Corpus of Standard American English was created at Brown University by W. N. Francis and H. Kucera. It contains one million words of written American English, taken from publications from 1961. The texts are all approximately 2,000 words long and are grouped into 15 categories. More information about the content of the corpus can be found in the Brown Corpus Manual by Francis and Kucera, available on the ICAME webpage.

This version of the Brown corpus has word-class annotation and has been converted into XML and indexed for use with the XAIRA program (included on the CD). It is distributed on the BNC Baby CD together with the BNC Baby and BNC Sampler corpora. To buy a copy, see the How to order page.

BNC World

The BNC contains about 100 million words: 90% written, 10% orthographically transcribed spoken text. BNC World is a revised version of the original BNC and was produced between 1998 and 2000. It contains a thorough revision of the part of speech tagging, several corrections to the headers, and some minor revision of the SGML tagging used. BNC World was made available world-wide in 2001. It has now been superseded by BNC XML Edition.

BNC World was made available on CD for installation on a stand-alone PC or on a Windows, Unix or OSX server. The corpus can also be accessed via the BNC Subscription service or by using the BNC Simple Search.

For more information about the BNC World corpus, follow the links to the Users Reference Guide.


Maintained by: BNC Webmaster (bnc-queries@rt.oucs.ox.ac.uk) January 2009. Lou Burnard.
© 2005, University of Oxford.